
Running the MCMICRO Pipeline

Table of contents
  1. Running the MCMICRO Pipeline
    1. Usage
    2. Input
      1. Markers
      2. Raw images
      3. (Optional) Illumination corrected images
    3. Output
      1. Stitching and registration
      2. (Optional) TMA dearray
      3. Segmentation
      4. Quantification
      5. Quality control
    4. Parameters
      1. Required arguments:
      2. Optional arguments:
      3. Specifying path for intermediate files
      4. Specifying start and stop modules
      5. Specifying module-specific parameters
      6. Using YAML parameter files
    5. Directory Structure

Usage

Once the pipeline is installed, basic execution consists of:

  1. Ensuring you have the latest version of the pipeline
  2. Using --in to point the pipeline at the data

# Get the latest version of the pipeline
nextflow pull labsyspharm/mcmicro

# Run the pipeline on data (starting from the registration step through quantification, by default)
nextflow run labsyspharm/mcmicro --in path/to/my/data

(Where path/to/my/data is replaced with your specific path.)



Input

At a minimum, the pipeline expects two inputs, with an optional third:

  1. markers.csv in the parent directory (containing metadata with markers)
  2. Raw images in the raw/ subdirectory
  3. (Optional) Illumination profiles in the illumination/ subdirectory.

Example input directory:

exemplar-001
├── markers.csv
├── raw/
└── illumination/
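
Before launching a run, it can help to confirm that the project directory matches the layout above. The following is a minimal sketch, not part of MCMICRO: the function name check_inputs is hypothetical, and the only rules it encodes are the ones stated in this section (markers.csv and raw/ are required, illumination/ is optional).

```python
from pathlib import Path


def check_inputs(project_dir):
    """Return a list of problems found in a project directory (hypothetical helper)."""
    project = Path(project_dir)
    problems = []
    if not (project / "markers.csv").is_file():
        problems.append("missing markers.csv")
    raw = project / "raw"
    if not raw.is_dir():
        problems.append("missing raw/ subdirectory")
    elif not any(raw.iterdir()):
        problems.append("raw/ is empty")
    # illumination/ is optional, so its absence is not reported.
    return problems
```

An empty list means the two required inputs were found; for example, check_inputs("path/to/exemplar-001") could be called right before nextflow run.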


Markers

The file markers.csv must be in comma-delimited format and contain a column titled marker_name that defines the marker name of every channel.

Example markers file:

cycle,marker_name
1,DNA_1
1,AF488
1,AF555
1,AF647
2,DNA_2
2,A488_background
2,A555_background
2,A647_background
3,DNA_3
3,FDX1
3,CD357
3,CD1D

All other columns are optional but can be used to specify additional metadata (e.g., known mapping to cell types) to be used by individual modules.
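
The marker_name requirement can be checked with Python's standard csv module before running the pipeline. This is an illustrative sketch only: read_marker_names is a hypothetical helper, and the non-empty check is an assumption that goes slightly beyond what the text above requires.

```python
import csv
import io


def read_marker_names(csv_file):
    """Return the marker_name column, one entry per channel (hypothetical helper)."""
    reader = csv.DictReader(csv_file)
    if "marker_name" not in (reader.fieldnames or []):
        raise ValueError("markers.csv must contain a 'marker_name' column")
    # Short rows yield None for missing fields, so fall back to an empty string.
    names = [(row["marker_name"] or "").strip() for row in reader]
    if any(not name for name in names):
        # Assumption: a channel without a name is treated as an error.
        raise ValueError("every channel needs a non-empty marker_name")
    return names


example = io.StringIO("cycle,marker_name\n1,DNA_1\n1,AF488\n2,DNA_2\n")
print(read_marker_names(example))  # ['DNA_1', 'AF488', 'DNA_2']
```

For a file on disk, pass the handle returned by open("markers.csv", newline="") instead of the in-memory example.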


Raw images

The exemplar raw/ files are in the open standard OME-TIFF format, but in practice your input files will be in whatever format your microscope produces. The pipeline supports all Bio-Formats-compatible image formats, but additional parameters may be required.


(Optional) Illumination corrected images

Pre-computed flat-field and dark-field illumination profiles can be placed in the illumination/ directory. If no pre-computed profiles are available, MCMICRO can compute them using BaSiC. This step is not executed by default, because proper illumination correction requires careful curation and visual inspection of the profiles produced by computational tools. Once you are familiar with the general concepts of flat-field correction (https://en.wikipedia.org/wiki/Flat-field_correction), you can compute the profiles by specifying --start-at illumination.



Output

Stitching and registration

ASHLAR is the default first step of the pipeline. ASHLAR will aggregate individual image tiles from raw/ along with the corresponding illumination profiles to produce a stitched and registered mosaic image.

This mosaic image will be published to the registration/ subdirectory:

exemplar-001
├── markers.csv
├── raw/
├── illumination/
└── registration/
    └── exemplar-001.ome.tif

The output filename will be generated based on the name of the project directory.


(Optional) TMA dearray

When working with Tissue Microarrays (TMA), Coreograph is used for TMA dearraying. The registration/ folder will contain an image of the entire TMA. Use the --tma flag during pipeline execution to have MCMICRO identify and isolate individual cores.

Each core will be written out into a standalone file in the dearray/ subdirectory along with the mask specifying where in the original image the core appeared:

exemplar-002
├── ...
├── registration/
│   └── exemplar-002.ome.tif
└── dearray/
    ├── 1.tif
    ├── 2.tif
    ├── 3.tif
    ├── 4.tif
    └── masks/
        ├── 1_mask.tif
        ├── 2_mask.tif
        ├── 3_mask.tif
        └── 4_mask.tif

All cores will then be processed in parallel by all subsequent steps.
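
Given the dearray/ layout shown above, each core image can be matched to its mask by filename. The sketch below assumes the naming pattern visible in the example tree (N.tif paired with masks/N_mask.tif); pair_cores_with_masks is a hypothetical helper, not an MCMICRO function.

```python
from pathlib import Path


def pair_cores_with_masks(dearray_dir):
    """Map each core filename to its mask path relative to dearray/, or None."""
    dearray = Path(dearray_dir)
    pairs = {}
    # glob("*.tif") is not recursive, so files under masks/ are not listed as cores.
    for core in sorted(dearray.glob("*.tif")):
        mask = dearray / "masks" / f"{core.stem}_mask.tif"
        pairs[core.name] = mask.relative_to(dearray).as_posix() if mask.is_file() else None
    return pairs
```

A value of None flags a core whose mask was not found, which may be worth investigating before downstream steps are trusted.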


Segmentation

Cell segmentation is carried out in two steps. First, the pipeline uses UnMICST (default) or ilastik to generate probability maps that annotate each pixel with the probability that it belongs to a given subcellular component (nucleus, cytoplasm, cell boundary). Second, S3segmenter applies standard watershed segmentation to produce the final masks (cell, nucleus, cytoplasm, etc.).

The outputs of the two steps will appear in the probability-maps/ and segmentation/ directories, respectively. When there are multiple modules for a given pipeline step, their results will be subdivided into additional subdirectories:

exemplar-001
├── ...
├── probability-maps/
│   ├── ilastik/
│   │   └── exemplar-001_Probabilities.tif
│   └── unmicst/
│       └── exemplar-001_Probabilities_0.tif
└── segmentation/
    ├── ilastik-exemplar-001/
    │   ├── cell.ome.tif
    │   └── nuclei.ome.tif
    └── unmicst-exemplar-001/
        ├── cell.ome.tif
        └── nuclei.ome.tif


Quantification

The final step, MCQuant, combines information in segmentation masks, the original stitched image and markers.csv to produce Spatial Feature Tables that summarize the expression of every marker on a per-cell basis, alongside additional morphological features (cell shape, size, etc.).

Spatial Feature Tables will be published to the quantification/ directory:

exemplar-001
├── ...
├── segmentation/
└── quantification/
    ├── ilastik-exemplar-001_cell.csv
    └── unmicst-exemplar-001_cell.csv

There is a direct correspondence between the .csv filenames and the filenames of segmentation masks. For example, quantification/unmicst-exemplar-001_cell.csv quantifies segmentation/unmicst-exemplar-001/cell.ome.tif.
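
The filename correspondence can be expressed as a small path transformation. This sketch assumes, based only on the example above, that the text after the last underscore in the .csv name is the mask type; mask_for_table is a hypothetical helper.

```python
from pathlib import PurePosixPath


def mask_for_table(csv_path):
    """Return the segmentation mask path quantified by a Spatial Feature Table."""
    table = PurePosixPath(csv_path)
    # "unmicst-exemplar-001_cell" -> ("unmicst-exemplar-001", "_", "cell")
    run_name, sep, mask_type = table.stem.rpartition("_")
    if not sep:
        raise ValueError(f"cannot find a mask type in {table.name!r}")
    project = table.parent.parent  # the directory that contains quantification/
    return str(project / "segmentation" / run_name / f"{mask_type}.ome.tif")


print(mask_for_table("quantification/unmicst-exemplar-001_cell.csv"))
# segmentation/unmicst-exemplar-001/cell.ome.tif
```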


Quality control

Additional information generated during pipeline execution will be written to the qc/ directory by both the individual modules and the pipeline itself.

exemplar-002
├── ...
└── qc/
    ├── params.yml
    ├── provenance/
    │   ├── probmaps:ilastik (1).log
    │   ├── probmaps:ilastik (1).sh
    │   ├── probmaps:unmicst (1).log
    │   ├── probmaps:unmicst (1).sh
    │   ├── quantification (1).log
    │   ├── quantification (1).sh
    │   └── ...
    ├── coreo/
    ├── s3seg/
    └── unmicst/

While the exact content of the qc/ directory will depend on which modules were executed, two sources of information can always be found there:

  1. The file params.yml will contain the full record of module versions and all parameters used to run the pipeline. This allows for full reproducibility of future runs.
  2. The provenance/ subdirectory will contain the exact commands (.sh) executed by individual modules, as well as the output (.log) of these commands.

* You should retain params.yml and provenance/ because these files enable full reproducibility of a pipeline run. The other QC files can be safely deleted once the quality of the outputs has been verified and no more parameter tuning is expected.
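
To see which QC entries fall outside the "retain" set, the qc/ directory can be listed with params.yml and provenance/ filtered out. This is a sketch under the retention rule stated above; removable_qc_entries is a hypothetical helper, and it only reports names without deleting anything.

```python
from pathlib import Path

# Entries that this page recommends keeping for reproducibility.
KEEP = {"params.yml", "provenance"}


def removable_qc_entries(qc_dir):
    """List qc/ entries that are not needed to reproduce a run."""
    return sorted(p.name for p in Path(qc_dir).iterdir() if p.name not in KEEP)
```

Reviewing the returned names by hand before removing anything keeps the decision with the person who verified the output quality.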


The remaining directories will contain QC files specific to individual modules:

  1. When working with TMAs, coreo/ will contain TMA_MAP.tif, a mask showing where in the original TMA image the segmented cores reside.
  2. If UnMICST was used to generate probability maps, unmicst/ will contain thumbnail previews, allowing for a quick assessment of their quality.
  3. After segmentation, two-channel tif files containing DAPI and nuclei/cell/cytoplasm outlines will reside in s3seg/, allowing for a visual inspection of segmentation quality.



Parameters

The following parameters control the pipeline as a whole. These can be specified on the command line using the double-dash format (e.g., --in), or inside a YAML file as key-value pairs.

Required arguments:

| Parameter | Description |
| --- | --- |
| --in /local/path | Location of the data |

Optional arguments:

| Parameter | Default | Description |
| --- | --- | --- |
| --sample-name <myname> | Directory name supplied to --in | The name of the experiment/specimen |
| --start-at <step> | registration | Name of the first step to be executed by the pipeline. Must be one of illumination, registration, dearray (TMA only), probability-maps, segmentation, quantification, cell-states |
| --stop-at <step> | quantification | Name of the final step to be executed by the pipeline. Spans the same vocabulary as --start-at. |
| --tma | Omitted | If specified, MCMICRO treats input data as a TMA. If omitted, the input is assumed to be a whole-slide image. |
| --ilastik-model <model.ilp> | None | A custom .ilp file to be used as the classifier model for ilastik. |
| --probability-maps <choice> | unmicst | Which module(s) to use for probability map computation. Module names should be delimited with a comma without spaces, e.g., --probability-maps unmicst,ilastik |
| --qc-files <op> | copy | Must be one of copy, move or symlink, controlling whether QC files should be copied, moved or symbolically linked from work directories to the project directory |


Specifying path for intermediate files

By default, Nextflow writes intermediate files to a work/ directory inside whatever location you initiate a pipeline run from. Use the -w flag to provide a different location.

nextflow run labsyspharm/mcmicro --in /path/to/my-data -w /path/to/work/


Specifying start and stop modules

By default, the pipeline starts from the registration step (ASHLAR), proceeds through UnMICST and S3segmenter, and stops after the quantification step (MCQuant).

Use the --start-at and --stop-at flags to execute any contiguous section of the pipeline instead. Any subdirectory name listed in the directory structure is a valid starting and stopping point.

# If you already have a pre-stitched TMA image, start at the dearray step
nextflow run labsyspharm/mcmicro --in path/to/exemplar-002 --tma --start-at dearray

# If you want to run the illumination profile computation and registration only
nextflow run labsyspharm/mcmicro --in path/to/exemplar-001 --start-at illumination --stop-at registration

Note: Starting at any step beyond registration requires pre-computed output of the previous steps placed at the correct location in the project directory.
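
The idea of a contiguous section can be illustrated by slicing an ordered list of step names. This is a sketch and not the pipeline's own logic: steps_to_run is a hypothetical helper, and it assumes that the order in which the steps are listed for --start-at is also their execution order.

```python
# Assumption: this order mirrors the vocabulary listed for --start-at.
STEPS = [
    "illumination", "registration", "dearray", "probability-maps",
    "segmentation", "quantification", "cell-states",
]


def steps_to_run(start_at="registration", stop_at="quantification", tma=False):
    """Return the contiguous range of steps between start_at and stop_at."""
    for name in (start_at, stop_at):
        if name not in STEPS:
            raise ValueError(f"unknown step: {name!r}")
    first, last = STEPS.index(start_at), STEPS.index(stop_at)
    if first > last:
        raise ValueError("--start-at must not come after --stop-at")
    selected = STEPS[first:last + 1]
    if not tma:
        # dearray applies to TMA data only.
        selected = [step for step in selected if step != "dearray"]
    return selected


print(steps_to_run())
# ['registration', 'probability-maps', 'segmentation', 'quantification']
```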


Specifying module-specific parameters

The pipeline provides a sensible set of default parameters for individual modules. To change these, use --ashlar-opts, --unmicst-opts, --s3seg-opts and --quant-opts.

For example: nextflow run labsyspharm/mcmicro --in /path/to/my-data --ashlar-opts '-m 35 --pyramid' will provide -m 35 --pyramid as additional command line arguments to ASHLAR.

Go to modules for a list of options available for each module.
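
Because module options contain spaces, they must reach the pipeline as a single quoted argument. When launching runs from a script, building the command as a list and letting Python's shlex module handle the quoting avoids mistakes. In this sketch, mcmicro_command is a hypothetical helper; the option names are the four listed above.

```python
import shlex

# The module-option flags named on this page.
MODULES = {"ashlar", "unmicst", "s3seg", "quant"}


def mcmicro_command(data_dir, **module_opts):
    """Build an argument list; each module's options stay a single argument."""
    argv = ["nextflow", "run", "labsyspharm/mcmicro", "--in", data_dir]
    for module, opts in module_opts.items():
        if module not in MODULES:
            raise ValueError(f"unknown module: {module!r}")
        argv += [f"--{module}-opts", opts]
    return argv


argv = mcmicro_command("/path/to/my-data", ashlar="-m 35 --pyramid")
print(shlex.join(argv))
# nextflow run labsyspharm/mcmicro --in /path/to/my-data --ashlar-opts '-m 35 --pyramid'
```

The list form can be passed directly to subprocess.run(argv, check=True), in which case no shell quoting is needed at all.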


Using YAML parameter files

As the number of custom flags grows, providing them all on the command line can become unwieldy. Instead, parameter values can be stored in a YAML file, which is then provided to Nextflow using -params-file.

The general rules of thumb for composing YAML files:

  1. Anything that would appear as --param value on the command line should be param: value in the YAML file.
  2. Anything that would appear as --flag on the command line should be flag: true in the YAML file.

Note: The above applies only to double-dashed arguments (which are passed to the pipeline). Single-dash arguments (like -profile) cannot be moved to YAML, because they are given to Nextflow itself; the pipeline never sees them.

For example, consider the following command:

nextflow run labsyspharm/mcmicro --in /data/exemplar-002 --tma --start-at dearray --ashlar-opts '-m 35 --pyramid'

All double-dashed arguments can be moved to a YAML file (e.g., myexperiment.yml) using the rules above:

in: /data/exemplar-002
tma: true
start-at: dearray
ashlar-opts: -m 35 --pyramid

The YAML file can then be fed to the pipeline via

nextflow run labsyspharm/mcmicro -params-file myexperiment.yml
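
The two rules of thumb above are mechanical enough to sketch in code. The helpers cli_to_params and to_yaml below are hypothetical and use only the standard library; string values are written as JSON-style double-quoted strings, which are also valid YAML. One known limitation of this sketch: a value that itself begins with two dashes would be misread as the next option.

```python
import json


def cli_to_params(args):
    """Turn double-dashed arguments into a dict using the two rules of thumb."""
    params = {}
    i = 0
    while i < len(args):
        token = args[i]
        if not token.startswith("--"):
            raise ValueError(f"expected a double-dashed option, got {token!r}")
        key = token[2:]
        has_value = i + 1 < len(args) and not args[i + 1].startswith("--")
        if has_value:
            params[key] = args[i + 1]  # rule 1: --param value -> param: value
            i += 2
        else:
            params[key] = True         # rule 2: --flag -> flag: true
            i += 1
    return params


def to_yaml(params):
    """Render the dict as simple key: value lines."""
    lines = []
    for key, value in params.items():
        rendered = "true" if value is True else json.dumps(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


args = ["--in", "/data/exemplar-002", "--tma", "--start-at", "dearray",
        "--ashlar-opts", "-m 35 --pyramid"]
print(to_yaml(cli_to_params(args)), end="")
```

Single-dash Nextflow options such as -profile are deliberately rejected, since they cannot be moved into the parameter file.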

Find more information about the YAML syntax here.



Directory Structure

Upon successful completion of a full pipeline run, the directory structure will follow Fig. 1A in the MCMICRO manuscript.

Note: This directory structure corresponds directly to the Nextflow workflow. For the Galaxy workflow, the intermediate and output files should be identical, but the organization of the files within directories and the filenames will be different.

exemplar-002
├── markers.csv
├── raw/
├── illumination/
├── registration/
├── dearray/
├── probability-maps/
├── segmentation/
├── quantification/
└── qc/

The name of the parent directory (e.g., exemplar-002) is assumed by the pipeline to be the sample name.

Visual inspection of quality control (qc/) files is recommended after completing the run. Depending on the modules used, directories coreo/, unmicst/ and s3seg/ may contain .tif images for inspection.


