Running the MCMICRO Pipeline
Usage
Once installed, the basic pipeline execution consists of:
- Ensuring you have the latest version of the pipeline
- Using --in to point the pipeline at the data
# Get the latest version of the pipeline
nextflow pull labsyspharm/mcmicro
# Run the pipeline on data (starting from the registration step through quantification, by default)
nextflow run labsyspharm/mcmicro --in path/to/my/data
(Where path/to/my/data is replaced with your specific path.)
Input
At a minimum, the pipeline expects two inputs, with an optional third:
- markers.csv in the parent directory (containing marker metadata)
- Raw images in the raw/ subdirectory
- (Optional) Illumination profiles in the illumination/ subdirectory
Example input directory:
exemplar-001
├── markers.csv
├── raw/
└── illumination/
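Before launching a run, it can help to confirm that a project directory has the layout described above. The following is a minimal sketch and not part of MCMICRO: the function name check_mcmicro_input is hypothetical, and it assumes a run that starts at the default registration step, so raw/ must be present.

```shell
# Hypothetical helper (not part of MCMICRO): verify the minimal input layout.
check_mcmicro_input() {
  local dir="$1" status=0
  # markers.csv and raw/ are the two required inputs.
  [ -f "$dir/markers.csv" ] || { echo "missing: $dir/markers.csv" >&2; status=1; }
  [ -d "$dir/raw" ] || { echo "missing: $dir/raw/" >&2; status=1; }
  # illumination/ is optional, so its absence is only reported, not treated as an error.
  [ -d "$dir/illumination" ] || echo "note: no illumination/ directory (optional)" >&2
  return "$status"
}
```

For example, check_mcmicro_input path/to/exemplar-001 returns a non-zero status if markers.csv or raw/ is missing.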
Markers
The file markers.csv must be in a comma-delimited format and contain a column titled marker_name that defines the marker name of every channel.
Example markers file:
cycle,marker_name
1,DNA_1
1,AF488
1,AF555
1,AF647
2,DNA_2
2,A488_background
2,A555_background
2,A647_background
3,DNA_3
3,FDX1
3,CD357
3,CD1D
All other columns are optional but can be used to specify additional metadata (e.g., known mapping to cell types) to be used by individual modules.
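As a quick sanity check of the requirement above, the marker_name column can be located by its header and printed. This is a sketch under assumptions: list_marker_names is a hypothetical helper, and it only handles simple CSV files in which no field contains a quoted comma.

```shell
# Hypothetical helper: print the marker_name column of a simple markers.csv.
# Assumes no quoted fields containing commas.
list_marker_names() {
  awk -F',' '
    NR == 1 {
      # Find the column whose header is exactly "marker_name".
      for (i = 1; i <= NF; i++) {
        sub(/\r$/, "", $i)            # tolerate Windows line endings
        if ($i == "marker_name") col = i
      }
      if (!col) { print "no marker_name column found" > "/dev/stderr"; exit 1 }
      next
    }
    { sub(/\r$/, "", $col); print $col }
  ' "$1"
}
```

The function exits with a non-zero status when the header lacks marker_name, so it can be used as a guard before starting the pipeline.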
Raw images
The exemplar raw/ files are in the open standard OME-TIFF format, but in practice your input files will be in whatever format your microscope produces. The pipeline supports all Bio-Formats-compatible image formats, but additional parameters may be required.
(Optional) Illumination corrected images
Pre-computed flat-field and dark-field illumination profiles can be placed in the illumination/ directory. If no pre-computed profiles are available, MCMICRO can compute these using BaSiC. This step is not executed by default, because proper illumination correction requires careful curation and visual inspection of the profiles produced by computational tools. After familiarizing yourself with the general concepts (https://en.wikipedia.org/wiki/Flat-field_correction), the profiles can be computed by specifying --start-at illumination.
Output
Stitching and registration
ASHLAR is the default first step of the pipeline. ASHLAR will aggregate individual image tiles from raw/ along with the corresponding illumination profiles to produce a stitched and registered mosaic image.
This mosaic image will be published to the registration/ subdirectory:
exemplar-001
├── markers.csv
├── raw/
├── illumination/
└── registration/
    └── exemplar-001.ome.tif
The output filename will be generated based on the name of the project directory.
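The naming rule can be illustrated with a short sketch. The helper name registration_output is hypothetical, and the sketch assumes the default behavior in which the project directory name is used for the output file.

```shell
# Hypothetical helper: predict where the stitched mosaic will be published,
# assuming the output is named after the project directory.
registration_output() {
  local dir="${1%/}"    # drop one trailing slash, if present
  printf '%s/registration/%s.ome.tif\n' "$dir" "$(basename "$dir")"
}
```

Calling registration_output /data/exemplar-001 prints /data/exemplar-001/registration/exemplar-001.ome.tif.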
(Optional) TMA dearray
When working with Tissue Microarrays (TMA), Coreograph is used for TMA dearraying. The registration/ folder will contain an image of the entire TMA. Use the --tma flag during pipeline execution to have MCMICRO identify and isolate individual cores.
Each core will be written out into a standalone file in the dearray/ subdirectory along with the mask specifying where in the original image the core appeared:
exemplar-002
├── ...
├── registration/
│   └── exemplar-002.ome.tif
└── dearray/
    ├── 1.tif
    ├── 2.tif
    ├── 3.tif
    ├── 4.tif
    └── masks/
        ├── 1_mask.tif
        ├── 2_mask.tif
        ├── 3_mask.tif
        └── 4_mask.tif
All cores will then be processed in parallel by all subsequent steps.
Segmentation
Cell segmentation is carried out in two steps. First, the pipeline generates probability maps that annotate each pixel with the probability that it belongs to a given subcellular component (nucleus, cytoplasm, cell boundary) using UnMICST (default) or ilastik. The second step applies standard watershed segmentation to produce the final cell/nucleus/cytoplasm/etc. masks using S3segmenter.
The outputs of the two steps will appear in the probability-maps/ and segmentation/ directories, respectively. When there are multiple modules for a given pipeline step, their results will be subdivided into additional subdirectories:
exemplar-001
├── ...
├── probability-maps/
│   ├── ilastik/
│   │   └── exemplar-001_Probabilities.tif
│   └── unmicst/
│       └── exemplar-001_Probabilities_0.tif
└── segmentation/
    ├── ilastik-exemplar-001/
    │   ├── cell.ome.tif
    │   └── nuclei.ome.tif
    └── unmicst-exemplar-001/
        ├── cell.ome.tif
        └── nuclei.ome.tif
Quantification
The final step, MCQuant, combines information in segmentation masks, the original stitched image and markers.csv to produce Spatial Feature Tables that summarize the expression of every marker on a per-cell basis, alongside additional morphological features (cell shape, size, etc.).
Spatial Feature Tables will be published to the quantification/ directory:
exemplar-001
├── ...
├── segmentation/
└── quantification/
    ├── ilastik-exemplar-001_cell.csv
    └── unmicst-exemplar-001_cell.csv
There is a direct correspondence between the .csv filenames and the filenames of segmentation masks. For example, quantification/unmicst-exemplar-001_cell.csv quantifies segmentation/unmicst-exemplar-001/cell.ome.tif.
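The filename correspondence can be expressed as a small path transformation. The sketch below is illustrative only: mask_for_table is a hypothetical helper, and it assumes the mask name (e.g., cell) is the part of the table name after the last underscore and contains no underscore itself.

```shell
# Hypothetical helper: map a Spatial Feature Table to the mask it quantifies.
# Assumes table names follow <module>-<sample>_<mask>.csv.
mask_for_table() {
  local name prefix mask
  name="$(basename "$1" .csv)"   # e.g. unmicst-exemplar-001_cell
  prefix="${name%_*}"            # text before the last underscore
  mask="${name##*_}"             # text after the last underscore
  printf 'segmentation/%s/%s.ome.tif\n' "$prefix" "$mask"
}
```

Given quantification/unmicst-exemplar-001_cell.csv, the function prints segmentation/unmicst-exemplar-001/cell.ome.tif, matching the example above.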
Quality control
Additional information during pipeline execution will be written to the qc/ directory, by both individual modules and the pipeline itself.
exemplar-002
├── ...
└── qc/
    ├── params.yml
    ├── provenance/
    │   ├── probmaps:ilastik (1).log
    │   ├── probmaps:ilastik (1).sh
    │   ├── probmaps:unmicst (1).log
    │   ├── probmaps:unmicst (1).sh
    │   ├── quantification (1).log
    │   ├── quantification (1).sh
    │   └── ...
    ├── coreo/
    ├── s3seg/
    └── unmicst/
While the exact content of the qc/ directory will depend on which modules were executed, two sources of information can always be found there:
- The file params.yml will contain the full record of module versions and all parameters used to run the pipeline. This allows for full reproducibility of future runs.
- The provenance/ subdirectory will contain the exact commands (.sh) executed by individual modules, as well as the output (.log) of these commands.
Note: Retain params.yml and provenance/, because these files enable full reproducibility of a pipeline run. The other QC files can be safely deleted once the quality of the outputs has been verified and no more parameter tuning is expected.
The remaining directories will contain QC files specific to individual modules:
- When working with TMAs, coreo/ will contain TMA_MAP.tif, a mask showing where in the original TMA image the segmented cores reside.
- If UnMICST was used to generate probability maps, unmicst/ will contain thumbnail previews, allowing for a quick assessment of their quality.
- After segmentation, two-channel TIFF files containing DAPI and nuclei/cell/cytoplasm outlines will reside in s3seg/, allowing for a visual inspection of segmentation quality.
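The retention advice for qc/ can be turned into a cautious cleanup routine. The following is a sketch, not an MCMICRO feature: prune_qc is a hypothetical helper that lists every top-level entry of qc/ other than params.yml and provenance/, and removes them only when the word delete is passed as a second argument.

```shell
# Hypothetical helper: list (default) or delete module-specific QC files,
# always keeping params.yml and provenance/.
prune_qc() {
  local qc="$1/qc" mode="${2:-list}"
  [ -d "$qc" ] || { echo "no qc/ directory in $1" >&2; return 1; }
  # Consider only the top level of qc/, skipping the two entries to retain.
  find "$qc" -mindepth 1 -maxdepth 1 \
    ! -name 'params.yml' ! -name 'provenance' -print |
  while IFS= read -r entry; do
    if [ "$mode" = "delete" ]; then rm -rf -- "$entry"; fi
    echo "$entry"
  done
}
```

Running prune_qc path/to/exemplar-002 first shows what would be removed; running prune_qc path/to/exemplar-002 delete performs the removal. Delete only after the outputs have been inspected.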
Parameters
The following parameters control the pipeline as a whole. These can be specified on the command line using the double-dash format (e.g., --in), or inside a YAML file as key-value pairs.
Required arguments:
Parameter | Description |
---|---|
--in /local/path | Location of the data |
Optional arguments:
Parameter | Default | Description |
---|---|---|
--sample-name <myname> | Directory name supplied to --in | The name of the experiment/specimen |
--start-at <step> | registration | Name of the first step to be executed by the pipeline. Must be one of illumination, registration, dearray (TMA only), probability-maps, segmentation, quantification, cell-states |
--stop-at <step> | quantification | Name of the final step to be executed by the pipeline. Accepts the same values as --start-at. |
--tma | Omitted | If specified, MCMICRO treats input data as a TMA. If omitted, the input is assumed to be a whole-slide image. |
--ilastik-model <model.ilp> | None | A custom .ilp file to be used as the classifier model for ilastik. |
--probability-maps <choice> | unmicst | Which module(s) to use for probability map computation. Module names should be delimited with a comma without spaces, e.g., --probability-maps unmicst,ilastik |
--qc-files <op> | copy | Must be one of copy, move or symlink, controlling whether QC files should be copied, moved or symbolically linked from work directories to the project directory |
Specifying path for intermediate files
By default, Nextflow writes intermediate files to a work/ directory inside whatever location you initiate a pipeline run from. Use the -w flag to provide a different location.
nextflow run labsyspharm/mcmicro --in /path/to/my-data -w /path/to/work/
Specifying start and stop modules
By default, the pipeline starts from the registration step (ASHLAR), proceeds through UnMICST and S3segmenter, and stops after the quantification step (MCQuant).
Use the --start-at and --stop-at flags to execute any contiguous section of the pipeline instead. Any subdirectory name listed in the directory structure is a valid starting and stopping point.
# If you already have a pre-stitched TMA image, start at the dearray step
nextflow run labsyspharm/mcmicro --in path/to/exemplar-002 --tma --start-at dearray
# If you want to run the illumination profile computation and registration only
nextflow run labsyspharm/mcmicro --in path/to/exemplar-001 --start-at illumination --stop-at registration
Note: Starting at any step beyond registration requires pre-computed output of the previous steps placed at the correct location in the project directory.
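To make the notion of a contiguous section concrete, the sketch below lists the steps that lie between a chosen start and stop point. It is an illustration, not pipeline code: steps_between is a hypothetical helper, the step order is assumed to match the order in which the steps are listed for --start-at, and the dearray step is assumed to apply only when --tma is given.

```shell
# Hypothetical helper: print the steps from START to STOP (inclusive).
# Defaults mirror the documented pipeline defaults.
steps_between() {
  local start="${1:-registration}" stop="${2:-quantification}"
  local step active=0 out=""
  for step in illumination registration dearray probability-maps \
              segmentation quantification cell-states; do
    if [ "$step" = "$start" ]; then active=1; fi
    if [ "$active" -eq 1 ]; then out="$out$step "; fi
    if [ "$step" = "$stop" ]; then
      if [ "$active" -eq 1 ]; then echo "${out% }"; return 0; fi
      break   # the stop step comes before the start step
    fi
  done
  echo "invalid range: $start .. $stop" >&2
  return 1
}
```

For instance, steps_between illumination registration prints the two steps executed by the second command above, while a reversed or misspelled range returns a non-zero status.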
Specifying module-specific parameters
The pipeline provides a sensible set of default parameters for individual modules. To change these, use --ashlar-opts, --unmicst-opts, --s3seg-opts and --quant-opts.
For example:
nextflow run labsyspharm/mcmicro --in /path/to/my-data --ashlar-opts '-m 35 --pyramid'
will provide -m 35 --pyramid as additional command-line arguments to ASHLAR.
Go to modules for a list of options available for each module.
Using YAML parameter files
As the number of custom flags grows, providing them all on the command line can become unwieldy. Instead, parameter values can be stored in a YAML file, which is then provided to Nextflow using -params-file.
The general rules of thumb for composing YAML files:
- Anything that would appear as --param value on the command line should be param: value in the YAML file.
- Anything that would appear as --flag on the command line should be flag: true in the YAML file.
Note: The above only applies to double-dashed arguments (which are passed to the pipeline). The single-dash arguments (like -profile) cannot be moved to YAML, because they are given to Nextflow; the pipeline never sees them.
For example, consider the following command:
nextflow run labsyspharm/mcmicro --in /data/exemplar-002 --tma --start-at dearray --ashlar-opts '-m 35 --pyramid'
All double-dashed arguments can be moved to a YAML file (e.g., myexperiment.yml) using the rules above:
in: /data/exemplar-002
tma: true
start-at: dearray
ashlar-opts: -m 35 --pyramid
The YAML file can then be fed to the pipeline via
nextflow run labsyspharm/mcmicro -params-file myexperiment.yml
Find more information about the YAML syntax here.
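The two rules of thumb can also be applied mechanically. The sketch below is a hypothetical helper, args_to_yaml, that converts double-dashed arguments into YAML lines. It assumes values need no YAML quoting (no ": " sequences, "#" characters, or leading special characters other than a single dash), and it skips anything that is not a double-dashed argument.

```shell
# Hypothetical helper: turn pipeline arguments into YAML key-value lines.
# Assumes values are safe to write as unquoted YAML scalars.
args_to_yaml() {
  local key
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --*)
        key="${1#--}"
        # A following word that does not start with "--" is this key's value.
        if [ "$#" -gt 1 ] && [ "${2#--}" = "$2" ]; then
          printf '%s: %s\n' "$key" "$2"
          shift 2
        else
          printf '%s: true\n' "$key"   # bare flag
          shift
        fi
        ;;
      *)
        echo "skipping non-pipeline argument: $1" >&2
        shift
        ;;
    esac
  done
}
```

Running args_to_yaml --in /data/exemplar-002 --tma --start-at dearray --ashlar-opts '-m 35 --pyramid' > myexperiment.yml reproduces the four-line YAML file shown in the example above.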
Directory Structure
Upon the full successful completion of a pipeline run, the directory structure will follow Fig. 1A in the MCMICRO manuscript:
Note: This directory structure corresponds to the Nextflow workflow. For the Galaxy workflow, the intermediate and output files should be identical, but the organization of the files within directories and the filenames will be different.
[Figure: pipeline schematic alongside the exemplar-002 directory structure]
The name of the parent directory (e.g., exemplar-002) is assumed by the pipeline to be the sample name.
Visual inspection of quality control (qc/) files is recommended after completing the run. Depending on the modules used, directories coreo/, unmicst/ and s3seg/ may contain .tif images for inspection.