Link Search Menu Expand Document

Module specifications

Most users will not need to modify the default modules specifications, unless you want to:

  1. Run a different version of an existing module, or
  2. Add another module

The following page describes how to use the modules: namespace of the parameter file to accomplish either of these goals.


Table of contents
  1. Module specifications
    1. Modifying a module version
    2. Adding a module
      1. Input and output specs
      2. Segmentation modules
      3. Downstream modules
  2. Configuration
    1. Name
    2. Container and version
    3. Command
    4. Input
    5. (Optional) Model
    6. (Optional) Channel and indexing base
    7. Watershed
    8. Putting it all together

Modifying a module version

If a module is specified as the only option for a given processing step (illumination, registration, dearray, watershed, quantification and viz), then the version should be specified directly under the corresponding step using the version: field. In this case, an example params.yml may look as follows:

modules:
  registration:
    version: 1.16.0
  watershed:
    version: 1.4.0-large

Alternatively, if a module is one of several options for a given processing step (currently only segmentation and downstream), then the version specification should also include a name: field to allow matching against existing modules. An example params.yml in this case may look as following:

modules:
  segmentation:
    -
      name: unmicst
      version: 2.7.3
    -
      name: mesmer
      version: 0.4.0
  downstream:
    -
      name: naivestates
      version: 1.6.2

When in doubt, follow the same structure as the one represented by the default values.

Adding a module

Adding a new module specification is currently only possible for segmentation and downstream processing steps.

Input and output specs

Every module must have a command-line interface (CLI) that has been encapsulated inside a Docker container. MCMICRO assumes that CLI conforms to the following input-output specifications.

Segmentation modules

Input:

  • A file in .ome.tif format containing a fully stitched and registered multiplexed image.
  • (Optional) A file containing a custom model for the algorithm. The file can be in any format, and it is up to the module developer to decide what formats they allow from users.

Output:

  • An image file in .tif format, written to . (i.e., the “current working directory”). The file can be either a probability map or a segmentation mask. The image channels in probability maps annotate each pixel with probabilities that it belongs to the background or different parts of the cell such as the nucleus, cytoplasm, cell membrane or the intercellular region. Similarly, segmentation masks annotate each pixel with an integer index of the cell it belongs to, or 0 if none.
  • (Optional) One or more files written to ./qc/ (i.e., qc/ subdirectory within the “current working directory”). These will be copied by the pipeline to the corresponding location in the project’s qc/ directory.

Downstream modules

Input:

  • A file in .csv format containing a spatial feature table. Each row in a table corresponds to a cell, while columns contain features characterizing marker expression or morphological properties.
  • (Optional) A file containing a custom model for the algorithm. The file can be in any format, and it is up to the module developer to decide what formats they allow from users.

Output:

  • One or more files in .csv or .hdf5 format, written to . (i.e., the “current working directory”). Each file should annotate individual cells with the corresponding inferred cell state.
  • (Optional) One or more files written to ./plots/ (i.e., plots/ subdirectory within the “current working directory”). Each file can be in any format and contain any information that the module developer thinks will be useful to the user (e.g., UMAP plots showing how cells cluster together).
  • (Optional) One or more files written to ./qc/ (i.e., qc/ subdirectory within the “current working directory”). These will be copied by the pipeline to the corresponding location in the project’s qc/ directory.

Configuration

Adding a new MCMICRO module involves specifying simple key-value pairs in the modules: section of params.yml. For example, consider the following configuration for ilastik:

modules:
  segmentation:
    -
      name: ilastik
      container: labsyspharm/mcmicro-ilastik
      version: 1.4.5
      cmd: python /app/mc-ilastik.py --output .
      input: --input
      model: --model
      channel: --channelIDs
      idxbase: 1
      watershed: 'yes'

Name

The name of the module determines two things. First, it specifies the names of subdirectories for where the output files will be written to in the project directory. In the given example, the primary outputs will appear in probability-maps/ilastik/, while QC files will be written to qc/ilastik/. Second, the module name also tells MCMICRO what other parameters to look for. In our example, the pipeline will look for module specific parameters in --ilastik-opts and a custom model file in --ilastik-model.

Container and version

The two fields must uniquely identify a Docker container image containing the tool. Mechanistically, the fields are combined using the standard REPOSITORY:TAG convention.

Command

The cmd field must contain a command that, when executed inside the container, will produce the required set of outputs from the inputs provided to it by the pipeline.

It is imperative that all primary outputs are written to . (i.e., the “current working directory”). MCMICRO will automatically sort outputs to their correct location in the project directory. Writing outputs to any other location may result in MCMICRO failing to locate them.

Input

The input field determines how the pipeline will supply inputs to the module. Some examples in the context of exemplar-001 may look as follows:

ConfigurationWhat MCMICRO will execute
cmd : 'python /app/tool.py -o .'
input : '-i'
python /app/tool.py -o . -i exemplar-001.ome.tif
cmd : 'python /app/tool.py -o .'
input : '--input'
python /app/tool.py -o . --input exemplar-001.ome.tif
cmd : 'python /app/tool.py -o .'
input : ''
python /app/tool.py -o . exemplar-001.ome.tif

(Optional) Model

The model field functions similarly to input and specifies how the pipeline will supply a custom model to the tool. In this example, MCMICRO will check whether the user specified --ilastik-model in the calling arguments, and pass the corresponding value to ilastik via --model.

(Optional) Channel and indexing base

The channel field indicates how MCMICRO should pass --segmentation-channel value(s) specified by the user to the module. The idxbase field specifies whether the module assumes 0-based or 1-based indexing. All channel indexing in MCMICRO starts with 1, and the pipeline will correctly account for 0-based indexing, if it used by the module.

Watershed

The watershed field specifies whether the module requires a subsequent watershed step. Set it to 'yes' for modules that produce probability maps and 'no' for instance segmenters. Alternatively, you can specify 'bypass' to have the output still go through S3Segmenter with the --nucleiRegion bypass flag. This will skip watershed but still allow you to filter nuclei by size with --logSigma.

Putting it all together

Given the above configuration for ilastik, users of MCMICRO can begin using the module by including the following inside their params.yml:

workflow:
  segmentation: ilastik
  segmentation-channel: 1 5
  ilastik-model: myawesomemodel.ilp
options:
  ilastik: --num_channels 2

As exemplar-001 makes its way through the pipeline, it will eventually encounter the segmentation step. The pipeline will then identify ilastik as the module to be executed from the --segmentation flag. The actual command that MCMICRO runs will then be composed using all the above fields together:

python /app/mc-ilastik.py --output . --input exemplar-001.ome.tif --model myawesomemodel.ilp --channelIDs 1 5 --num_channels 2