Feature Extraction¶
Feature extraction computes quantitative measurements from segmented cells, including morphological features (shape, size) and intensity features (marker expression levels).
Overview¶
After segmentation, feature extraction quantifies cell properties that will be used for downstream analysis such as clustering, phenotyping, and spatial analysis. Features are extracted per cell and stored in a CSV file with one row per cell.
Options¶
OpenIMC extracts two main categories of features:
Morphological Features: Shape and size characteristics of cells
Intensity Features: Expression levels of each marker channel within each cell
Parameters¶
Feature Selection¶
morphological (default: true): Extract morphological features
- Includes: area, perimeter, eccentricity, solidity, extent, and more
- See the “Morphological Features” section below for the complete list
intensity (default: true): Extract intensity features
- Includes: mean, median, max, min, std for each channel
- See the “Intensity Features” section below for the complete list
Preprocessing Parameters¶
arcsinh (default: false): Apply arcsinh transformation to intensity features
- Helps normalize highly skewed intensity distributions
- Recommended for IMC data with wide dynamic range
- Applied to extracted features, not raw images
arcsinh_cofactor (default: 10.0): Cofactor for arcsinh transformation
- Lower values (5.0) compress high intensities more
- Higher values (20.0) preserve more of the original distribution
- Typical range: 5.0-20.0
denoise_settings (optional): Dictionary with denoise settings per channel
- Format: {"Channel1": {"hot": {"method": "median3"}, "speckle": {"method": "gaussian", "sigma": 0.8}}}
- Can also be a path to a JSON file
- Applied to raw images before feature extraction
excluded_channels (optional): Set of channel names to exclude from feature extraction
- Useful for excluding channels that are not informative
- Example: {"DAPI", "Background"}
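The denoise_settings format shown above can also be kept in a JSON file. The sketch below only serializes the example dictionary from this page; the helper name render_denoise_config is hypothetical, and the file name denoise_config.json is taken from the CLI example further down.

```python
import json

# Example settings copied from the format shown above ("Channel1" is a placeholder name)
denoise_settings = {
    "Channel1": {
        "hot": {"method": "median3"},
        "speckle": {"method": "gaussian", "sigma": 0.8},
    }
}


def render_denoise_config(settings):
    """Return the settings dictionary as a JSON string (hypothetical helper)."""
    return json.dumps(settings, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Write the file that could then be passed as the denoise settings path
    with open("denoise_config.json", "w", encoding="utf-8") as fh:
        fh.write(render_denoise_config(denoise_settings))
```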
Spillover Correction¶
spillover_correction (optional): Configuration for spillover correction
- enabled (default: false): Enable spillover correction
- matrix_file (required if enabled): Path to spillover matrix CSV file
  - Matrix format: rows and columns are channel names, values are spillover coefficients
- method (default: "nnls"): Correction method
  - "nnls": Non-negative least squares (recommended)
  - "pgd": Projected gradient descent
Other Parameters¶
acquisition (optional): Specific acquisition ID or name to process - If not specified, processes all acquisitions
mask (required): Path to segmentation mask directory or single mask file
- Masks are matched to acquisitions by filename
- Supports .tif, .tiff, or .npy formats
Morphological Features¶
The following morphological features are extracted for each cell:
area: Cell area in pixels
perimeter: Cell perimeter in pixels
eccentricity: Eccentricity of the ellipse with same second moments (0=circle, 1=line)
solidity: Ratio of cell area to convex hull area
extent: Ratio of cell area to bounding box area
major_axis_length: Length of major axis of fitted ellipse
minor_axis_length: Length of minor axis of fitted ellipse
orientation: Orientation of major axis in radians
centroid_x: X coordinate of cell centroid
centroid_y: Y coordinate of cell centroid
equivalent_diameter: Diameter of circle with same area
euler_number: Euler characteristic (topology measure)
Intensity Features¶
For each marker channel, the following intensity features are extracted:
{channel}_mean: Mean intensity within the cell
{channel}_median: Median intensity within the cell
{channel}_max: Maximum intensity within the cell
{channel}_min: Minimum intensity within the cell
{channel}_std: Standard deviation of intensities within the cell
{channel}_sum: Sum of all pixel intensities within the cell
Where {channel} is replaced by the actual channel name (e.g., CD3_1841_mean).
Using Feature Extraction in the GUI¶
Ensure segmentation has been completed and masks are available
Navigate to Analysis → Feature Extraction or click the feature extraction button
In the feature extraction dialog:
- Select which acquisitions to process
- Choose feature types (morphological, intensity, or both)
- Configure preprocessing options:
Enable/disable arcsinh transformation
Set arcsinh cofactor
Configure denoising if needed
Optionally configure spillover correction
Select channels to exclude if any
Choose output location for the features CSV file
Click Extract Features to start the process
Progress is shown in a progress dialog
The resulting CSV file contains one row per cell with all extracted features
Using Feature Extraction in the CLI¶
Basic Command¶
openimc extract-features input.mcd features.csv \
  --mask segmentation_masks/ \
  --morphological \
  --intensity
With Arcsinh Transformation¶
openimc extract-features input.mcd features.csv \
  --mask segmentation_masks/ \
  --arcsinh \
  --arcsinh-cofactor 10.0
With Denoising¶
openimc extract-features input.mcd features.csv \
  --mask segmentation_masks/ \
  --denoise-settings denoise_config.json
With Spillover Correction¶
openimc extract-features input.mcd features.csv \
  --mask segmentation_masks/ \
  --spillover-matrix spillover_matrix.csv \
  --spillover-method nnls
Workflow YAML Example¶
feature_extraction:
  enabled: true
  morphological: true
  intensity: true
  arcsinh: false
  arcsinh_cofactor: 10.0
  spillover_correction:
    enabled: false
    matrix_file: "spillover_matrix.csv"
    method: "nnls"
Method Details¶
Feature extraction uses scikit-image’s regionprops and regionprops_table functions to compute morphological features from segmentation masks. Intensity features are computed by masking each channel image with the cell segmentation mask and computing statistics.
Morphological Feature Computation¶
Morphological features are computed using scikit-image’s region properties:
Each cell in the segmentation mask is treated as a region
Region properties are computed using regionprops_table
Properties include geometric measurements (area, perimeter, etc.) and shape descriptors (eccentricity, solidity, etc.)
Citation:
- scikit-image: van der Walt, S., et al. (2014). “scikit-image: image processing in Python.” PeerJ, 2, e453. DOI: 10.7717/peerj.453
- scikit-image regionprops
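The steps above can be sketched with a toy label mask. This is an illustration of regionprops_table, not OpenIMC's actual code: the mask, the selected properties, and the renaming of the centroid columns are assumptions. Axis lengths are omitted and the equivalent diameter is derived from the area, because those property names changed in newer scikit-image releases (axis_major_length, axis_minor_length, equivalent_diameter_area).

```python
import numpy as np
from skimage.measure import regionprops_table

# Toy label mask: 0 is background, 1 and 2 are two "cells"
mask = np.zeros((12, 12), dtype=np.int32)
mask[1:5, 1:5] = 1    # 4x4 square, area 16
mask[6:11, 5:8] = 2   # 5x3 rectangle, area 15

props = regionprops_table(
    mask,
    properties=(
        "label", "area", "perimeter", "eccentricity", "solidity",
        "extent", "orientation", "centroid", "euler_number",
    ),
)

# regionprops_table returns "centroid-0" (row, y) and "centroid-1" (column, x);
# map them to the column names used in this page's feature list.
features = {k: v for k, v in props.items() if not k.startswith("centroid")}
features["centroid_y"] = props["centroid-0"]
features["centroid_x"] = props["centroid-1"]
# Diameter of the circle with the same area as each cell
features["equivalent_diameter"] = np.sqrt(4.0 * props["area"] / np.pi)
```

Each value in features is an array with one entry per cell, so the dictionary can be passed directly to a table constructor such as pandas.DataFrame.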
Intensity Feature Computation¶
Intensity features are computed per channel:
For each channel, the image is masked with the cell segmentation mask
Pixel intensities within each cell are extracted
Statistical measures (mean, median, max, min, std, sum) are computed
Results are stored with channel name prefixes
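A minimal NumPy sketch of these four steps follows. The function name intensity_features and the channel name "CD3" are hypothetical, and the standard deviation uses NumPy's default (population, ddof=0), which may differ from what OpenIMC computes.

```python
import numpy as np


def intensity_features(image, mask, channel):
    """Per-cell statistics for one channel image; label 0 is treated as background."""
    labels = np.unique(mask)
    labels = labels[labels != 0]
    stats = {
        "mean": np.mean, "median": np.median, "max": np.max,
        "min": np.min, "std": np.std, "sum": np.sum,
    }
    out = {"label": labels}
    for name, fn in stats.items():
        # image[mask == lab] selects the pixels belonging to one cell
        out[f"{channel}_{name}"] = np.array(
            [fn(image[mask == lab]) for lab in labels], dtype=float
        )
    return out


mask = np.array([[1, 1, 0],
                 [2, 2, 2]])
image = np.array([[1.0, 3.0, 9.0],
                  [2.0, 4.0, 6.0]])
cd3 = intensity_features(image, mask, "CD3")
```

Looping over labels with a boolean mask is easy to read but scales poorly with many cells; for large images, grouped reductions such as those in scipy.ndimage are a common alternative.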
Arcsinh Transformation¶
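The arcsinh (inverse hyperbolic sine) transformation is commonly used in cytometry to stabilize variance and normalize distributions:
$$\operatorname{arcsinh}(x) = \ln\left(x + \sqrt{x^2 + 1}\right)$$
When applied with a cofactor $c$:
$$y = \operatorname{arcsinh}\left(\frac{x}{c}\right)$$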
This transformation:
- Compresses high-intensity values
- Expands low-intensity values
- Reduces the impact of outliers
- Makes data more suitable for downstream analysis
Citation:
- Bendall, S. C., et al. (2011). “Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum.” Science, 332(6030), 687-696. DOI: 10.1126/science.1198704
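As a sketch of the transformation, the snippet below divides each value by the cofactor before taking the inverse hyperbolic sine, which is the usual convention in cytometry. That OpenIMC applies the cofactor in exactly this way is an assumption, and arcsinh_transform is a hypothetical helper name.

```python
import numpy as np


def arcsinh_transform(values, cofactor=10.0):
    """Return arcsinh(values / cofactor) element-wise (assumed convention)."""
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return np.arcsinh(np.asarray(values, dtype=float) / cofactor)


raw = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])
low = arcsinh_transform(raw, cofactor=5.0)    # enters the log-like regime sooner
high = arcsinh_transform(raw, cofactor=20.0)  # stays close to linear for longer
```

For values well above the cofactor the result behaves like ln(2x/c), while values well below it are scaled almost linearly, which is why the cofactor controls where compression begins.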
Spillover Correction¶
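Spillover correction compensates for signal spillover (crosstalk) between channels in IMC data, which arises from sources such as isotopic impurities and oxide formation rather than optical spectral overlap. OpenIMC supports two methods: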
Non-negative Least Squares (NNLS): Solves for corrected intensities that minimize error while ensuring non-negativity
Projected Gradient Descent (PGD): Iterative optimization method that alternates gradient steps with a projection back onto non-negative values
The spillover matrix should be provided as a CSV file where:
- Rows and columns are channel names
- Values are spillover coefficients (typically 0-1)
- Diagonal should be 1.0 (self-spillover)
Citation:
- Implementation based on standard spillover correction methods used in flow cytometry and mass cytometry
- For IMC-specific spillover correction, see: Chevrier, S., et al. (2018). “Compensation of Signal Spillover in Suspension and Imaging Mass Cytometry.” Cell Systems, 6(5), 612-620. DOI: 10.1016/j.cels.2018.02.010
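The NNLS approach can be illustrated with SciPy. This sketch assumes that rows of the spillover matrix are the emitting channels and columns the receiving channels, so that observed = true @ S; the orientation OpenIMC expects is not stated here, and compensate_nnls is a hypothetical helper, not the OpenIMC implementation.

```python
import numpy as np
from scipy.optimize import nnls


def compensate_nnls(observed, spill):
    """Solve min ||S^T x - y|| with x >= 0 for each row y of observed."""
    observed = np.atleast_2d(np.asarray(observed, dtype=float))
    design = np.asarray(spill, dtype=float).T  # assumed orientation, see lead-in
    corrected = np.empty_like(observed)
    for i, y in enumerate(observed):
        corrected[i], _ = nnls(design, y)  # second return value is the residual norm
    return corrected


# Two channels: 10% of channel 0 signal spills into channel 1
spill = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
true_signal = np.array([[100.0, 50.0],
                        [0.0, 20.0]])
observed = true_signal @ spill            # [[100, 60], [0, 20]]
corrected = compensate_nnls(observed, spill)
```

Unlike multiplying by the inverse of the matrix, the non-negativity constraint prevents corrected intensities from dropping below zero when the observed signal is noisy.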
Tips and Best Practices¶
Feature Selection: Extract both morphological and intensity features for comprehensive analysis. Morphological features are useful for cell type identification, while intensity features capture marker expression.
Arcsinh Transformation: Apply arcsinh transformation if your intensity distributions are highly skewed or have a wide dynamic range. This is especially important for IMC data.
Denoising: Apply denoising before feature extraction if your images are noisy. This can improve the accuracy of intensity features.
Spillover Correction: Use spillover correction if you have a spillover matrix for your panel. This is particularly important for channels with significant crosstalk.
Channel Exclusion: Exclude channels that are not informative (e.g., background channels, DAPI if not used for analysis) to reduce feature dimensionality.
Validation: Check the extracted features CSV to ensure:
- All expected cells are present
- Feature values are reasonable (no extreme outliers)
- Missing values are handled appropriately
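These checks can be scripted with pandas. The sketch below is one possible approach, not part of OpenIMC: validate_features is a hypothetical helper, the column names are examples, and the outlier rule (more than five median absolute deviations from the median) is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def validate_features(df, expected_cells=None, mad_threshold=5.0):
    """Summarize row count, missing values, and robust outlier counts per column."""
    numeric = df.select_dtypes(include="number")
    median = numeric.median()
    # Median absolute deviation; zero MAD is replaced by NaN to avoid division by zero
    mad = (numeric - median).abs().median().replace(0, np.nan)
    deviation = (numeric - median).abs() / mad
    report = {
        "n_cells": len(df),
        "missing": df.isna().sum().to_dict(),
        "outliers": (deviation > mad_threshold).sum().to_dict(),
    }
    if expected_cells is not None:
        report["cell_count_ok"] = len(df) == expected_cells
    return report


# In practice the table would come from pd.read_csv("features.csv")
df = pd.DataFrame({
    "area": [10, 12, 11, 13, 500],
    "CD3_mean": [1.0, 2.0, np.nan, 1.5, 1.2],
})
report = validate_features(df, expected_cells=5)
```

A flagged value is not necessarily an error; very large areas, for example, often point to merged cells in the segmentation and are worth inspecting in the mask.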
Memory Considerations: For large datasets, process acquisitions separately or use the CLI with appropriate resource allocation.