Core pipeline API

Core operations for OpenIMC.

This module provides unified core operations that can be used by both the GUI and CLI interfaces, ensuring exact parity between them.

openimc.core.batch_correction(features_df, method='harmony', batch_var=None, features=None, output_path=None, covariates=None, n_clusters=30, sigma=0.1, theta=2.0, lambda_reg=1.0, max_iter=20, pca_variance=0.9)

Apply batch correction to feature data.

This function applies batch correction using ComBat or Harmony to remove technical variation (batch effects) between different files or batches.

Parameters:
  • features_df (DataFrame) – DataFrame with cell features

  • method (str) – Batch correction method (‘combat’ or ‘harmony’)

  • batch_var (Optional[str]) – Column name containing batch identifiers. If None, auto-detects

  • features (Optional[List[str]]) – List of feature column names to correct. If None, auto-detects

  • output_path (Union[str, Path, None]) – Optional path to save corrected features CSV

  • covariates (Optional[List[str]]) – Optional list of covariate column names (ComBat only)

  • n_clusters (int) – Number of Harmony clusters (default: 30)

  • sigma (float) – Width of soft kmeans clusters for Harmony (default: 0.1)

  • theta (float) – Diversity clustering penalty parameter for Harmony (default: 2.0)

  • lambda_reg (float) – Regularization parameter for Harmony (default: 1.0)

  • max_iter (int) – Maximum number of iterations for Harmony (default: 20)

  • pca_variance (float) – Proportion of variance to retain in PCA for Harmony (default: 0.9)

Return type:

DataFrame

Returns:

DataFrame with corrected features (all original columns preserved)

openimc.core.build_spatial_graph(features_df, method='kNN', k_neighbors=6, radius=None, pixel_size_um=1.0, roi_column=None, detect_communities=False, community_seed=42, output_path=None)

Build spatial graph from cell centroids.

This function creates a spatial graph connecting cells based on their spatial proximity. It supports kNN, radius-based, and Delaunay triangulation methods. The graph can be built per-ROI (if roi_column is provided) or globally.

Parameters:
  • features_df (DataFrame) – DataFrame with cell features, must contain ‘centroid_x’ and ‘centroid_y’

  • method (str) – Graph construction method (‘kNN’, ‘Radius’, or ‘Delaunay’)

  • k_neighbors (int) – Number of neighbors for kNN method

  • radius (Optional[float]) – Radius in pixels for radius-based method (required if method=‘Radius’)

  • pixel_size_um (float) – Pixel size in micrometers for distance conversion

  • roi_column (Optional[str]) – Column name for ROI grouping (e.g., ‘acquisition_id’). If None, builds global graph

  • detect_communities (bool) – Whether to detect communities using Leiden algorithm

  • community_seed (int) – Random seed for community detection

  • output_path (Union[str, Path, None]) – Optional path to save edges CSV file

Return type:

Tuple[DataFrame, Optional[DataFrame]]

Returns:

Tuple of (edges_df, features_with_communities_df):

  • edges_df: DataFrame with columns [‘roi_id’, ‘cell_id_A’, ‘cell_id_B’, ‘distance_um’] (or [‘source’, ‘target’, ‘distance’, ‘distance_um’] for a global graph)

  • features_with_communities_df: DataFrame with a ‘spatial_community’ column if detect_communities=True, else None
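To make the kNN option and the distance_um column concrete, here is a minimal brute-force sketch in plain Python. It is not OpenIMC's implementation: the helper name knn_edges and the dict-based input are assumptions made for illustration, and it assumes centroids are given in pixels and converted to micrometers by multiplying with pixel_size_um.

```python
import math

def knn_edges(cells, k=6, pixel_size_um=1.0):
    """Return undirected kNN edges as (cell_id_A, cell_id_B, distance_um) tuples.

    cells maps a cell id to its (centroid_x, centroid_y) position in pixels.
    """
    edges = {}
    for a, (ax, ay) in cells.items():
        # Brute-force distances (in pixels) from cell a to every other cell.
        dists = sorted(
            (math.hypot(ax - bx, ay - by), b)
            for b, (bx, by) in cells.items()
            if b != a
        )
        for dist_px, b in dists[:k]:
            # Store each pair once, whichever endpoint selected it.
            key = (min(a, b), max(a, b))
            edges[key] = dist_px * pixel_size_um
    return sorted((a, b, d) for (a, b), d in edges.items())

# Three hypothetical cells, one nearest neighbor each, 0.5 um per pixel.
cells = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (10.0, 0.0)}
edges = knn_edges(cells, k=1, pixel_size_um=0.5)
```

A brute-force search is quadratic in the number of cells, so it only suits small examples; a spatial index would be the usual choice for full ROIs.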

openimc.core.build_spatial_graph_anndata(features_df, method='kNN', k_neighbors=20, radius=None, pixel_size_um=1.0, roi_column=None, roi_id=None, seed=42)

Build spatial graph using squidpy and return AnnData objects per ROI.

This function creates AnnData objects with spatial graphs built using squidpy. It’s the unified function used by both GUI and CLI for AnnData-based spatial analysis.

Parameters:
  • features_df (DataFrame) – DataFrame with cell features, must contain ‘centroid_x’ and ‘centroid_y’

  • method (str) – Graph construction method (‘kNN’, ‘Radius’, or ‘Delaunay’)

  • k_neighbors (int) – Number of neighbors for kNN method (default: 20)

  • radius (Optional[float]) – Radius in micrometers for radius-based method (required if method=‘Radius’)

  • pixel_size_um (float) – Pixel size in micrometers for coordinate conversion (default: 1.0)

  • roi_column (Optional[str]) – Column name for ROI grouping (e.g., ‘acquisition_id’). Auto-detected if None

  • roi_id (Optional[str]) – Optional specific ROI to process. If None, processes all ROIs

  • seed (int) – Random seed for reproducibility (default: 42)

Return type:

Dict[str, AnnData]

Returns:

Dictionary mapping ROI ID to AnnData object with spatial graph built

Raises:
  • ImportError – If squidpy or anndata are not installed

  • ValueError – If method is invalid or required parameters are missing

openimc.core.cluster(features_df, method='leiden', columns=None, scaling='zscore', output_path=None, n_clusters=None, linkage='ward', resolution=1.0, seed=42, n_neighbors=15, metric='euclidean', use_jaccard=False, n_init=10, min_cluster_size=10, min_samples=5, cluster_selection_method='eom', hdbscan_metric='euclidean')

Perform clustering on feature data.

This is the unified clustering function used by both GUI and CLI.

Parameters:
  • features_df (DataFrame) – DataFrame with features to cluster

  • method (str) – Clustering method (“hierarchical”, “leiden”, “louvain”, “kmeans”, or “hdbscan”)

  • columns (Optional[List[str]]) – List of column names to use for clustering (auto-detect if None)

  • scaling (str) – Scaling method (“none”, “zscore”, or “mad”)

  • output_path (Union[str, Path, None]) – Optional path to save clustered features CSV

  • n_clusters (Optional[int]) – Number of clusters (required for hierarchical)

  • linkage (str) – Linkage method for hierarchical clustering (“ward”, “complete”, “average”)

  • resolution (float) – Resolution parameter for Leiden clustering

  • seed (int) – Random seed for reproducibility

  • n_neighbors (int) – Number of neighbors for k-NN graph construction (Leiden/Louvain only, default: 15)

  • metric (str) – Distance metric for k-NN graph (Leiden/Louvain only, default: “euclidean”)

  • use_jaccard (bool) – Use Jaccard similarity for edge weights instead of inverse distance (PhenoGraph-like, default: False)

  • n_init (int) – Number of initializations for K-means (default: 10)

  • min_cluster_size (int) – Minimum cluster size for HDBSCAN (default: 10)

  • min_samples (int) – Minimum samples for HDBSCAN (default: 5)

  • cluster_selection_method (str) – Cluster selection method for HDBSCAN (“eom” or “leaf”, default: “eom”)

  • hdbscan_metric (str) – Distance metric for HDBSCAN (default: “euclidean”)

Return type:

DataFrame

Returns:

DataFrame with cluster labels added in ‘cluster’ column

Raises:

ValueError – If method is invalid or required parameters are missing
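The difference between the “zscore” and “mad” scaling options can be illustrated with a short sketch. The exact formulas OpenIMC uses are not stated here, so this assumes the common definitions: z-score as (x - mean) / population standard deviation, and MAD scaling as (x - median) / median absolute deviation, without any consistency constant. The function names are hypothetical.

```python
import statistics

def zscore(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd else 0.0 for v in values]

def mad_scale(values):
    """Center on the median and divide by the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [(v - med) / mad if mad else 0.0 for v in values]

# One extreme value dominates the mean and standard deviation,
# but barely affects the median and MAD.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
z_scaled = zscore(values)
mad_scaled = mad_scale(values)
```

With an outlier present, MAD scaling keeps the spacing of the typical values intact, which is the usual reason to prefer it for heavy-tailed marker intensities.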

openimc.core.dataframe_to_anndata(df, roi_id=None, roi_column='acquisition_id', pixel_size_um=1.0)

Convert OpenIMC DataFrame to AnnData format for squidpy analysis.

This is the unified function used by both GUI and CLI.

Parameters:
  • df (DataFrame) – Feature dataframe with cells as rows

  • roi_id (Optional[str]) – Optional ROI identifier to filter to a single ROI

  • roi_column (str) – Column name for ROI identifier

  • pixel_size_um (float) – Pixel size in micrometers for coordinate conversion

Return type:

Optional[AnnData]

Returns:

AnnData object with spatial coordinates and features, or None if conversion fails

openimc.core.deconvolution(loader, acquisition, output_dir, x0=7.0, iterations=4, output_format='float', loader_path=None, source_file_path=None, unique_acq_id=None, passes=None, contributions=None, kernel=None, passes_arr=None, contribs_arr=None, kernel_dim=None, region_data_full=None, I0=None)

Apply Richardson-Lucy deconvolution to high resolution IMC images.

This function applies deconvolution optimized for high resolution IMC images with step sizes of 333 nm and 500 nm.

Parameters:
  • loader (Union[MCDLoader, OMETIFFLoader]) – Data loader (MCDLoader or OMETIFFLoader)

  • acquisition (AcquisitionInfo) – Acquisition information

  • output_dir (Union[str, Path]) – Output directory for deconvolved images

  • x0 (float) – Parameter for kernel calculation (default: 7.0)

  • iterations (int) – Number of Richardson-Lucy iterations (default: 4)

  • output_format (str) – Output format (‘float’ or ‘uint16’, default: ‘float’)

  • loader_path (Union[str, Path, None]) – Optional explicit path to loader file/directory (if loader doesn’t have file_path/directory attribute)

  • source_file_path (Union[str, Path, None]) – Optional source file path for filename generation (defaults to loader_path)

  • unique_acq_id (Optional[str]) – Optional unique acquisition ID for filename generation (defaults to acquisition.id)

Return type:

Path

Returns:

Path to deconvolved OME-TIFF file
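For readers unfamiliar with Richardson-Lucy deconvolution, the following one-dimensional sketch shows the multiplicative update that each iteration performs. It is an illustration only: the three-tap kernel is made up, the flat starting estimate is an assumption, and how OpenIMC derives its kernel from x0 is not covered.

```python
def convolve_same(signal, kernel):
    """Convolve with a centered odd-length kernel; samples outside the signal count as zero."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def richardson_lucy(observed, kernel, iterations=4, eps=1e-12):
    """Iteratively refine an estimate so that, once blurred, it matches the observation."""
    flipped = kernel[::-1]
    # Start from a flat, strictly positive estimate.
    estimate = [sum(observed) / len(observed)] * len(observed)
    for _ in range(iterations):
        blurred = convolve_same(estimate, kernel)
        ratio = [o / (b + eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate

# A single bright sample blurred by a hypothetical kernel, then partially restored.
truth = [0.0, 0.0, 10.0, 0.0, 0.0]
kernel = [0.25, 0.5, 0.25]
observed = convolve_same(truth, kernel)
restored = richardson_lucy(observed, kernel, iterations=4)
```

Because every update is a multiplication by a non-negative factor, the estimate stays non-negative, which suits count-like IMC intensities.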

openimc.core.export_anndata(anndata_dict, output_path, combined=True)

Export AnnData objects to file(s).

This is the unified export function used by both GUI and CLI.

Parameters:
  • anndata_dict (Dict[str, AnnData]) – Dictionary mapping ROI ID to AnnData object

  • output_path (Union[str, Path]) – Path to output file (if combined=True) or directory (if combined=False)

  • combined (bool) – If True, export as single combined file. If False, export separate files per ROI

Return type:

Path

Returns:

Path to exported file(s)

openimc.core.extract_features(loader, acquisitions, mask_path, output_path=None, morphological=True, intensity=True, denoise_settings=None, arcsinh=False, arcsinh_cofactor=1.0, spillover_config=None, excluded_channels=None, selected_features=None)

Extract features from segmented cells.

This is the unified feature extraction function used by both GUI and CLI.

Parameters:
  • loader (Union[MCDLoader, OMETIFFLoader]) – MCDLoader or OMETIFFLoader instance

  • acquisitions (List[AcquisitionInfo]) – List of AcquisitionInfo objects to process

  • mask_path (Union[str, Path]) – Path to mask directory or single mask file

  • output_path (Union[str, Path, None]) – Optional path to save CSV (if None, features are not saved)

  • morphological (bool) – Whether to extract morphological features

  • intensity (bool) – Whether to extract intensity features

  • denoise_settings (Optional[Dict]) – Dictionary with denoise settings per channel (optional)

  • arcsinh (bool) – Whether to apply arcsinh transformation to intensity features

  • arcsinh_cofactor (float) – Arcsinh cofactor

  • spillover_config (Optional[Dict]) – Optional spillover correction configuration

  • excluded_channels (Optional[set]) – Optional set of channel names to exclude

  • selected_features (Optional[Dict[str, bool]]) – Optional custom feature selection dict (overrides morphological/intensity)

Return type:

DataFrame

Returns:

DataFrame with extracted features
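The role of arcsinh_cofactor can be sketched as follows. This assumes the convention common in cytometry, where each intensity is divided by the cofactor before the inverse hyperbolic sine is applied; whether OpenIMC uses exactly this form is not stated here, and the function name is hypothetical.

```python
import math

def arcsinh_transform(values, cofactor=1.0):
    """Apply asinh(x / cofactor) to each intensity value."""
    return [math.asinh(v / cofactor) for v in values]

# Hypothetical mean intensities for one marker across four cells.
raw = [0.0, 5.0, 50.0, 500.0]
transformed = arcsinh_transform(raw, cofactor=5.0)
```

The transform is roughly linear for values well below the cofactor and roughly logarithmic above it, so a larger cofactor compresses more of the low-intensity range.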

openimc.core.generate_spillover_matrix(mcd_path, donor_label_per_acq=None, cap=0.3, aggregate='median', channel_name_field='name', p_low=90.0, p_high_clip=99.9, output_path=None)

Generate spillover matrix from single-stain control MCD file.

This function analyzes pixel-level data from single-stain control acquisitions to estimate spillover coefficients between channels.

Parameters:
  • mcd_path (Union[str, Path]) – Path to MCD file containing single-stain controls

  • donor_label_per_acq (Optional[Dict[str, str]]) – Mapping from acquisition ID/index to donor channel name

  • cap (float) – Maximum spillover coefficient (default: 0.3)

  • aggregate (str) – Aggregation method when multiple acquisitions per donor (‘median’ or ‘mean’)

  • channel_name_field (str) – Field to use for channel names (‘name’ or ‘fullname’, default: ‘name’)

  • p_low (float) – Lower percentile for foreground selection (default: 90.0)

  • p_high_clip (float) – Upper percentile for clipping (default: 99.9)

  • output_path (Union[str, Path, None]) – Optional path to save spillover matrix CSV

Return type:

Tuple[DataFrame, DataFrame]

Returns:

Tuple of (spillover_matrix_df, qc_metrics_df)

openimc.core.get_panel(acq_info, output_path)

Generate a panel.csv file from acquisition information.

Creates a CSV file with two columns:

  • channel: Metal tag/channel identifier

  • name: Channel name/label

Parameters:
  • acq_info (AcquisitionInfo) – AcquisitionInfo object containing channel metadata

  • output_path (Union[str, Path]) – Path where panel.csv will be saved

Return type:

Path

Returns:

Path to the created panel.csv file

Raises:

ValueError – If channel_metals and channel_labels are empty or mismatched
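The two-column layout and the validation described above can be sketched with the standard csv module. This is not the OpenIMC source: write_panel is a hypothetical stand-in that takes the metal and label lists directly instead of an AcquisitionInfo object, and the channel values are invented.

```python
import csv
import tempfile
from pathlib import Path

def write_panel(channel_metals, channel_labels, output_path):
    """Write a panel CSV with 'channel' and 'name' columns and return its path."""
    if not channel_metals or not channel_labels:
        raise ValueError("channel_metals and channel_labels must not be empty")
    if len(channel_metals) != len(channel_labels):
        raise ValueError("channel_metals and channel_labels differ in length")
    output_path = Path(output_path)
    with output_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["channel", "name"])
        writer.writerows(zip(channel_metals, channel_labels))
    return output_path

# Write a two-channel panel to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    panel_path = write_panel(["Ir191", "Yb176"], ["DNA1", "CD45"], Path(tmp) / "panel.csv")
    panel_text = panel_path.read_text()
```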

openimc.core.load_mcd(input_path, channel_format='CHW')

Load data from MCD file or OME-TIFF directory.

This is the unified data loading function used by both GUI and CLI.

Parameters:
  • input_path (Union[str, Path]) – Path to MCD file or OME-TIFF directory

  • channel_format (str) – Format for OME-TIFF files (‘CHW’ or ‘HWC’), default is ‘CHW’

Return type:

Tuple[Union[MCDLoader, OMETIFFLoader], str]

Returns:

Tuple of (loader, loader_type) where loader_type is ‘mcd’ or ‘ometiff’

Raises:

ValueError – If input path is invalid or unsupported format

openimc.core.parse_denoise_settings(denoise_json)

Parse denoise settings from JSON string, file, or dict.

Parameters:

denoise_json (Union[str, Dict, None]) – JSON string, path to JSON file, or dict with denoise settings

Return type:

Dict

Returns:

Dictionary with denoise settings per channel
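Accepting a JSON string, a file path, or a dict in one argument usually comes down to a small dispatch on the input type. The sketch below shows one way to do that; parse_settings is a hypothetical name, returning an empty dict for None is an assumption, and the keys in the example are invented rather than taken from OpenIMC's settings schema.

```python
import json
from pathlib import Path

def parse_settings(denoise_json):
    """Return a settings dict from None, a dict, a JSON string, or a path to a JSON file."""
    if denoise_json is None:
        return {}
    if isinstance(denoise_json, dict):
        return denoise_json
    text = str(denoise_json)
    # Treat the input as a file path only if it does not look like inline JSON.
    if not text.lstrip().startswith("{") and Path(text).is_file():
        text = Path(text).read_text()
    return json.loads(text)

# Hypothetical per-channel settings passed as an inline JSON string.
settings = parse_settings('{"DNA1": {"method": "median"}}')
```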

openimc.core.pixel_correlation(loader, acquisition, channels, mask=None, multiple_testing_correction=None)

Compute pixel-level correlations between marker pairs.

This function computes Spearman correlation coefficients for all pairs of markers at the pixel level. Can analyze within cell masks or entire ROI.

Parameters:
  • loader (Union[MCDLoader, OMETIFFLoader]) – Data loader (MCDLoader or OMETIFFLoader)

  • acquisition (AcquisitionInfo) – Acquisition information

  • channels (List[str]) – List of channel names to analyze

  • mask (Optional[ndarray]) – Optional segmentation mask. If provided, only pixels within cells are analyzed

  • multiple_testing_correction (Optional[str]) – Optional correction method (‘bonferroni’, ‘fdr_bh’, etc.). If provided, the correction is applied to the p-values

Return type:

DataFrame

Returns:

DataFrame with columns marker1, marker2, correlation, p_value, n_pixels
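As a reference for what the correlation and p_value columns represent, the sketch below computes a Spearman coefficient (Pearson correlation of average ranks) and a Bonferroni adjustment in plain Python. It illustrates the statistics only and says nothing about how OpenIMC computes them internally; all function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# A monotonic but non-linear relationship still gives a perfect rank correlation.
rho = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
adjusted = bonferroni([0.01, 0.04, 0.5])
```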

openimc.core.preprocess(loader, acquisition, output_dir, denoise_settings=None, normalization_method='None', arcsinh_cofactor=1.0, percentile_params=(1.0, 99.0), viewer_denoise_func=None)

Preprocess a single acquisition: apply denoising and export to OME-TIFF.

Note: arcsinh normalization is not applied to exported images by default. Only denoising is applied. Arcsinh transform should be applied on extracted intensity features.

Parameters:
  • loader (Union[MCDLoader, OMETIFFLoader]) – MCDLoader or OMETIFFLoader instance

  • acquisition (AcquisitionInfo) – AcquisitionInfo for the acquisition to process

  • output_dir (Union[str, Path]) – Directory to save the processed OME-TIFF file

  • denoise_settings (Optional[Dict]) – Dictionary with denoise settings per channel (optional)

  • normalization_method (str) – Normalization method (“None”, “arcsinh”, “percentile_clip”, “channelwise_minmax”)

  • arcsinh_cofactor (float) – Arcsinh cofactor (only used if normalization_method is “arcsinh”)

  • percentile_params (Tuple[float, float]) – Tuple of (low, high) percentiles for percentile_clip normalization

  • viewer_denoise_func (Optional[callable]) – Optional function for viewer-based denoising (GUI only)

Return type:

Path

Returns:

Path to the saved OME-TIFF file

openimc.core.qc_analysis(loader, acquisition, channels, mode='pixel', mask=None)

Perform quality control analysis on IMC data.

This function calculates QC metrics including SNR (Signal-to-Noise Ratio), intensity statistics, and coverage metrics. Can analyze at pixel level or cell level (if mask is provided).

Parameters:
  • loader (Union[MCDLoader, OMETIFFLoader]) – Data loader (MCDLoader or OMETIFFLoader)

  • acquisition (AcquisitionInfo) – Acquisition information

  • channels (List[str]) – List of channel names to analyze

  • mode (str) – Analysis mode (‘pixel’ or ‘cell’)

  • mask (Optional[ndarray]) – Optional segmentation mask (required for ‘cell’ mode)

Return type:

DataFrame

Returns:

DataFrame with QC metrics per channel

openimc.core.segment(loader, acquisition, method, nuclear_channels, cyto_channels=None, output_dir=None, denoise_settings=None, normalization_method='None', arcsinh_cofactor=1.0, percentile_params=(1.0, 99.0), nuclear_combo_method='mean', cyto_combo_method='mean', nuclear_weights=None, cyto_weights=None, cellpose_model='cyto3', diameter=None, flow_threshold=0.4, cellprob_threshold=0.0, gpu_id=None, deepcell_api_key=None, bbox_threshold=0.4, use_wsi=False, low_contrast_enhancement=False, gauge_cell_size=False, min_cell_area=100, max_cell_area=10000, compactness=0.01)

Segment cells using CellSAM, Cellpose, or Watershed method.

This is the unified segmentation function used by both GUI and CLI.

Parameters:
  • loader (Union[MCDLoader, OMETIFFLoader]) – MCDLoader or OMETIFFLoader instance

  • acquisition (AcquisitionInfo) – AcquisitionInfo for the acquisition to segment

  • method (str) – Segmentation method (“cellsam”, “cellpose”, or “watershed”)

  • nuclear_channels (List[str]) – List of nuclear channel names (required)

  • cyto_channels (Optional[List[str]]) – List of cytoplasm channel names (optional; required for the watershed method and the cyto3 Cellpose model)

  • output_dir (Union[str, Path, None]) – Optional directory to save mask (if None, mask is not saved)

  • denoise_settings (Optional[Dict]) – Dictionary with denoise settings per channel (optional)

  • normalization_method (str) – Normalization method (“None”, “arcsinh”, “percentile_clip”, “channelwise_minmax”)

  • arcsinh_cofactor (float) – Arcsinh cofactor (only used if normalization_method is “arcsinh”)

  • percentile_params (Tuple[float, float]) – Tuple of (low, high) percentiles for percentile_clip normalization

  • nuclear_combo_method (str) – Method to combine nuclear channels

  • cyto_combo_method (str) – Method to combine cytoplasm channels

  • nuclear_weights (Optional[List[float]]) – Optional weights for nuclear channels

  • cyto_weights (Optional[List[float]]) – Optional weights for cytoplasm channels

  • cellpose_model (str) – Cellpose model type (“cyto3” or “nuclei”)

  • diameter (Optional[int]) – Cell diameter in pixels (Cellpose, optional)

  • flow_threshold (float) – Flow threshold (Cellpose)

  • cellprob_threshold (float) – Cell probability threshold (Cellpose)

  • gpu_id (Union[int, str, None]) – GPU ID to use (Cellpose, optional)

  • deepcell_api_key (Optional[str]) – DeepCell API key (CellSAM, optional, can use DEEPCELL_ACCESS_TOKEN env var)

  • bbox_threshold (float) – Bbox threshold for CellSAM

  • use_wsi (bool) – Use WSI mode for CellSAM

  • low_contrast_enhancement (bool) – Enable low contrast enhancement for CellSAM

  • gauge_cell_size (bool) – Enable gauge cell size for CellSAM

  • min_cell_area (int) – Minimum cell area in pixels (watershed)

  • max_cell_area (int) – Maximum cell area in pixels (watershed)

  • compactness (float) – Watershed compactness

Return type:

ndarray

Returns:

Segmentation mask as numpy array (uint32)

Raises:
  • ValueError – If method is invalid or required channels are missing

  • ImportError – If required dependencies are not installed

openimc.core.spatial_autocorrelation(anndata_dict, markers=None, aggregation='mean')

Compute spatial autocorrelation (Moran’s I) using squidpy.

Parameters:
  • anndata_dict (Dict[str, AnnData]) – Dictionary mapping ROI ID to AnnData object with spatial graph

  • markers (Optional[List[str]]) – Optional list of marker names to analyze. If None, analyzes all features

  • aggregation (str) – Aggregation method for multiple ROIs (“mean” or “sum”, default: “mean”)

Return type:

Dict

Returns:

Dictionary with:

  • ‘results’: Dict mapping ROI ID to AnnData object with autocorrelation results

  • ‘aggregated’: Aggregated results (if multiple ROIs)
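Moran's I itself is a short formula, and a small worked example may help when interpreting the results. The sketch below uses binary, symmetric edge weights on a toy graph; it is not how squidpy or OpenIMC compute the statistic (weights and normalization may differ), and the function name is hypothetical.

```python
def morans_i(values, edges):
    """Moran's I for node values on an undirected graph with binary edge weights.

    values maps node id to a number; edges is a list of (a, b) pairs.
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {node: v - mean for node, v in values.items()}
    # Each undirected edge contributes twice: once as w_ab and once as w_ba.
    cross = 2.0 * sum(dev[a] * dev[b] for a, b in edges)
    weight_sum = 2.0 * len(edges)
    denom = sum(d * d for d in dev.values())
    return (n / weight_sum) * (cross / denom)

# Four cells in a chain: similar neighbors versus alternating neighbors.
chain = [(0, 1), (1, 2), (2, 3)]
clustered = morans_i({0: 1.0, 1: 1.0, 2: 5.0, 3: 5.0}, chain)
alternating = morans_i({0: 1.0, 1: 5.0, 2: 1.0, 3: 5.0}, chain)
```

Positive values indicate that neighboring cells tend to have similar marker levels, while negative values indicate a checkerboard-like pattern.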

openimc.core.spatial_cooccurrence(anndata_dict, cluster_key='cluster', interval=[10, 20, 30, 50, 100], reference_cluster=None)

Compute co-occurrence analysis using squidpy.

Parameters:
  • anndata_dict (Dict[str, AnnData]) – Dictionary mapping ROI ID to AnnData object with spatial graph

  • cluster_key (str) – Column name containing cluster labels (default: “cluster”)

  • interval (List[float]) – List of distances in micrometers for co-occurrence analysis

  • reference_cluster (Optional[str]) – Optional reference cluster for co-occurrence

Return type:

Dict[str, AnnData]

Returns:

Dictionary mapping ROI ID to AnnData object with co-occurrence results

openimc.core.spatial_distance_distribution(features_df, edges_df, cluster_column='cluster', roi_column=None, output_path=None, pixel_size_um=1.0, n_workers=None)

Compute distance distributions between clusters.

This function computes nearest neighbor distances from each cell to each cluster type using multiprocessing at ROI level for efficiency.

Parameters:
  • features_df (DataFrame) – DataFrame with cell features and cluster labels

  • edges_df (DataFrame) – DataFrame with spatial graph edges (must have ‘cell_id_A’, ‘cell_id_B’, ‘roi_id’)

  • cluster_column (str) – Column name containing cluster labels

  • roi_column (Optional[str]) – Column name for ROI grouping (auto-detected if None)

  • output_path (Union[str, Path, None]) – Optional path to save distance distribution results CSV

  • pixel_size_um (float) – Pixel size in micrometers for coordinate conversion (default: 1.0)

  • n_workers (Optional[int]) – Number of parallel workers (default: None, which uses all available CPUs minus 2)

Return type:

DataFrame

Returns:

DataFrame with distance data (cell_A_id, cell_A_cluster, nearest_B_cluster, nearest_B_dist_um, etc.)

openimc.core.spatial_enrichment(features_df, edges_df, cluster_column='cluster', n_permutations=100, seed=42, roi_column=None, output_path=None, n_workers=None)

Compute pairwise spatial enrichment between clusters.

This function computes enrichment of spatial interactions between cluster pairs using permutation-based null distribution with multiprocessing support.

Parameters:
  • features_df (DataFrame) – DataFrame with cell features and cluster labels

  • edges_df (DataFrame) – DataFrame with spatial graph edges (must have ‘cell_id_A’, ‘cell_id_B’, ‘roi_id’)

  • cluster_column (str) – Column name containing cluster labels

  • n_permutations (int) – Number of permutations for null distribution (default: 100)

  • seed (int) – Random seed for reproducibility (default: 42)

  • roi_column (Optional[str]) – Column name for ROI grouping (auto-detected if None)

  • output_path (Union[str, Path, None]) – Optional path to save enrichment results CSV

  • n_workers (Optional[int]) – Number of parallel workers (default: None, which uses all available CPUs minus 2)

Return type:

DataFrame

Returns:

DataFrame with enrichment results (cluster_A, cluster_B, observed, expected, p_value, z_score, etc.)
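The idea of a permutation-based null distribution can be sketched for a single cluster pair. The example counts edges joining two clusters, shuffles the cluster labels across cells while keeping the graph fixed, and derives a z-score and an empirical p-value. It is a simplified illustration under stated assumptions (population standard deviation, one-sided p-value with a +1 correction, no per-ROI handling or multiprocessing) and not OpenIMC's implementation; all names are hypothetical.

```python
import random
import statistics

def count_pair(edges, labels, a, b):
    """Count edges whose endpoints carry cluster labels a and b (in either order)."""
    return sum(1 for u, v in edges if {labels[u], labels[v]} == {a, b})

def permutation_enrichment(edges, labels, a, b, n_permutations=100, seed=42):
    rng = random.Random(seed)
    observed = count_pair(edges, labels, a, b)
    cells = list(labels)
    shuffled = [labels[c] for c in cells]
    null = []
    for _ in range(n_permutations):
        # Shuffle labels across cells; the spatial graph stays fixed.
        rng.shuffle(shuffled)
        null.append(count_pair(edges, dict(zip(cells, shuffled)), a, b))
    expected = statistics.fmean(null)
    sd = statistics.pstdev(null)
    z_score = (observed - expected) / sd if sd else 0.0
    p_value = (1 + sum(1 for n in null if n >= observed)) / (n_permutations + 1)
    return {"observed": observed, "expected": expected,
            "z_score": z_score, "p_value": p_value}

# Eight cells in a chain; cluster A occupies one end, cluster B the other.
labels = {i: "A" if i < 4 else "B" for i in range(8)}
edges = [(i, i + 1) for i in range(7)]
result = permutation_enrichment(edges, labels, "A", "A")
```

Fixing the seed makes the null distribution, and therefore the z-score and p-value, reproducible between runs.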

openimc.core.spatial_neighborhood_enrichment(anndata_dict, cluster_key='cluster', aggregation='mean', significance_threshold=2.0)

Compute neighborhood enrichment using squidpy.

This function computes neighborhood enrichment for each ROI and optionally aggregates results.

Parameters:
  • anndata_dict (Dict[str, AnnData]) – Dictionary mapping ROI ID to AnnData object with spatial graph

  • cluster_key (str) – Column name containing cluster labels (default: “cluster”)

  • aggregation (str) – Aggregation method for multiple ROIs (“mean” or “sum”, default: “mean”)

  • significance_threshold (float) – Z-score threshold for significant interactions (default: 2.0)

Return type:

Dict

Returns:

Dictionary with:

  • ‘results’: Dict mapping ROI ID to enrichment results

  • ‘aggregated’: Aggregated enrichment matrix (if multiple ROIs)

  • ‘cluster_categories’: List of cluster categories

  • ‘significant_counts’: Matrix counting ROIs with significant interactions per cluster pair

openimc.core.spatial_ripley(anndata_dict, cluster_key='cluster', mode='L', max_dist=50.0)

Compute Ripley functions using squidpy.

Parameters:
  • anndata_dict (Dict[str, AnnData]) – Dictionary mapping ROI ID to AnnData object with spatial graph

  • cluster_key (str) – Column name containing cluster labels (default: “cluster”)

  • mode (str) – Ripley function mode (“F”, “G”, or “L”, default: “L”)

  • max_dist (float) – Maximum distance in micrometers (default: 50.0)

Return type:

Dict[str, AnnData]

Returns:

Dictionary mapping ROI ID to AnnData object with Ripley results

openimc.core.spillover_correction(features_df, spillover_matrix, method='pgd', arcsinh_cofactor=None, channel_map=None, output_path=None)

Apply spillover correction to feature data.

This function applies CATALYST-like spillover compensation to remove spectral overlap between channels in IMC data.

Parameters:
  • features_df (DataFrame) – DataFrame with cell features (cells x channels)

  • spillover_matrix (Union[str, Path, DataFrame]) – Path to spillover matrix CSV or DataFrame

  • method (str) – Compensation method (‘nnls’ or ‘pgd’, default: ‘pgd’)

  • arcsinh_cofactor (Optional[float]) – Optional cofactor for arcsinh transformation

  • channel_map (Optional[Dict[str, str]]) – Optional mapping from feature column names to spillover matrix channel names

  • output_path (Union[str, Path, None]) – Optional path to save corrected features CSV

Return type:

DataFrame

Returns:

DataFrame with corrected features
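Spillover compensation can be framed as a non-negative least-squares problem: find non-negative true signals x such that x multiplied by the spillover matrix reproduces the observed signals. The sketch below solves this for one cell with projected gradient descent. It rests on several assumptions that are not confirmed here: that ‘pgd’ stands for projected gradient descent, that rows of the matrix are emitting channels and columns are receiving channels, and that a fixed step size is acceptable. The function name and numbers are hypothetical.

```python
def compensate_pgd(observed, spill, iterations=500):
    """Solve min ||x @ spill - observed||^2 subject to x >= 0 for one cell.

    spill[i][j] is the fraction of channel i's signal detected in channel j.
    """
    n = len(observed)
    # A step of 1 / ||spill||_F^2 is small enough for the iteration to converge.
    step = 1.0 / sum(s * s for row in spill for s in row)
    x = [max(v, 0.0) for v in observed]
    for _ in range(iterations):
        # predicted[j] = sum_i x[i] * spill[i][j]
        predicted = [sum(x[i] * spill[i][j] for i in range(n)) for j in range(n)]
        residual = [p - o for p, o in zip(predicted, observed)]
        # Gradient with respect to x[i] is sum_j residual[j] * spill[i][j].
        grad = [sum(residual[j] * spill[i][j] for j in range(n)) for i in range(n)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xi - step * g) for xi, g in zip(x, grad)]
    return x

# Channel 0 leaks 10% of its signal into channel 1.
spill = [[1.0, 0.1],
         [0.0, 1.0]]
observed = [100.0, 60.0]  # true signals of 100 and 50 plus the leaked 10
corrected = compensate_pgd(observed, spill)
```

The projection step is what distinguishes this from a plain matrix inverse: corrected intensities can never become negative, even when the observed values are noisy.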