Custom Algorithms

OpenIMC provides base classes for integrating new segmentation, clustering, and feature extraction algorithms into the framework. Each base class defines an interface with standardized input and output formats, so new methods stay compatible with the existing OpenIMC pipeline.

Overview

The base classes are located in openimc.processing.base and provide:

  • Clear interface definitions: Standardized input/output formats

  • Input validation: validate_inputs() methods that check inputs before processing

  • Output validation: validate_output() methods that check outputs after processing

  • Documentation: Comprehensive docstrings explaining expected formats

  • Error handling: Consistent error messages and exception types

Base Classes

BaseSegmenter

Abstract base class for segmentation algorithms.

Location: openimc.processing.base.BaseSegmenter

Expected Inputs:

  • nuclear_image: np.ndarray, shape (H, W), dtype float32 - Preprocessed nuclear channel image (0-1 normalized)

  • cyto_image: np.ndarray, shape (H, W), dtype float32, optional - Preprocessed cytoplasm channel image (0-1 normalized)

  • **kwargs: Additional algorithm-specific parameters
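When trying a segmenter outside the pipeline, inputs in the expected format can be produced with a simple min-max rescale. The following is a minimal sketch: normalize_to_unit_range is a hypothetical helper, not part of OpenIMC, and the pipeline's own preprocessing may normalize differently.

```python
import numpy as np

def normalize_to_unit_range(image):
    """Min-max scale a 2D image to [0, 1] and return it as float32.

    Hypothetical helper for building test inputs; not an OpenIMC function.
    """
    image = np.asarray(image, dtype=np.float32)
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        # Constant image: avoid division by zero and return all zeros
        return np.zeros_like(image, dtype=np.float32)
    return ((image - lo) / (hi - lo)).astype(np.float32)

raw = np.array([[0, 50], [100, 200]], dtype=np.uint16)
nuclear_image = normalize_to_unit_range(raw)
```

The result has shape (H, W), dtype float32, and values between 0 and 1, which matches the input description above.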

Expected Output:

  • mask: np.ndarray, shape (H, W), dtype uint32 - Segmentation mask where each cell has a unique integer label

    • 0 = background, 1+ = cell labels

Example Implementation:

from openimc.processing.base import BaseSegmenter
import numpy as np
from scipy import ndimage

class MyCustomSegmenter(BaseSegmenter):
    def __init__(self):
        super().__init__(name="my_custom_segmenter")

    def segment(self, nuclear_image, cyto_image=None, **kwargs):
        # Validate inputs (optional, but recommended)
        self.validate_inputs(nuclear_image, cyto_image)

        # Your segmentation algorithm here
        # ...

        # Example: threshold the nuclear channel, then give each connected
        # foreground region its own label (0 = background, 1+ = cells).
        # Thresholding alone would yield a binary mask, not unique labels.
        threshold = kwargs.get('threshold', 0.5)
        labeled, _ = ndimage.label(nuclear_image > threshold)
        mask = labeled.astype(np.uint32)

        # Validate output (optional, but recommended)
        self.validate_output(mask, nuclear_image.shape)

        return mask

BaseClusterer

Abstract base class for clustering algorithms.

Location: openimc.processing.base.BaseClusterer

Expected Inputs:

  • features_df: pd.DataFrame - Feature matrix with one row per cell and one column per feature

    • Required columns: None (all numeric columns are used)

    • Excluded columns: 'cell_id', 'acquisition_id', 'acquisition_name', 'well', 'cluster', 'label', 'source_file', etc.

  • columns: List[str], optional - Specific feature columns to use for clustering

    • If None, auto-detects all numeric columns

  • **kwargs: Additional algorithm-specific parameters
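The column auto-detection described above can be approximated with pandas alone. This is a sketch under assumptions: select_feature_columns and METADATA_COLUMNS are hypothetical names, the exclusion set contains only the columns named in the list above (the real list ends in "etc."), and the actual logic in BaseClusterer.validate_inputs() may differ.

```python
import pandas as pd

# Metadata columns named in this document; the real exclusion list may be longer.
METADATA_COLUMNS = {
    'cell_id', 'acquisition_id', 'acquisition_name',
    'well', 'cluster', 'label', 'source_file',
}

def select_feature_columns(features_df, columns=None):
    """Return the feature columns to cluster on (hypothetical helper)."""
    if columns is not None:
        missing = [c for c in columns if c not in features_df.columns]
        if missing:
            raise ValueError(f"Columns not found in features_df: {missing}")
        return list(columns)
    # No explicit list: take every numeric column that is not metadata
    numeric = features_df.select_dtypes(include='number').columns
    return [c for c in numeric if c not in METADATA_COLUMNS]

df = pd.DataFrame({
    'cell_id': [1, 2, 3],
    'well': ['A1', 'A1', 'B2'],
    'CD3_mean': [0.1, 0.7, 0.3],
    'area': [40, 55, 61],
})
feature_columns = select_feature_columns(df)
```

Here 'cell_id' is numeric but excluded as metadata, and 'well' is dropped because it is not numeric, leaving 'CD3_mean' and 'area'.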

Expected Output:

  • features_df: pd.DataFrame - Same DataFrame as input with 'cluster' column added

    • 'cluster' column: int, 1-based cluster labels (0 = unassigned/noise)
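Many clustering libraries return 0-based labels, and some (for example scikit-learn's DBSCAN) mark noise points with -1. A shift by one maps both conventions onto the format above. The helper name to_one_based_labels is hypothetical and not part of OpenIMC.

```python
import numpy as np

def to_one_based_labels(raw_labels):
    """Shift 0-based labels (with -1 = noise) to 1-based labels (0 = noise)."""
    raw_labels = np.asarray(raw_labels)
    if np.issubdtype(raw_labels.dtype, np.floating) and np.isnan(raw_labels).any():
        raise ValueError("cluster labels contain NaN; use -1 for unassigned cells")
    labels = raw_labels.astype(int)
    if labels.size and labels.min() < -1:
        raise ValueError("labels below -1 are not supported")
    return labels + 1

raw = np.array([0, 2, -1, 1, 0])
cluster = to_one_based_labels(raw)
```

After the shift, the noise point (-1) becomes 0 and the first cluster (0) becomes 1.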

Example Implementation:

from openimc.processing.base import BaseClusterer
import pandas as pd
from sklearn.cluster import KMeans

class MyCustomClusterer(BaseClusterer):
    def __init__(self):
        super().__init__(name="my_custom_clusterer")

    def cluster(self, features_df, columns=None, **kwargs):
        # Validate and prepare inputs
        data, column_names = self.validate_inputs(features_df, columns)
        original_shape = features_df.shape

        # Your clustering algorithm here
        n_clusters = kwargs.get('n_clusters', 5)
        seed = kwargs.get('seed', 42)  # honor a caller-supplied seed
        kmeans = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
        cluster_labels = kmeans.fit_predict(data.values)

        # Convert to 1-based labels
        cluster_labels = (cluster_labels + 1).astype(int)

        # Add cluster column
        result_df = features_df.copy()
        result_df['cluster'] = cluster_labels

        # Validate output
        self.validate_output(result_df, original_shape)

        return result_df

BaseFeatureExtractor

Abstract base class for feature extraction algorithms.

Location: openimc.processing.base.BaseFeatureExtractor

Expected Inputs:

  • mask: np.ndarray, shape (H, W), dtype uint32 - Segmentation mask with cell labels (0 = background, 1+ = cells)

  • image_stack: np.ndarray, shape (H, W, C), dtype float32 - Image stack with C channels

  • channel_names: List[str], length C - Names of each channel in image_stack

  • **kwargs: Additional algorithm-specific parameters

Expected Output:

  • features_df: pd.DataFrame - Feature matrix with one row per cell. Required columns:

    • 'cell_id': int, unique identifier for each cell (1-based)

    • 'label': int, cell label from mask (1-based)

    • Additional feature columns (algorithm-specific)

Example Implementation:

from openimc.processing.base import BaseFeatureExtractor
import numpy as np
import pandas as pd

class MyCustomFeatureExtractor(BaseFeatureExtractor):
    def __init__(self):
        super().__init__(name="my_custom_extractor")

    def extract(self, mask, image_stack, channel_names, **kwargs):
        # Validate inputs
        self.validate_inputs(mask, image_stack, channel_names)

        # Get unique cell labels (exclude background = 0)
        unique_labels = np.unique(mask)
        unique_labels = unique_labels[unique_labels > 0]

        features_list = []
        for idx, label in enumerate(unique_labels):
            cell_id = idx + 1  # 1-based
            features = {'cell_id': cell_id, 'label': int(label)}

            # Create binary mask for this cell
            cell_mask = (mask == label)

            # Extract your custom features here
            # ...

            features_list.append(features)

        # Create DataFrame
        features_df = pd.DataFrame(features_list)

        # Validate output
        expected_n_cells = len(unique_labels)
        self.validate_output(features_df, expected_n_cells)

        return features_df
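The "Extract your custom features here" step can be filled in many ways. As one illustration, the sketch below computes area and per-channel mean intensity for all cells at once with np.bincount instead of looping over cells. It is a standalone function rather than a BaseFeatureExtractor subclass so it runs without OpenIMC; the name basic_features and the '<channel>_mean' column naming are assumptions for this example.

```python
import numpy as np
import pandas as pd

def basic_features(mask, image_stack, channel_names):
    """Per-cell area and mean intensity per channel, computed with np.bincount."""
    columns = ['cell_id', 'label', 'area'] + [f'{name}_mean' for name in channel_names]
    flat = mask.ravel().astype(np.int64)
    labels = np.unique(flat)
    labels = labels[labels > 0]  # drop background (0)
    if labels.size == 0:
        # No cells: return an empty frame that still has the expected columns
        return pd.DataFrame(columns=columns)

    n_bins = int(labels.max()) + 1
    area = np.bincount(flat, minlength=n_bins)  # pixel count per label
    data = {
        'cell_id': np.arange(1, labels.size + 1),  # 1-based running index
        'label': labels,                           # label value from the mask
        'area': area[labels],
    }
    for c, name in enumerate(channel_names):
        # Sum of channel intensities per label, divided by the pixel count
        sums = np.bincount(flat, weights=image_stack[..., c].ravel(), minlength=n_bins)
        data[f'{name}_mean'] = sums[labels] / area[labels]
    return pd.DataFrame(data, columns=columns)

mask = np.array([[0, 1, 1],
                 [0, 0, 3]], dtype=np.uint32)
image_stack = np.zeros((2, 3, 2), dtype=np.float32)
image_stack[..., 0] = [[0.0, 0.2, 0.4], [0.0, 0.0, 1.0]]
image_stack[..., 1] = 0.5
features_df = basic_features(mask, image_stack, ['DNA1', 'CD45'])
```

The example mask deliberately skips label 2, which shows why 'cell_id' (a running index) and 'label' (the value in the mask) are kept as separate columns.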

Integration with OpenIMC

Once you’ve implemented a custom algorithm, integrate it into OpenIMC by adding a branch for it to the corresponding core function. The snippets below are outlines; "..." and "existing code" comments stand for code that is already in the function.

Segmentation Integration

Modify openimc.core.segment() to add your segmenter:

def segment(loader, acquisition, method, ...):
    # ... existing code ...

    elif method == 'my_custom_segmenter':
        from my_module import MyCustomSegmenter

        # Preprocess channels (same as other methods)
        nuclear_img, cyto_img = _preprocess_channels_for_segmentation(...)

        # Create segmenter instance
        segmenter = MyCustomSegmenter()

        # Run segmentation
        mask = segmenter.segment(
            nuclear_img,
            cyto_image=cyto_img,
            threshold=0.5,  # Your custom parameters
            min_cell_area=50
        )

    # ... rest of code ...

Clustering Integration

Modify openimc.core.cluster() to add your clusterer:

def cluster(features_df, method='leiden', ...):
    # ... existing code ...

    elif method == 'my_custom_clusterer':
        from my_module import MyCustomClusterer

        # Create clusterer instance
        clusterer = MyCustomClusterer()

        # Run clustering
        result_df = clusterer.cluster(
            features_df,
            columns=columns,
            n_clusters=5,  # Your custom parameters
            seed=42
        )

        return result_df

    # ... rest of code ...

Feature Extraction Integration

Modify openimc.processing.feature_worker.extract_features_for_acquisition() to add your extractor:

def extract_features_for_acquisition(..., feature_extractor=None):
    # ... existing code ...

    if feature_extractor is not None:
        # Use custom extractor
        from my_module import MyCustomFeatureExtractor
        extractor = MyCustomFeatureExtractor()
        features_df = extractor.extract(
            mask,
            img_stack,
            channel_names,
            morphological=True,
            intensity=True
        )
    else:
        # Use default extractor
        # ... existing code ...

Example Implementations

Complete example implementations are available in openimc.processing.examples:

  • ExampleThresholdSegmenter: Simple thresholding-based segmentation

  • ExampleKMeansClusterer: K-means clustering implementation

  • ExampleBasicFeatureExtractor: Basic morphological and intensity features

These examples demonstrate:

  • Proper input validation

  • Correct output format

  • Error handling

  • Integration patterns

Best Practices

  1. Always validate inputs: Use the validate_inputs() method before processing

  2. Always validate outputs: Use the validate_output() method after processing

  3. Handle edge cases: Empty masks, no cells, missing channels, etc.

  4. Document parameters: Clearly document all **kwargs parameters

  5. Preserve data types: Ensure outputs match expected dtypes (uint32, float32, etc.)

  6. Use 1-based indexing: Cell IDs and labels should start at 1 (0 = background/unassigned)

  7. Handle memory efficiently: For large datasets, process in chunks if needed

  8. Provide informative errors: Raise ValueError with clear messages for invalid inputs
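Points 3 and 8 can be combined in a small guard that fails early with a specific message. The sketch below is illustrative only: check_segmentation_inputs is a hypothetical function, and the checks that BaseSegmenter.validate_inputs() actually performs may differ.

```python
import numpy as np

def check_segmentation_inputs(nuclear_image, cyto_image=None):
    """Raise ValueError with a specific message for common input problems."""
    if not isinstance(nuclear_image, np.ndarray) or nuclear_image.ndim != 2:
        raise ValueError("nuclear_image must be a 2D array of shape (H, W)")
    if nuclear_image.size == 0:
        raise ValueError("nuclear_image is empty")
    if not np.isfinite(nuclear_image).all():
        raise ValueError("nuclear_image contains NaN or infinite values")
    if cyto_image is not None and cyto_image.shape != nuclear_image.shape:
        raise ValueError(
            f"cyto_image shape {cyto_image.shape} does not match "
            f"nuclear_image shape {nuclear_image.shape}"
        )

nuclear = np.zeros((4, 5), dtype=np.float32)
check_segmentation_inputs(nuclear)  # valid input: returns without raising

try:
    check_segmentation_inputs(nuclear, cyto_image=np.zeros((4, 4), dtype=np.float32))
except ValueError as err:
    message = str(err)  # names both shapes so the caller can see the mismatch
```

Including the offending shapes in the message saves the caller a debugging step.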

Common Pitfalls

  1. Wrong dtype: Segmentation masks must be uint32, not uint8 or int32

  2. Wrong indexing: Cell IDs and labels must be 1-based (1, 2, 3, …), not 0-based

  3. Missing required columns: Feature DataFrames must have 'cell_id' and 'label' columns

  4. Shape mismatches: Output masks must have the same (H, W) shape as the input image, and clustered DataFrames must keep the input's rows

  5. NaN values: Cluster labels cannot contain NaN (use 0 for unassigned cells)

  6. Background handling: Background pixels should be labeled as 0 in masks
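Pitfalls 1, 2, and 6 often show up together when a label image comes from another library with a signed dtype or with gaps in its labels. One possible repair is sketched below; to_label_mask is a hypothetical helper, and renumbering to consecutive labels is a choice made for this example rather than a stated OpenIMC requirement.

```python
import numpy as np

def to_label_mask(labels):
    """Convert an integer label image to a uint32 mask with labels 1..N."""
    labels = np.asarray(labels)
    if not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("label image must have an integer dtype")
    if labels.size and labels.min() < 0:
        raise ValueError("label image contains negative values")
    # inverse holds, for every pixel, the rank of its label among the sorted
    # unique values, so background (0) stays 0 and cells become 1, 2, ...
    unique, inverse = np.unique(labels, return_inverse=True)
    inverse = inverse.reshape(labels.shape)
    if unique.size and unique[0] != 0:
        inverse = inverse + 1  # no background present, so start at 1
    return inverse.astype(np.uint32)

raw = np.array([[0, 5, 5],
                [0, 0, 9]], dtype=np.int32)
mask = to_label_mask(raw)
```

In this example the labels 5 and 9 are renumbered to 1 and 2, background stays 0, and the result is uint32.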

Testing Your Implementation

Before integrating your algorithm, test it with the base class validation:

import numpy as np
from my_module import MyCustomSegmenter

# Create test data
nuclear_img = np.random.rand(100, 100).astype(np.float32)

# Test segmenter
segmenter = MyCustomSegmenter()
mask = segmenter.segment(nuclear_img, threshold=0.5)

# MyCustomSegmenter calls validate_output() itself, but you can also check explicitly:
assert mask.dtype == np.uint32
assert mask.shape == nuclear_img.shape
assert mask.min() == 0  # background pixels are labeled 0

For more complex testing, use the OpenIMC test fixtures and integration tests.
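A similar check can be written for a clusterer's output. The sketch below runs against a hand-built DataFrame so it does not depend on OpenIMC; check_cluster_output is a hypothetical helper, and in practice result_df would come from your clusterer's cluster() method.

```python
import pandas as pd

def check_cluster_output(result_df, input_df):
    """Assert the clusterer output format described in this document."""
    assert len(result_df) == len(input_df), "row count changed"
    assert 'cluster' in result_df.columns, "missing 'cluster' column"
    clusters = result_df['cluster']
    assert not clusters.isna().any(), "NaN cluster labels (use 0 for unassigned)"
    assert pd.api.types.is_integer_dtype(clusters), "cluster labels must be integers"
    assert (clusters >= 0).all(), "negative cluster labels"

input_df = pd.DataFrame({'cell_id': [1, 2, 3], 'CD3_mean': [0.1, 0.7, 0.3]})
# Stand-in for clusterer.cluster(input_df): two clusters plus one unassigned cell
result_df = input_df.assign(cluster=[1, 2, 0])
check_cluster_output(result_df, input_df)
```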