LLM-Based Phenotyping

LLM-based phenotyping uses large language models to suggest cell type annotations for clusters based on marker expression statistics. This feature provides AI-assisted phenotype suggestions to help accelerate cell type annotation workflows.

Warning

LLM suggestions are advisory, not definitive. Always review and validate every suggestion manually before using it in your analysis. The LLM can make errors, so treat its suggestions as starting points for manual annotation, not as ground truth.

Overview

After clustering, identifying cell types (phenotyping) is a critical but time-consuming step. LLM-based phenotyping uses OpenAI’s language models to analyze marker expression patterns and suggest likely cell types for each cluster.

How it works:

Statistics Computation: For each cluster, marker expression statistics are computed:
- Top-K markers with highest expression (positive markers)
- Top-K markers with lowest expression (negative markers)
- Z-scores, log fold-changes, AUROC, and other metrics
LLM Analysis: The marker statistics are sent to the selected model along with:
- System prompt (fine vs. broad mode)
- User context (tissue type, cancer type, etc.)
- Feature mode (markers only, morphometrics only, or both)
Suggestion Generation: The LLM returns 3 phenotype suggestions per cluster, each with:
- Phenotype name
- Confidence percentage
- Rationale explaining the suggestion
User Selection: You review the suggestions and select one for each cluster
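
The statistics step can be sketched roughly as follows. This is a minimal illustration, not OpenIMC's actual implementation: the function name `top_k_markers`, its arguments, and the choice to rank markers by z-score across clusters are all assumptions made for the example.

```python
import numpy as np

def top_k_markers(cluster_means, marker_names, cluster_index, k=5):
    """Return (positive, negative) marker lists for one cluster.

    Hypothetical helper: ranks markers by the z-score of the cluster's mean
    expression relative to all clusters (an assumed ranking criterion).
    """
    means = np.asarray(cluster_means, dtype=float)  # shape: (n_clusters, n_markers)
    mu = means.mean(axis=0)
    sigma = means.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant markers
    z = (means[cluster_index] - mu) / sigma
    order = np.argsort(z)  # ascending: lowest z-score first
    negative = [marker_names[i] for i in order[:k]]
    positive = [marker_names[i] for i in order[::-1][:k]]
    return positive, negative

# Toy data: three clusters, three markers.
means = [[5, 1, 0], [1, 5, 0], [0, 0, 5]]
markers = ["CD3", "CD20", "CD68"]
positive, negative = top_k_markers(means, markers, cluster_index=0, k=1)
```

With small panels, choose `k` well below the number of markers so that the positive and negative lists do not overlap.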

Key Features:

Multiple Model Options: Choose from different models based on speed, accuracy, cost, and reasoning capability
Fine vs. Broad Mode: Control the level of detail in phenotype suggestions
Context-Aware: Provide tissue/cancer type context for better suggestions
Feature Selection: Choose which features to use (markers, morphometrics, or both)
Caching: Results are cached to avoid redundant API calls
Cost: Approximately $0.10 (10 cents) per query

Model Selection

OpenIMC supports multiple OpenAI models, each with different characteristics:

Available Models:

gpt-5.1 (default): Latest model with reasoning capabilities
- Speed: Moderate (slower than smaller models)
- Accuracy: Highest
- Cost: Highest
- Reasoning: Yes (configurable: none, low, medium, high)
- Use case: Best for complex datasets requiring detailed analysis
gpt-5: Standard GPT-5 model
- Speed: Moderate
- Accuracy: High
- Cost: High
- Reasoning: No
- Use case: Good balance of accuracy and speed
gpt-5-mini: Smaller, faster version
- Speed: Fast
- Accuracy: Good
- Cost: Moderate
- Reasoning: No
- Use case: Faster suggestions with good accuracy
gpt-5-nano: Smallest, fastest version
- Speed: Very fast
- Accuracy: Moderate
- Cost: Low
- Reasoning: No
- Use case: Quick suggestions for simple datasets
gpt-4.1: Previous generation model
- Speed: Moderate
- Accuracy: High
- Cost: High
- Reasoning: No
- Use case: Alternative to GPT-5 if needed

Reasoning Level (gpt-5.1 only):

When using gpt-5.1, you can configure the reasoning effort:

none (default): No additional reasoning, fastest
low: Minimal reasoning, slightly slower
medium: Moderate reasoning, slower but more thorough
high: Maximum reasoning, slowest but most thorough
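
The rule that reasoning effort applies only to gpt-5.1 can be expressed as a small validation step. The sketch below is illustrative only: `build_request_options` and the `reasoning_effort` key are hypothetical names, not OpenIMC's or OpenAI's actual interface.

```python
REASONING_LEVELS = ("none", "low", "medium", "high")
REASONING_MODELS = {"gpt-5.1"}  # per the model list above, only gpt-5.1 is configurable

def build_request_options(model="gpt-5.1", reasoning="none"):
    """Return a settings dict, rejecting reasoning levels the model cannot use.

    Hypothetical helper; the dictionary keys are assumptions for illustration.
    """
    if reasoning not in REASONING_LEVELS:
        raise ValueError(f"Unknown reasoning level: {reasoning!r}")
    options = {"model": model}
    if model in REASONING_MODELS:
        options["reasoning_effort"] = reasoning
    elif reasoning != "none":
        raise ValueError(f"{model} does not support configurable reasoning")
    return options

default_options = build_request_options()
mini_options = build_request_options("gpt-5-mini")
```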

Model Selection Guidelines:

For speed: Use gpt-5-nano or gpt-5-mini
For accuracy: Use gpt-5.1 with medium/high reasoning
For cost efficiency: Use gpt-5-nano or gpt-5-mini
For complex datasets: Use gpt-5.1 with reasoning enabled
For simple datasets: Use gpt-5-mini or gpt-5-nano

Cost Considerations:

Models are charged per token (input + output)
Larger models (gpt-5.1, gpt-5) cost more per token
Reasoning increases cost and time
Estimated cost: approximately $0.10 (10 cents) per query
- Cost varies by model, reasoning level, and number of markers
- Smaller models (gpt-5-nano, gpt-5-mini) may cost less
- Larger models with reasoning may cost more
Monitor your OpenAI account usage to avoid unexpected charges
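
For budgeting, the approximate figure above can be turned into a rough estimate. This sketch assumes a flat cost per query and, in the usage line, one query per cluster; both are simplifications, since actual charges depend on token counts, model, and reasoning level. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(n_queries, cost_per_query=0.10):
    """Rough cost estimate in US dollars, assuming a flat per-query cost."""
    if n_queries < 0:
        raise ValueError("n_queries must be non-negative")
    return round(n_queries * cost_per_query, 2)

# Example: 20 clusters, assuming one query per cluster.
total = estimate_cost(20)
```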

Inputs

The LLM receives several inputs to generate phenotype suggestions:

1. Marker Statistics (Computed Automatically):

For each cluster, the following statistics are computed:

Top-K Positive Markers: Markers with highest expression in the cluster
- Z-score (across clusters)
- Log fold-change (vs. other clusters)
- AUROC (area under ROC curve)
- Mean expression
- Percent positive cells
- Within-cluster distribution (min, mean, median, max)
- Range across clusters
Top-K Negative Markers: Markers with lowest expression in the cluster
- Same statistics as positive markers
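
Several of these statistics can be computed for a single marker as sketched below. This is an assumed formulation, not necessarily the one OpenIMC uses: the function name `marker_stats`, the pseudocount `eps`, the positivity threshold, and the pairwise AUROC computation are all illustrative choices, and intensities are assumed to be non-negative.

```python
import numpy as np

def marker_stats(values, in_cluster, positive_threshold=0.0, eps=1e-9):
    """Summarize one marker for one cluster versus all other cells.

    Hypothetical helper. `values` holds per-cell intensities (assumed
    non-negative) and `in_cluster` is a boolean mask for cluster membership.
    """
    values = np.asarray(values, dtype=float)
    mask = np.asarray(in_cluster, dtype=bool)
    inside, outside = values[mask], values[~mask]
    # Log2 fold-change of mean expression, with a pseudocount to avoid log(0).
    log_fc = float(np.log2((inside.mean() + eps) / (outside.mean() + eps)))
    # AUROC as the probability that a cluster cell exceeds a non-cluster cell
    # (ties count half). Uses O(n*m) memory, which is fine for a small sketch.
    diff = inside[:, None] - outside[None, :]
    auroc = float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
    return {
        "mean": float(inside.mean()),
        "median": float(np.median(inside)),
        "percent_positive": float(100.0 * (inside > positive_threshold).mean()),
        "log_fc": log_fc,
        "auroc": auroc,
    }

stats = marker_stats([5, 6, 7, 1, 2, 3], [True, True, True, False, False, False])
```

An AUROC near 1.0 means the marker separates the cluster from all other cells almost perfectly, while a value near 0.5 means it carries little discriminating information.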

2. Feature Mode:

Controls which features are included in the analysis:

Markers only: Uses only intensity features (marker expression)
- Includes: _mean, _median, _std, _mad, _p10, _p90, _integrated, _frac_pos
- Excludes: Morphometric features (area, perimeter, etc.)
- Top-K intensity: Number of top markers to include (default: 5, range: 1-30)
Morphometrics only: Uses only morphometric features
- Includes: Area, perimeter, eccentricity, solidity, etc.
- Excludes: Intensity features
- Top-K morphometric: Number of top morphometric features to include (default: 5, range: 1-30)
Both (default): Uses both markers and morphometrics
- Includes both intensity and morphometric features
- Top-K intensity: Number of top markers (default: 5)
- Top-K morphometric: Number of top morphometric features (default: 5)
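
The split between intensity and morphometric features can be illustrated with a suffix filter. The suffixes come from the list above, but the function `select_features`, the mode strings, and the assumption that every column without an intensity suffix is morphometric are hypothetical.

```python
INTENSITY_SUFFIXES = (
    "_mean", "_median", "_std", "_mad", "_p10", "_p90", "_integrated", "_frac_pos",
)

def select_features(columns, mode="both"):
    """Filter feature column names by feature mode.

    Hypothetical helper: treats any column without an intensity suffix
    as a morphometric feature.
    """
    intensity = [c for c in columns if c.endswith(INTENSITY_SUFFIXES)]
    morphometric = [c for c in columns if not c.endswith(INTENSITY_SUFFIXES)]
    if mode == "markers":
        return intensity
    if mode == "morphometrics":
        return morphometric
    if mode == "both":
        return intensity + morphometric
    raise ValueError(f"Unknown feature mode: {mode!r}")

columns = ["CD3_mean", "CD20_p90", "area", "perimeter"]
marker_columns = select_features(columns, "markers")
```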

3. Context (Optional):

User-provided context about the dataset:

Cohort/tissue context: Free-text description
- Examples: “human colorectal cancer”, “mouse liver tissue”, “breast cancer TMA”
- Helps the LLM provide context-appropriate suggestions
- Optional but recommended for better accuracy

4. System Prompt Mode:

Controls the level of detail in suggestions:

Fine cell types (detailed) (default): Suggests specific cell types
- Examples: “CD4+ T cells”, “M1 Macrophages”, “Epithelial cells”
- More detailed but may be less confident
- Better for well-characterized datasets
Broad cell types: Suggests broad categories first
- Examples: “Lymphoid”, “Myeloid”, “Stroma”, “Tumor”
- More confident but less specific
- Can specify subtypes when confidence is high (>50%)
- Better for exploratory analysis or uncertain datasets

5. Normalization Context:

Automatically detected from your feature extraction settings:

If arcsinh transformation was used: “intensities are arcsinh-transformed”
If raw values: “intensities are raw values”
Helps LLM interpret marker expression values correctly
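
The optional context and the normalization note might be combined into a short text block for the prompt along these lines. Only the two quoted normalization strings come from this page; the function `build_context` and the label wording are assumptions.

```python
def build_context(tissue_context="", arcsinh=False):
    """Assemble the context lines sent alongside the marker statistics.

    Hypothetical helper; the line labels are illustrative.
    """
    if arcsinh:
        normalization = "intensities are arcsinh-transformed"
    else:
        normalization = "intensities are raw values"
    parts = []
    if tissue_context.strip():
        # The cohort/tissue context is optional, so include it only when given.
        parts.append(f"Cohort/tissue context: {tissue_context.strip()}")
    parts.append(f"Normalization: {normalization}")
    return "\n".join(parts)

context = build_context("human colorectal cancer", arcsinh=True)
```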

Outputs

The LLM returns structured JSON with phenotype suggestions for each cluster:

Output Structure:

{
  "cluster_id": "1",
  "phenotype_guesses": [
    {
      "name": "CD4+ T cells",
      "rationale": "High expression of CD3, CD4, and CD45. Low expression of CD8 and CD20.",
      "confidence": 75.5
    },
    {
      "name": "Helper T cells",
      "rationale": "CD3+ CD4+ profile consistent with helper T cell lineage.",
      "confidence": 20.0
    },
    {
      "name": "T cells",
      "rationale": "General T cell markers present, but subtype uncertain.",
      "confidence": 4.5
    }
  ],
  "key_markers_positive": ["CD3", "CD4", "CD45"],
  "key_markers_negative": ["CD8", "CD20"],
  "notes": "Strong T cell signature with helper T cell characteristics."
}

Output Fields:

phenotype_guesses: Array of 3 phenotype suggestions (ranked by confidence)
- name: Suggested cell type name
- rationale: Explanation of why this phenotype was suggested
- confidence: Confidence percentage (0-100%; the 3 values must sum to 100%)
key_markers_positive: List of markers that support this phenotype
- Markers with high expression in this cluster
key_markers_negative: List of markers with low expression in this cluster
- Their low expression argues against alternative phenotypes (in the example above, low CD8 and CD20)
notes: Additional observations or caveats
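
The constraints on the output fields can be checked programmatically before results are displayed. The sketch below is one possible validation, not OpenIMC's own: `validate_suggestion` and the rounding `tolerance` are hypothetical.

```python
import json

REQUIRED_KEYS = {"name", "rationale", "confidence"}

def validate_suggestion(raw_json, tolerance=0.5):
    """Parse one cluster's response and check the documented constraints.

    Hypothetical helper: requires exactly 3 guesses, confidences within
    0-100 that sum to 100 (within a tolerance), and ranks them by confidence.
    """
    data = json.loads(raw_json)
    guesses = data["phenotype_guesses"]
    if len(guesses) != 3:
        raise ValueError(f"Expected 3 phenotype guesses, got {len(guesses)}")
    for guess in guesses:
        missing = REQUIRED_KEYS - set(guess)
        if missing:
            raise ValueError(f"Guess is missing fields: {sorted(missing)}")
        if not 0 <= guess["confidence"] <= 100:
            raise ValueError(f"Confidence out of range: {guess['confidence']}")
    total = sum(guess["confidence"] for guess in guesses)
    if abs(total - 100) > tolerance:
        raise ValueError(f"Confidences sum to {total}, expected 100")
    data["phenotype_guesses"] = sorted(
        guesses, key=lambda guess: guess["confidence"], reverse=True
    )
    return data

raw = json.dumps({
    "cluster_id": "1",
    "phenotype_guesses": [
        {"name": "Helper T cells", "rationale": "r", "confidence": 20.0},
        {"name": "CD4+ T cells", "rationale": "r", "confidence": 75.5},
        {"name": "T cells", "rationale": "r", "confidence": 4.5},
    ],
})
validated = validate_suggestion(raw)
```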

Display in GUI:

Each cluster shows a radio button group with 3 phenotype options
Options are ranked by confidence (highest first)
Confidence percentages are displayed
Rationale is shown below each option
You can select any of the 3 options or manually edit

Fine vs. Broad Mode

The system prompt mode controls whether the LLM suggests detailed or broad cell types.

Fine Cell Types Mode (Default):

Goal: Suggest specific, detailed cell types
Examples:
- “CD4+ T cells”
- “M1 Macrophages”
- “Epithelial cells”
- “Endothelial cells”
Use when:
- You have a well-characterized panel
- You need specific cell type annotations
- You have high confidence in marker specificity
Advantages:
- More informative annotations
- Better for publication figures
- More actionable results
Disadvantages:
- May be less confident
- Can suggest incorrect specific types if markers are ambiguous

Broad Cell Types Mode:

Goal: Suggest broad categories first, with optional specificity
Examples:
- “Lymphoid” (or “T cells” if confidence >50%)
- “Myeloid” (or “Macrophages” if confidence >50%)
- “Stroma”
- “Tumor”
Use when:
- You’re doing exploratory analysis
- Your panel has limited specificity
- You want more conservative suggestions
Advantages:
- More confident suggestions
- Less likely to be wrong
- Good starting point for manual refinement
Disadvantages:
- Less informative
- May need manual refinement to specific types

Mode Selection Guidelines:

Start with Broad mode for exploratory analysis
Switch to Fine mode once you understand your data better
Use Fine mode for publication-ready annotations
Use Broad mode if suggestions seem too specific or uncertain

Using LLM-Based Phenotyping in the GUI

Complete Clustering: Ensure clustering has been run and clusters are available
Open Phenotype Annotation Dialog:
- In the clustering dialog, click the “Annotate Phenotypes” button
- This opens the phenotype annotation dialog
Open LLM Suggestion Dialog:
- Click the “Suggest phenotypes with LLM…” button
- This opens the LLM phenotyping dialog
Configure LLM Settings:
- OpenAI API Key: Enter your OpenAI API key (required)
  - Get an API key from https://openai.com/
  - Key is masked for security
- Cohort/tissue context: Enter optional context (e.g., “human colorectal cancer”)
- Model: Select model (default: gpt-5.1)
- Reasoning level: Select reasoning level if using gpt-5.1 (default: none)
- System prompt: Choose “Fine cell types” or “Broad cell types”
- Feature mode: Choose “Markers only”, “Morphometrics only”, or “Both”
- Top-K intensity: Number of top markers to include (default: 5)
- Top-K morphometric: Number of top morphometric features (default: 5; applies only when morphometric features are included)
Run Suggestions:
- Click the “Run Suggestion” button
- A progress bar shows analysis progress
- Results are cached to avoid redundant API calls
Review Suggestions:
- For each cluster, review the 3 suggested phenotypes
- Read the rationale for each suggestion
- Check confidence percentages
- Select the most appropriate option using the radio buttons
Apply Names:
- Click the “Apply Names” button to apply the selected phenotypes to the clusters
- You can still manually edit after applying
Close Dialog:
- Click “Close” to return to the clustering dialog
- Cached results are preserved for future use

Caching:

Results are automatically cached based on cluster IDs and settings
If you re-run with the same clusters, cached results are displayed immediately
Cache persists until you close the clustering dialog
To refresh suggestions, re-run the analysis
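
A cache keyed on cluster IDs and settings could look like the sketch below. How OpenIMC actually builds its cache key is not documented here, so `cache_key`, the SHA-256 hashing, and the in-memory dictionary are assumptions.

```python
import hashlib
import json

def cache_key(cluster_ids, settings):
    """Build a deterministic key from cluster IDs and LLM settings.

    Hypothetical helper: sorts the IDs and serializes with sorted keys so
    that the same inputs always map to the same key.
    """
    payload = {
        "clusters": sorted(str(cluster_id) for cluster_id in cluster_ids),
        "settings": settings,
    }
    blob = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

# In-memory cache that lives only as long as the surrounding dialog object would.
cache = {}
key = cache_key([2, 1], {"model": "gpt-5.1", "mode": "broad"})
cache[key] = {"1": "Lymphoid", "2": "Myeloid"}  # placeholder result for illustration
```

Because every setting is part of the key, changing the model or prompt mode produces a cache miss and triggers a fresh request.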

Important Warnings and Limitations

Warning

LLM suggestions are not definitive. Always manually review and validate all suggestions before using them in your analysis.

Key Limitations:

Not Ground Truth: LLM suggestions are AI-generated and may contain errors
- Always verify against known marker profiles
- Cross-reference with literature or expert knowledge
- Use suggestions as starting points, not final answers
Model Limitations: Language models have limitations
- May not recognize novel or rare cell types
- May misinterpret ambiguous marker profiles
- May suggest incorrect phenotypes if markers are non-specific
- Performance varies by model and dataset
Cost Considerations: API calls incur costs
- Estimated cost: approximately $0.10 (10 cents) per query
- Models are charged per token (input + output)
- Larger models and reasoning increase costs
- Monitor your OpenAI account usage
- Consider using smaller models for initial exploration
Data Quality Dependency: Suggestions depend on data quality
- Poor clustering leads to poor suggestions
- Missing or incorrect markers affect accuracy
- Normalization artifacts can mislead the LLM
Context Dependency: Suggestions are context-dependent
- Provide accurate tissue/cancer type context
- Context helps but doesn’t guarantee accuracy
- Different contexts may yield different suggestions

Best Practices:

Always Validate: Never use LLM suggestions without manual review
Start Broad: Use broad mode for initial exploration
Refine Manually: Use suggestions as starting points for manual annotation
Check Marker Profiles: Verify suggestions against known marker profiles
Use Multiple Models: Try different models if suggestions seem off
Provide Context: Always provide accurate tissue/cancer type context
Monitor Costs: Keep track of API usage to avoid unexpected charges
Cache Results: Results are cached automatically to save costs

When to Use LLM Phenotyping:

✅ Good for: Initial exploration, getting started, brainstorming
✅ Good for: Well-characterized panels with known markers
✅ Good for: Standard cell types (T cells, B cells, macrophages, etc.)
❌ Not ideal for: Novel or rare cell types
❌ Not ideal for: Datasets with poor clustering quality
❌ Not ideal for: Panels with non-specific markers

When NOT to Use LLM Phenotyping:

If you don’t have an OpenAI API key
If you need definitive, publication-ready annotations immediately
If your dataset has very novel or rare cell types
If cost is a major concern and you have many clusters

Tips and Best Practices

Model Selection:
- Start with gpt-5-mini for quick exploration
- Use gpt-5.1 with reasoning for complex datasets
- Use gpt-5-nano for cost-sensitive workflows
Mode Selection:
- Use Broad mode for initial exploration
- Switch to Fine mode once you understand your data
- Fine mode is better for publication figures
Feature Selection:
- Use “Both” mode to include all available information
- Use “Markers only” if morphometrics are not informative
- Adjust Top-K values based on your panel size
Context Provision:
- Always provide accurate tissue/cancer type context
- Include relevant treatment information if applicable
- More detailed context generally yields better suggestions
Validation Workflow:
- Review all suggestions before applying
- Check marker profiles in heatmaps/differential expression plots
- Compare suggestions across different models
- Manually correct incorrect suggestions
Cost Management:
- Estimated cost: ~$0.10 per query
- Use smaller models for initial exploration (may reduce cost)
- Cache results to avoid redundant API calls
- Monitor your OpenAI account usage
Iterative Refinement:
- Start with broad suggestions
- Refine manually based on marker profiles
- Re-run with fine mode for final annotations
- Combine LLM suggestions with manual annotation
Quality Control:
- Verify suggestions make biological sense
- Check that marker profiles match suggested phenotypes
- Look for consistency across similar clusters
- Flag uncertain suggestions for manual review

Example Workflow

Initial Exploration:
- Run clustering
- Open LLM phenotyping dialog
- Select gpt-5-mini model
- Choose “Broad cell types” mode
- Provide tissue context
- Run suggestions
Review Suggestions:
- Review all cluster suggestions
- Check confidence levels
- Read rationales
- Identify clusters needing manual review
Refinement:
- Apply broad suggestions
- Manually refine uncertain clusters
- Re-run with “Fine cell types” mode for specific clusters
- Use gpt-5.1 with reasoning for difficult cases
Final Annotation:
- Combine LLM suggestions with manual annotations
- Verify all annotations in heatmaps/UMAPs
- Export final annotations

Troubleshooting

Common Issues:

“API Key Required” Error:
- Ensure you’ve entered a valid OpenAI API key
- Check that the key starts with “sk-”
- Verify your account has credits
“No Clusters” Error:
- Run clustering first before using LLM phenotyping
- Ensure clusters are available in the clustering dialog
Poor Suggestions:
- Try a different model (e.g., gpt-5.1 with reasoning)
- Switch between fine and broad mode
- Provide more context
- Check that clustering quality is good
- Verify marker panel is appropriate
High Costs:
- Use smaller models (gpt-5-nano, gpt-5-mini)
- Disable reasoning for gpt-5.1
- Reduce Top-K values
- Cache results to avoid redundant calls
Slow Performance:
- Use faster models (gpt-5-nano, gpt-5-mini)
- Disable reasoning
- Process clusters in batches
- Check internet connection
Invalid JSON Errors:
- The system automatically retries with repair instructions
- If persistent, try a different model
- Check that your API key has access to the selected model
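
The automatic retry with repair instructions mentioned under Invalid JSON Errors follows a common pattern, sketched below. This is not OpenIMC's code: `request_json`, the `call_llm` callable (any function that takes a prompt string and returns the model's reply as text), and the repair wording are hypothetical.

```python
import json

def request_json(call_llm, prompt, max_attempts=3):
    """Call the model until its reply parses as JSON, up to max_attempts times.

    Hypothetical helper: on a parse failure, the next attempt appends a
    repair instruction describing the error.
    """
    message = prompt
    last_error = None
    for _ in range(max_attempts):
        reply = call_llm(message)
        try:
            return json.loads(reply)
        except json.JSONDecodeError as exc:
            last_error = exc
            message = (
                f"{prompt}\n\nYour previous reply was not valid JSON "
                f"({exc.msg}). Reply with valid JSON only."
            )
    raise ValueError(f"No valid JSON after {max_attempts} attempts") from last_error

# Stand-in for a real API call: fails once, then returns valid JSON.
replies = iter(["not json", '{"cluster_id": "1"}'])
result = request_json(lambda message: next(replies), "Suggest phenotypes.")
```

Capping the number of attempts matters here because each retry is a billable API call.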

Getting Help:

Check OpenAI API status if experiencing connection issues
Verify your API key permissions
Review OpenAI documentation for model availability
Check OpenIMC documentation for updates
