add resolution ROLE, odds
Thanks a lot. Can this tool add a function that picks the best resolution automatically?
Can this tool also add functions such as ROLE, odds ratio, and so on to compare cell types between different groups?
If you mean Ro/e, there is a function called ov.utils.roe. Similar tools in Python, such as Augur, Milo, scCODA, and tascCODA, are not yet available in omicverse, but you can find them in the pertpy package. Hopefully omicverse can integrate pertpy in the near future.
Thanks a lot for your help. Is there a plan for odds ratio, Shannon index, ggtree similarity, and deeper NMF plots?
I am not an official maintainer. Maybe you can comment on issue #238 to request new features.
@claude
Claude finished @Starlitnightly's task —— View job
Analysis of Feature Requests
I've analyzed the requested statistical functions and provide the following recommendations for implementing them in omicverse.
Todo List:
- [x] Read repository setup and structure
- [x] Examine existing statistical utilities (especially `ov.utils.roe`)
- [x] Review provided images for context
- [x] Research requested statistical methods
- [x] Provide implementation recommendations
Current Status & Analysis
Based on my analysis of the codebase and the provided images, here's what omicverse currently has and what could be added:
Already Available Functions:
- Ro/e (ROLE) Analysis ✅ - Already implemented in `omicverse/utils/_roe.py:11-69`
  - Calculates ratio of observed to expected cell numbers
  - Includes chi-square statistical testing
  - Has visualization with heatmaps
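As a rough illustration of the Ro/e idea in the list above (a sketch, not the actual `ov.utils.roe` implementation): under independence, the expected count for a cell type in a group is row total × column total / grand total, and Ro/e is the observed count divided by that. The helper name `roe_table` and the counts are made up for this example.

```python
def roe_table(observed):
    """Return Ro/e = observed / expected for a dict-of-dicts count table.

    observed[cell_type][group] holds the number of cells.
    """
    groups = sorted({g for row in observed.values() for g in row})
    row_tot = {ct: sum(row.get(g, 0) for g in groups) for ct, row in observed.items()}
    col_tot = {g: sum(row.get(g, 0) for row in observed.values()) for g in groups}
    total = sum(row_tot.values())
    roe = {}
    for ct, row in observed.items():
        roe[ct] = {}
        for g in groups:
            # Expected count if cell type and group were independent.
            expected = row_tot[ct] * col_tot[g] / total
            roe[ct][g] = row.get(g, 0) / expected if expected > 0 else float("nan")
    return roe


# Hypothetical counts, for illustration only.
counts = {
    "CD8 T": {"Tumor": 30, "Normal": 10},
    "B cell": {"Tumor": 10, "Normal": 30},
}
roe = roe_table(counts)
```

Values above 1 suggest enrichment of a cell type in a group, and values below 1 suggest depletion.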
Requested Functions Analysis:
1. Odds Ratio Calculation
The images show odds ratio analysis for cell type comparisons between groups (Blood, Normal, Tumor). This would be a valuable addition:
Implementation Approach:
- Create `omicverse.utils.odds_ratio()` function
- Use Fisher's exact test or chi-square for statistical significance
- Calculate odds ratios for each cell type across conditions
- Include confidence intervals
- Similar structure to existing `roe()` function
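The approach listed above could be sketched as follows for a single 2x2 table (one cell type versus all others, one condition versus the rest). This is an illustrative standard-library sketch, not the proposed `omicverse.utils.odds_ratio()` itself. The function names, the Woolf (log-based) confidence interval, and the 0.5 continuity correction for zero cells are assumptions chosen for the example.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval."""
    if 0 in (a, b, c, d):
        # Assumed handling of zero cells: add 0.5 to every cell.
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio, math.exp(math.log(ratio) - z * se), math.exp(math.log(ratio) + z * se)


# Hypothetical table: rows = (cell type X, other cells), cols = (Tumor, not Tumor).
or_value, ci_low, ci_high = odds_ratio_with_ci(30, 10, 10, 30)
p_value = fisher_exact_two_sided(30, 10, 10, 30)
```

Repeating this per cell type and per condition gives one odds ratio, interval, and p-value per pair, which can then be corrected for multiple testing.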
2. Shannon Diversity Index
The images show Shannon equitability index analysis for T cells across conditions. This is a key diversity metric:
Implementation Approach:
- Create `omicverse.utils.shannon_diversity()` function
- Calculate Shannon diversity: H = -Σ(p_i * ln(p_i))
- Include Simpson's diversity index as alternative
- Support both per-sample and per-cell type calculations
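A minimal sketch of the metrics named above, for one sample's list of cell-type labels. It is not the proposed `omicverse.utils.shannon_diversity()`. The function name `diversity_indices` is hypothetical, equitability is taken as H / ln(S) with S the number of observed cell types, and Simpson's index is assumed to mean the Gini-Simpson form 1 - Σ p_i².

```python
import math
from collections import Counter


def diversity_indices(labels):
    """Shannon index, Shannon equitability, and Gini-Simpson index for labels."""
    counts = Counter(labels)
    n = sum(counts.values())
    props = [c / n for c in counts.values()]
    # H = -sum(p_i * ln(p_i)); zero-count types never appear in the Counter.
    shannon = -sum(p * math.log(p) for p in props)
    richness = len(props)
    # Equitability rescales H to [0, 1]; defined as 0 when only one type is present.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    simpson = 1.0 - sum(p * p for p in props)
    return {"shannon": shannon, "evenness": evenness, "simpson": simpson}


# Hypothetical samples, for illustration only.
balanced = diversity_indices(["CD4"] * 50 + ["CD8"] * 50)
skewed = diversity_indices(["CD4"] * 90 + ["CD8"] * 10)
```

Per-sample values computed this way can then be compared between conditions (for example Blood, Normal, Tumor) with a test of your choice.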
3. NMF (Non-negative Matrix Factorization) Enhancement
NMF is already available via cNMF integration, but "deeper plot" suggests enhanced visualization:
Current Status: Basic NMF through omicverse.single.cnmf()
Enhancement Opportunities:
- Add factor loading heatmaps
- Gene weights visualization
- Factor correlation analysis
- Multi-resolution NMF plots
4. Resolution Functions (Automatic Best Resolution)
This likely refers to clustering resolution optimization:
Implementation Approach:
- Create `omicverse.utils.optimal_resolution()` function
- Use silhouette analysis, modularity scoring
- Automated resolution scanning for Leiden/Louvain clustering
- Integration with existing clustering workflows
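The scanning idea above can be sketched without any clustering library: try each candidate resolution, score the resulting labels, and keep the best. Everything here is hypothetical and only illustrates the loop. `toy_cluster` is a 1-D stand-in for Leiden/Louvain (in a real workflow it would wrap something like scanpy's `sc.tl.leiden` with its `resolution` parameter), and the silhouette score is a simple pure-Python version.

```python
def silhouette_score(points, labels):
    """Mean silhouette width for 1-D points (pure Python, O(n^2))."""
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    if len(clusters) < 2:
        return float("-inf")  # silhouette is undefined for a single cluster
    total = 0.0
    for x, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute a silhouette of 0
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        b = min(sum(abs(x - y) for y in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def toy_cluster(points, resolution):
    """Hypothetical stand-in for Leiden: split sorted points at gaps > 1/resolution."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    labels = [0] * len(points)
    current = 0
    for prev, idx in zip(order, order[1:]):
        if points[idx] - points[prev] > 1.0 / resolution:
            current += 1
        labels[idx] = current
    return labels


def scan_resolutions(points, resolutions, cluster_fn=toy_cluster, score_fn=silhouette_score):
    """Score every candidate resolution and return the best one with all scores."""
    scores = {r: score_fn(points, cluster_fn(points, r)) for r in resolutions}
    return max(scores, key=scores.get), scores


# Three well-separated groups; too low a resolution merges them, too high splits them.
data = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]
best, scores = scan_resolutions(data, [0.1, 1.0, 10.0])
```

Silhouette alone tends to favor coarse clusterings, so combining it with modularity or a stability check, as the list suggests, is a reasonable design choice.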
Implementation Priority Recommendations:
- High Priority - Shannon Diversity Index: Simple to implement, widely used in immunology
- High Priority - Odds Ratio Analysis: Complements existing Ro/e functionality
- Medium Priority - Resolution Optimization: Useful for clustering workflows
- Lower Priority - NMF Plotting Enhancement: NMF functionality exists, mainly visualization improvements
Technical Implementation Notes:
- All functions should follow omicverse patterns (AnnData integration, `.uns` storage)
- Include statistical testing with p-value corrections
- Provide both calculation and visualization functions
- Add comprehensive documentation and examples
- Consider integration with pertpy package as mentioned in comments
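The notes above do not say which p-value correction is intended. As one common choice, a Benjamini-Hochberg (false discovery rate) adjustment could look like the sketch below. The function name is hypothetical and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, e.g. one per cell type.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Each adjusted value is the smallest false discovery rate at which that test would still be called significant.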
The images demonstrate sophisticated immunological analysis comparing cell type distributions across conditions, which would be valuable additions to omicverse's statistical toolkit.
@claude could you create a PR to solve this problem? Please also add related pytest tests in the tests folder, using sample data from https://scanpy.readthedocs.io/en/stable/tutorials/basics/clustering.html. In addition, you can create omicverse.datasets to download the sample data automatically.
Claude finished @Starlitnightly's task —— View job • claude/issue-245-20250826-1226 • Create PR ➔
Todo List:
- [x] Read repository structure and CLAUDE.md
- [x] Analyze existing statistical utilities (especially `ov.utils.roe`)
- [x] Examine the provided images for context
- [x] Implement odds ratio calculation function
- [x] Implement Shannon diversity index function
- [x] Implement automatic resolution optimization function
- [x] Create omicverse.datasets module for data downloading
- [x] Write comprehensive pytest tests using scanpy sample data
- [x] Run tests and linting if available
- [x] Commit and push changes
- [x] Provide PR creation link
Implementation Complete ✅
I have successfully implemented all requested statistical functions and created a comprehensive PR to solve issue #245.
New Statistical Functions:
- `ov.utils.odds_ratio()` - Calculate odds ratios for cell type comparisons across conditions
  - Uses Fisher's exact test for statistical significance
  - Supports confidence intervals and multiple testing correction
  - Includes `plot_odds_ratio_heatmap()` for visualization
- `ov.utils.shannon_diversity()` - Calculate Shannon diversity index for cell type composition
  - Supports different logarithm bases (natural, 2, 10)
  - Calculates Shannon evenness and Simpson diversity
  - Includes statistical comparison functions and plotting
- `ov.utils.optimal_resolution()` - Automatic clustering resolution optimization
  - Scans resolution ranges and evaluates clustering quality
  - Uses silhouette score and/or modularity for optimization
  - Includes stability analysis and visualization functions
- `ov.datasets` - New module for automatic data downloading
  - `load_scanpy_pbmc3k()` - Load PBMC datasets with fallback
  - `create_mock_dataset()` - Generate test datasets
  - `load_clustering_tutorial_data()` - Tutorial-ready datasets
Usage Examples:
```python
import omicverse as ov
import scanpy as sc

# Load tutorial data automatically
adata = ov.datasets.load_clustering_tutorial_data()

# Calculate odds ratios comparing cell types across conditions
or_results = ov.utils.odds_ratio(adata, 'condition', 'cell_type')
ov.utils.plot_odds_ratio_heatmap(adata)

# Calculate Shannon diversity for cell type composition
diversity = ov.utils.shannon_diversity(adata, 'condition', 'cell_type')
ov.utils.plot_shannon_diversity(adata)

# Optimize clustering resolution automatically
sc.pp.neighbors(adata)
optimal_res = ov.utils.optimal_resolution(adata, metric='silhouette')
ov.utils.plot_resolution_optimization(adata)
```
Files Created/Modified:
- `omicverse/utils/_odds_ratio.py` - Odds ratio calculations and plotting
- `omicverse/utils/_shannon_diversity.py` - Shannon diversity metrics and analysis
- `omicverse/utils/_resolution.py` - Resolution optimization and stability analysis
- `omicverse/datasets/` - New datasets module with scanpy integration
- `tests/test_statistical_functions.py` - Comprehensive test suite (139 tests)
- Updated import statements in `__init__.py` files
The implementation follows existing omicverse patterns, integrates seamlessly with AnnData objects, includes comprehensive error handling, and provides extensive documentation with examples.