pertpy
Solve TODOs and verify DIALOGUE 1
Description of changes
I have benchmarked the initial part of DIALOGUE (DIALOGUE1). I changed the _pseudobulk_feature_space function so that the user can choose the aggregation method (median or mean) and the output now has samples as rows, matching the R implementation. I also modified the _scale_data function to center, scale, and cap extreme values (with a cap of 0.01) in a way that mirrors the R functions center.matrix and cap.mat. In addition, I updated the _load function to optionally restrict the data to common samples across cell types. The output of _load is now a dataframe that is converted back to a numpy array before further processing.
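As a rough illustration of the two steps described above, here is a minimal sketch, not the actual code in `pertpy/tools/_dialogue.py`. The helper names `pseudobulk` and `center_scale_cap` are hypothetical, and the sketch assumes that capping means clipping each feature at its `cap` and `1 - cap` quantiles after centering and scaling; the exact order and semantics of the R functions `center.matrix` and `cap.mat` are not spelled out in this PR.

```python
import numpy as np
import pandas as pd


def pseudobulk(features: pd.DataFrame, samples: pd.Series, agg: str = "median") -> pd.DataFrame:
    """Aggregate per-cell features into one row per sample (hypothetical helper).

    `features` has cells as rows and features as columns; `samples` gives the
    sample label of each cell, in the same order as the rows of `features`.
    """
    if agg not in {"mean", "median"}:
        raise ValueError("agg must be 'mean' or 'median'")
    # Grouping by the raw label array avoids index-alignment surprises.
    return features.groupby(samples.to_numpy()).agg(agg)


def center_scale_cap(pb: pd.DataFrame, cap: float = 0.01) -> pd.DataFrame:
    """Center and scale each feature, then clip extreme values (assumed semantics)."""
    centered = pb - pb.mean(axis=0)
    # Guard against constant features, whose standard deviation is zero.
    std = pb.std(axis=0, ddof=1).replace(0, 1.0)
    scaled = centered / std
    # Assumption: the cap is quantile based, applied per feature (column).
    lower = scaled.quantile(cap, axis=0)
    upper = scaled.quantile(1 - cap, axis=0)
    return scaled.clip(lower=lower, upper=upper, axis=1)
```

With this layout the pseudobulk output has samples as rows and features as columns, which is the orientation the description says matches the R implementation.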
Technical details
The changes make the pseudobulk and normalization steps in Python produce results that match the R version. I added an optional parameter to subset to common samples and to choose the averaging function. I also ensure that the data are converted to numpy arrays before passing them to the penalized matrix decomposition functions.
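The common-sample restriction and the conversion to NumPy could look roughly like the sketch below. The function name `restrict_to_common_samples` and the dictionary layout (cell type mapped to a samples-by-features dataframe) are assumptions made for illustration; they are not the actual signature of `_load`.

```python
from functools import reduce

import numpy as np
import pandas as pd


def restrict_to_common_samples(pb_by_celltype: dict):
    """Keep only samples present in every cell type (hypothetical helper).

    `pb_by_celltype` maps a cell type name to a dataframe with samples as rows.
    Returns a dict of NumPy arrays with identical row order, plus the shared
    sample labels.
    """
    indexes = [df.index for df in pb_by_celltype.values()]
    # Intersect the sample labels across all cell types and fix one ordering.
    common = reduce(lambda a, b: a.intersection(b), indexes).sort_values()
    # Reindex every dataframe to the shared order, then drop to plain arrays
    # so downstream matrix decomposition code receives NumPy input.
    arrays = {ct: df.loc[common].to_numpy() for ct, df in pb_by_celltype.items()}
    return arrays, common
```

Sorting the shared labels is one way to guarantee that row `i` refers to the same sample in every cell type's array, which matters once the dataframe index is discarded.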
Additional context
These changes only affect the initial part of DIALOGUE (DIALOGUE1) and do not modify downstream analysis.
Codecov Report
Attention: Patch coverage is 92.00000% with 2 lines in your changes missing coverage. Please review.
Project coverage is 65.79%. Comparing base (6a97036) to head (c27ffed).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| pertpy/tools/_dialogue.py | 92.00% | 2 Missing :warning: |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #714      +/-   ##
==========================================
+ Coverage   63.17%   65.79%   +2.61%
==========================================
  Files          47       47
  Lines        6110     6127      +17
==========================================
+ Hits         3860     4031     +171
+ Misses       2250     2096     -154
| Files with missing lines | Coverage Δ | |
|---|---|---|
| pertpy/tools/_dialogue.py | 38.11% <92.00%> (+24.16%) | :arrow_up: |
I also have a notebook that benchmarks the current DIALOGUE implementation against the R version on their toy example. The results look good, but I have a lot of data files and other material that might need some adjustment, so I need to speak to @Zethson. My other PR has an operational version; it is very scrappy for now, but enough for the figures, I think. I will get back to this in a week or so. For now I have to focus on Pfizer.
Thank you for all your comments Yuge :)