Solve TODOs and verify DIALOGUE 1

Open grpinto opened this issue 9 months ago • 2 comments

Description of changes

I have benchmarked the initial part of DIALOGUE (DIALOGUE1). I changed the _pseudobulk_feature_space function so that the user can choose the aggregation method (median or mean) and the output now has samples as rows, matching the R implementation. I also modified the _scale_data function to center, scale, and cap extreme values (with a cap of 0.01) in a way that mirrors the R functions center.matrix and cap.mat. In addition, I updated the _load function to optionally restrict the data to common samples across cell types. The output of _load is now a dataframe that is converted back to a numpy array before further processing.
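
As a rough illustration of the two steps described above (not the code in this PR), the pseudobulk aggregation and the center/scale/cap normalization could be sketched as follows. The function names `pseudobulk` and `center_scale_cap` are hypothetical stand-ins, not pertpy's private `_pseudobulk_feature_space` and `_scale_data`. The sketch also assumes that the cap of 0.01 is a quantile fraction, so each feature is clipped to its 1st and 99th percentiles; that reading should be checked against the R functions.

```python
import numpy as np
import pandas as pd


def pseudobulk(features: pd.DataFrame, sample_labels, agg: str = "median") -> pd.DataFrame:
    """Aggregate per-cell features (rows = cells) into one row per sample."""
    if agg not in {"mean", "median"}:
        raise ValueError("agg must be 'mean' or 'median'")
    # Build an explicit grouping Series so the labels are never mistaken for column names.
    groups = pd.Series(np.asarray(sample_labels), index=features.index, name="sample")
    return features.groupby(groups).agg(agg)


def center_scale_cap(X: pd.DataFrame, cap: float = 0.01) -> pd.DataFrame:
    """Center and scale each column, then clip it to its [cap, 1 - cap] quantiles.

    Assumption: `cap` is a quantile fraction, not an absolute threshold.
    """
    std = X.std(ddof=1).replace(0, 1.0)  # avoid division by zero for constant columns
    Z = (X - X.mean()) / std
    lower = Z.quantile(cap)
    upper = Z.quantile(1 - cap)
    # axis=1 aligns the per-column bounds with the columns of Z.
    return Z.clip(lower=lower, upper=upper, axis=1)
```

With samples as rows, the output of `pseudobulk` can be passed straight to `center_scale_cap`, which normalizes each feature column independently.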

Technical details

The changes make the pseudobulk and normalization steps in Python produce results that match the R version. I added an optional parameter to subset to common samples and to choose the averaging function. I also ensure that the data are converted to numpy arrays before passing them to the penalized matrix decomposition functions.
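
The optional restriction to common samples and the conversion to numpy arrays could look roughly like the sketch below. `restrict_to_common_samples` is a hypothetical helper written for illustration, not the `_load` function changed in this PR. It assumes each cell type has a pseudobulk dataframe indexed by sample name.

```python
from functools import reduce

import numpy as np
import pandas as pd


def restrict_to_common_samples(
    per_cell_type: dict[str, pd.DataFrame],
) -> tuple[dict[str, np.ndarray], list]:
    """Keep only samples present in every cell type and return aligned numpy arrays."""
    # Intersect the sample indices of all cell types.
    common = reduce(
        lambda a, b: a.intersection(b),
        (df.index for df in per_cell_type.values()),
    ).sort_values()
    # Reindex each dataframe to the same sample order before dropping the labels.
    arrays = {ct: df.loc[common].to_numpy() for ct, df in per_cell_type.items()}
    return arrays, list(common)
```

Sorting the common samples gives every cell type the same row order, which matters once the row labels are dropped in the conversion to numpy.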

Additional context

These changes only affect the initial part of DIALOGUE (DIALOGUE1) and do not modify downstream analysis.

grpinto, Feb 21 '25 17:02

Codecov Report

Attention: Patch coverage is 92.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 65.79%. Comparing base (6a97036) to head (c27ffed).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pertpy/tools/_dialogue.py | 92.00% | 2 Missing :warning: |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #714      +/-   ##
==========================================
+ Coverage   63.17%   65.79%   +2.61%     
==========================================
  Files          47       47              
  Lines        6110     6127      +17     
==========================================
+ Hits         3860     4031     +171     
+ Misses       2250     2096     -154     
| Files with missing lines | Coverage Δ |
|---|---|
| pertpy/tools/_dialogue.py | 38.11% <92.00%> (+24.16%) :arrow_up: |

... and 3 files with indirect coverage changes

codecov-commenter, Feb 22 '25 09:02

I also have a notebook that benchmarks the current implementation of DIALOGUE against the R version on their toy example. The results look good, but I have a lot of data files and other material that might need some adjustment, and I need to speak to @Zethson about it. My other PR has an operational version; it is very scrappy for now, but I think it is enough for the figures. I will get back to this in a week or so. For now I have to focus on Pfizer.

Thank you for all your comments Yuge :)

grpinto, Mar 19 '25 17:02