scanpy icon indicating copy to clipboard operation
scanpy copied to clipboard

impl median function for aggregation

Open farhadmd7 opened this issue 1 year ago • 6 comments

Adds median as aggregation function to https://github.com/scverse/scanpy/blob/0f3161295dbf0cf568376c31eaa5c6e148dcf9f0/src/scanpy/get/_aggregated.py In case of a sparse matrix, it will be converted to dense matrix before calculating median

  • [ ] Closes #
  • [x] Tests included or not required because:
  • [ ] Release notes not necessary because:

farhadmd7 avatar Jul 30 '24 15:07 farhadmd7

Thanks! A little bit of context. We needed this aggregation for one of the projects using pseudobulks of the data. We could use scanpy aggregation methods for simple averaging, but to test the outlier-robust median aggregation, we had to write our code. scanpy didn't have it for some reason, so @farhadmd7 kindly agreed to contribute here. Perhaps someone else will find it helpful, too.

@eroell, what do you think?

VladimirShitov avatar Jul 30 '24 15:07 VladimirShitov

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 76.76%. Comparing base (8b7673d) to head (e6c61a8). Report is 56 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3180      +/-   ##
==========================================
+ Coverage   76.40%   76.76%   +0.35%     
==========================================
  Files         109      109              
  Lines       12529    12544      +15     
==========================================
+ Hits         9573     9629      +56     
+ Misses       2956     2915      -41     
Files with missing lines Coverage Δ
src/scanpy/get/_aggregated.py 95.23% <100.00%> (+0.54%) :arrow_up:

... and 1 file with indirect coverage changes

codecov[bot] avatar Jul 30 '24 15:07 codecov[bot]

Hi, thanks for the contribution!

The converting to dense is quite iffy, we should probably add real support for sparse here. We could use this as a base. Do you think you can do that or should we help?

This is also missing release notes in 1.11.0.md

flying-sheep avatar Aug 01 '24 11:08 flying-sheep

@eroell, what do you think?

See Phil's comment above, one more thing would be to add a test I suppose

If you want to try Phil's comments yourself @farhadmd7 please go ahead, else you can write here for more help as Phil mentioned!

eroell avatar Aug 01 '24 14:08 eroell

one more thing would be to add a test I suppose

Already covered in by merging #3186. This way, adding "median" to AggType automatically tests it:

…
tests/test_aggregated.py::test_aggregate_vs_pandas[median-numpy_ndarray] PASSED [  1%]
tests/test_aggregated.py::test_aggregate_vs_pandas[median-scipy_csr] PASSED [  1%]
tests/test_aggregated.py::test_aggregate_vs_pandas[median-scipy_csc] PASSED [  1%]
…
tests/test_aggregated.py::test_aggregate_axis[median-numpy_ndarray] PASSED [  2%]
tests/test_aggregated.py::test_aggregate_axis[median-scipy_csr] PASSED   [  2%]
tests/test_aggregated.py::test_aggregate_axis[median-scipy_csc] PASSED   [  2%]
…
tests/test_aggregated.py::test_aggregate_arraytype[median-numpy_ndarray] PASSED [  3%]
tests/test_aggregated.py::test_aggregate_arraytype[median-scipy_csr] PASSED [  4%]
tests/test_aggregated.py::test_aggregate_arraytype[median-scipy_csc] PASSED [  4%]
…

flying-sheep avatar Aug 01 '24 15:08 flying-sheep

Can not do this with a smarter numba kernel? at least for the sparse case. I think this should remove all the issues?

Intron7 avatar Aug 05 '24 09:08 Intron7

Thanks to everyone who commented here.

Hey @flying-sheep, it passes the tests in https://github.com/scverse/scanpy/blob/e285c0f6ec77631d14d748d0927d38aae4391886/tests/test_aggregated.py please let me know if I should fix or refactor anything Thanks

farhadmd7 avatar Sep 02 '24 18:09 farhadmd7

Hi, sorry for getting back so late, I was alternatingly really busy and sick.

This looks nice! One more thing:

This is also missing release notes in 1.11.0.md

flying-sheep avatar Sep 13 '24 08:09 flying-sheep

Thank you for the comment It's done

farhadmd7 avatar Sep 15 '24 10:09 farhadmd7