impl median function for aggregation
Adds median as aggregation function to https://github.com/scverse/scanpy/blob/0f3161295dbf0cf568376c31eaa5c6e148dcf9f0/src/scanpy/get/_aggregated.py In case of a sparse matrix, it will be converted to dense matrix before calculating median
- [ ] Closes #
- [x] Tests included or not required because:
- [ ] Release notes not necessary because:
Thanks! A little bit of context. We needed this aggregation for one of the projects using pseudobulks of the data. We could use scanpy aggregation methods for simple averaging, but to test the outlier-robust median aggregation, we had to write our code. scanpy didn't have it for some reason, so @farhadmd7 kindly agreed to contribute here. Perhaps someone else will find it helpful, too.
@eroell, what do you think?
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 76.76%. Comparing base (
8b7673d) to head (e6c61a8). Report is 56 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #3180 +/- ##
==========================================
+ Coverage 76.40% 76.76% +0.35%
==========================================
Files 109 109
Lines 12529 12544 +15
==========================================
+ Hits 9573 9629 +56
+ Misses 2956 2915 -41
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/scanpy/get/_aggregated.py | 95.23% <100.00%> (+0.54%) |
:arrow_up: |
Hi, thanks for the contribution!
The converting to dense is quite iffy, we should probably add real support for sparse here. We could use this as a base. Do you think you can do that or should we help?
This is also missing release notes in 1.11.0.md
@eroell, what do you think?
See Phil's comment above, one more thing would be to add a test I suppose
If you want to try Phil's comments yourself @farhadmd7 please go ahead, else you can write here for more help as Phil mentioned!
one more thing would be to add a test I suppose
Already covered in by merging #3186. This way, adding "median" to AggType automatically tests it:
…
tests/test_aggregated.py::test_aggregate_vs_pandas[median-numpy_ndarray] PASSED [ 1%]
tests/test_aggregated.py::test_aggregate_vs_pandas[median-scipy_csr] PASSED [ 1%]
tests/test_aggregated.py::test_aggregate_vs_pandas[median-scipy_csc] PASSED [ 1%]
…
tests/test_aggregated.py::test_aggregate_axis[median-numpy_ndarray] PASSED [ 2%]
tests/test_aggregated.py::test_aggregate_axis[median-scipy_csr] PASSED [ 2%]
tests/test_aggregated.py::test_aggregate_axis[median-scipy_csc] PASSED [ 2%]
…
tests/test_aggregated.py::test_aggregate_arraytype[median-numpy_ndarray] PASSED [ 3%]
tests/test_aggregated.py::test_aggregate_arraytype[median-scipy_csr] PASSED [ 4%]
tests/test_aggregated.py::test_aggregate_arraytype[median-scipy_csc] PASSED [ 4%]
…
Can not do this with a smarter numba kernel? at least for the sparse case. I think this should remove all the issues?
Thanks to everyone who commented here.
Hey @flying-sheep, it passes the tests in https://github.com/scverse/scanpy/blob/e285c0f6ec77631d14d748d0927d38aae4391886/tests/test_aggregated.py please let me know if I should fix or refactor anything Thanks
Hi, sorry for getting back so late, I was alternatingly really busy and sick.
This looks nice! One more thing:
This is also missing release notes in 1.11.0.md
Thank you for the comment It's done