squidpy
Re-implementing co_occurrence()
Description
Together with @MDLDan, we reimplemented the squidpy.gr.co_occurrence() function using Numba.
The new algorithm removes the need for a pre-computed pairwise distance matrix, so it can handle large datasets without splitting them. Parallel processing is enabled by default, speeding up the runtime by about 40×.
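The core idea, computing distances on the fly instead of materialising an O(n²) distance matrix, can be sketched roughly as follows. This is a simplified NumPy illustration with a hypothetical helper name, not the actual squidpy kernel; the real rewrite JIT-compiles and parallelises the outer loop with Numba:

```python
import numpy as np

def cooccurrence_counts(coords, labels, n_labels, thresholds):
    """Count label pairs within each distance threshold without ever
    building the full pairwise distance matrix.

    Hypothetical helper illustrating the idea: distances from point i
    are computed on the fly, so memory stays O(n) instead of O(n^2).
    """
    n = coords.shape[0]
    counts = np.zeros((len(thresholds), n_labels, n_labels), dtype=np.int64)
    for i in range(n):  # in a Numba version this loop would be prange
        # distances from point i to all points j > i, computed on the fly
        diff = coords[i + 1:] - coords[i]
        dist = np.sqrt((diff * diff).sum(axis=1))
        for t, thr in enumerate(thresholds):
            for j in np.nonzero(dist <= thr)[0]:
                a, b = labels[i], labels[i + 1 + j]
                counts[t, a, b] += 1
                counts[t, b, a] += 1
    return counts
```

Because each iteration only touches the distances from one point, the working set never grows quadratically, which is what lets large datasets run without splitting.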
We also implemented it in Rust using PyO3 and achieved similar performance. We chose to push the Numba implementation.
The following issues are related:
- https://github.com/scverse/squidpy/issues/229
- https://github.com/scverse/squidpy/issues/755
- https://github.com/scverse/squidpy/issues/223
- https://github.com/scverse/squidpy/issues/582
How has this been tested?
- All `squidpy.gr.co_occurrence()`-related tests in the `squidpy` package test suite have passed.
- We also compared the output of the new and old implementations:
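A numerical comparison along those lines can be sketched as follows (hypothetical helper; the actual check compares `squidpy.gr.co_occurrence()` outputs computed on the same data):

```python
import numpy as np

def outputs_match(occ_old: np.ndarray, occ_new: np.ndarray,
                  rtol: float = 1e-6) -> bool:
    """Return True when two co-occurrence result arrays agree elementwise
    within a relative tolerance (hypothetical comparison helper)."""
    if occ_old.shape != occ_new.shape:
        return False
    return bool(np.allclose(occ_old, occ_new, rtol=rtol))
```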
Closes
closes #755
Codecov Report
:x: Patch coverage is 23.25581% with 33 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 66.54%. Comparing base (d8b8f91) to head (7575052).
:warning: Report is 16 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/squidpy/gr/_ppatterns.py | 19.51% | 33 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
- Coverage   66.63%   66.54%   -0.09%
==========================================
  Files          40       40
  Lines        6060     6050      -10
  Branches     1015     1012       -3
==========================================
- Hits         4038     4026      -12
- Misses       1662     1667       +5
+ Partials      360      357       -3
```
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/squidpy/im/_io.py | 66.90% <100.00%> (ø) | |
| src/squidpy/gr/_ppatterns.py | 78.20% <19.51%> (-2.28%) | :arrow_down: |
@wenjie1991 This already looks very good and promising. But I believe you can squeeze out even more performance. You can start by adjusting the memory access pattern to be efficient. You can also `numba.njit` the outer function and parallelize it. Also, I would cache the kernel, which makes it even more efficient.
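The pattern being suggested here — JIT-compiling the outer loop with `numba.njit`, parallelising it with `prange`, and enabling on-disk caching — looks roughly like this. It is an illustrative toy kernel, not the squidpy code; the import fallback is only there so the sketch runs even without Numba installed:

```python
import numpy as np

try:
    from numba import njit, prange  # real Numba decorators, if available
except ImportError:                 # fallback shim so the sketch still runs
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap

@njit(parallel=True, cache=True)  # cache=True stores the compiled kernel on disk
def row_sums(x):
    # the outer loop is distributed across threads when parallel=True is active
    out = np.zeros(x.shape[0])
    for i in prange(x.shape[0]):
        acc = 0.0
        for j in range(x.shape[1]):  # innermost loop walks contiguous memory
            acc += x[i, j]
        out[i] = acc
    return out
```

With `cache=True`, repeated runs skip recompilation, which removes the JIT warm-up cost from every invocation after the first.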
@MDLDan
Hi @Intron7,
Thanks for the suggestions. The implementation has been refactored and optimized.
These changes enhance performance by introducing JIT and parallelization to the outer function. https://github.com/scverse/squidpy/pull/975/commits/cee646ae2d72f8d7f685058c836f7afded835789
The memory access pattern has been adjusted for contiguous memory, and the kernel has been cached. https://github.com/scverse/squidpy/pull/975/commits/52a5fae7e8ae040b1828a2733e25f26858172eea
The runtime has been further reduced by approximately 15%, primarily due to improved memory access patterns.
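For context, "adjusted for contiguous memory" refers to NumPy's memory layout: a transposed array is a strided view, so hot loops benefit from first copying it into C-contiguous (row-major) order. A minimal illustration, not squidpy code:

```python
import numpy as np

a = np.random.rand(512, 512)
col_view = a.T                        # zero-copy view, but strided in memory
col_copy = np.ascontiguousarray(a.T)  # explicit row-major copy

# Both hold identical values, but only the copy is laid out so that
# iterating its last axis touches adjacent memory, which tight
# Numba loops traverse fastest.
print(col_view.flags["C_CONTIGUOUS"], col_copy.flags["C_CONTIGUOUS"])
```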
@wenjie1991 I'll take a look later next week. Thank you for optimising this a bit more.
I am writing here to document the current performance. Here are the results with the current branch (peak memory and time, respectively).
| imc dataset | layer=None | layer=off-axis |
|---|---|---|
| peak memory | 628M | 649M |
| time | 305±0ms | 306±0ms |
And with main:
| imc dataset | layer=None | layer=off-axis |
|---|---|---|
| peak memory | 1.65G | 1.46G |
| time | 12.2±0s | 12.2±0s |
@Intron7 I cleaned up the code a bit in this commit https://github.com/selmanozleyen/squidpy/commit/27a6d70eb3c406e570151483f998b8bde191853b. From what I understand, using an in-place algorithm wasn't worth it, at least for my benchmark dataset imc. My guess is that allocating a float64 count array and doing the increment operations costs about as much as filling a float array from scratch, but I might be misunderstanding something. Here are the results for this commit:
| imc dataset | layer=None | layer=off-axis |
|---|---|---|
| peak memory | 672M | 670M |
| time | 287±0ms | 294±0ms |
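The two counting strategies discussed above can be sketched as follows (hypothetical helpers that produce identical results; the benchmark probes their cost, not their correctness):

```python
import numpy as np

def counts_inplace(bin_idx, n_bins):
    # pre-allocate a float64 array and increment it element by element
    out = np.zeros(n_bins, dtype=np.float64)
    for i in bin_idx:
        out[i] += 1.0
    return out

def counts_fresh(bin_idx, n_bins):
    # count in integers first, then convert to float in a single pass
    return np.bincount(bin_idx, minlength=n_bins).astype(np.float64)
```

Both return the same histogram, so whichever is cheaper on a given dataset is purely a question of allocation and memory-traffic cost.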