
Re-implementing co_occurrence()

Open wenjie1991 opened this issue 8 months ago • 5 comments


Description

Together with @MDLDan, we reimplemented the squidpy.gr.co_occurrence() function using Numba. The new algorithm removes the need for a pre-calculated pairwise distance matrix, enabling it to handle large datasets without splitting. Parallel processing is enabled by default, speeding up the runtime by roughly 40×.
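The core idea can be sketched as below, in plain NumPy for illustration (the actual PR jits a loop of this shape with Numba; the function name, signature, and binning scheme here are hypothetical, not the PR's API): distances are computed one point at a time and binned on the fly, so the full n×n distance matrix is never materialized.

```python
import numpy as np

def co_occurrence_counts(coords, labels, thresholds):
    """Hypothetical sketch: count label pairs within each distance interval,
    one point at a time, so only O(n) distances are held at once instead of
    an O(n^2) pairwise distance matrix."""
    n_labels = labels.max() + 1
    n_bins = len(thresholds) - 1
    counts = np.zeros((n_labels, n_labels, n_bins), dtype=np.int64)
    for i in range(len(coords)):
        # distances from point i to all points: O(n) memory
        d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
        bins = np.searchsorted(thresholds, d, side="right") - 1
        valid = (bins >= 0) & (bins < n_bins)
        valid[i] = False  # exclude the self-pair
        # accumulate directed pair counts per (label_i, label_j, interval)
        np.add.at(counts, (labels[i], labels[valid], bins[valid]), 1)
    return counts
```

Each ordered pair (i, j) is counted once per direction; the resulting counts per interval are what the co-occurrence score is derived from.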

We also implemented it in Rust using PyO3 and achieved similar performance. We chose to push the Numba implementation.

The following issues are related:
  • https://github.com/scverse/squidpy/issues/229
  • https://github.com/scverse/squidpy/issues/755
  • https://github.com/scverse/squidpy/issues/223
  • https://github.com/scverse/squidpy/issues/582

How has this been tested?

  • All squidpy.gr.co_occurrence()-related tests in the squidpy package test suite pass.
  • We also compared the output of the new and old implementations:
    • When the number of cells is small enough that squidpy does not need to split the computation, the differences are on the order of 1e-08.
    • When the number of cells requires squidpy to split, the differences are on the order of 1e-02.

Closes

closes #755

wenjie1991 avatar Mar 19 '25 09:03 wenjie1991

Codecov Report

:x: Patch coverage is 23.25581% with 33 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 66.54%. Comparing base (d8b8f91) to head (7575052).
:warning: Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
src/squidpy/gr/_ppatterns.py 19.51% 33 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
- Coverage   66.63%   66.54%   -0.09%     
==========================================
  Files          40       40              
  Lines        6060     6050      -10     
  Branches     1015     1012       -3     
==========================================
- Hits         4038     4026      -12     
- Misses       1662     1667       +5     
+ Partials      360      357       -3     
Files with missing lines Coverage Δ
src/squidpy/im/_io.py 66.90% <100.00%> (ø)
src/squidpy/gr/_ppatterns.py 78.20% <19.51%> (-2.28%) :arrow_down:

codecov-commenter avatar Mar 19 '25 15:03 codecov-commenter

@wenjie1991 This already looks very good and promising. But I believe you can squeeze out even more performance. You can start by adjusting the memory access pattern to be efficient. You can also apply numba.njit to the outer function and parallelize it. I would also cache the kernel, which makes it even more efficient.

Intron7 avatar Mar 21 '25 17:03 Intron7

@MDLDan

wenjie1991 avatar Mar 24 '25 12:03 wenjie1991

Hi @Intron7,

Thanks for the suggestions. The implementation has been refactored and optimized.

These changes enhance performance by introducing JIT and parallelization to the outer function. https://github.com/scverse/squidpy/pull/975/commits/cee646ae2d72f8d7f685058c836f7afded835789

The memory access pattern has been adjusted for contiguous memory, and the kernel has been cached. https://github.com/scverse/squidpy/pull/975/commits/52a5fae7e8ae040b1828a2733e25f26858172eea

The runtime has been further reduced by approximately 15%, primarily due to improved memory access patterns.
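As a small illustration of the contiguity point (illustrative only, not the PR's code): slicing a single coordinate column out of a C-contiguous (n, 2) array yields a strided view, while np.ascontiguousarray copies it into unit-stride memory that a JIT-compiled inner loop can scan cache-efficiently.

```python
import numpy as np

coords = np.random.rand(1000, 2)   # C-contiguous: rows adjacent in memory
x = coords[:, 0]                   # strided view: skips every y value
x_c = np.ascontiguousarray(x)      # copy with unit stride

print(x.flags["C_CONTIGUOUS"])     # False: non-unit stride
print(x_c.flags["C_CONTIGUOUS"])   # True: cache-friendly linear scan
```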

wenjie1991 avatar Jun 14 '25 19:06 wenjie1991

@wenjie1991 I'll take a look later next week. Thank you for optimising this a bit more.

Intron7 avatar Jun 15 '25 07:06 Intron7

I am writing here to document the current performance. Here are the results with the current branch (peak memory and time, respectively).

========= ====== ==========
--              layer      
--------- -----------------
dataset   None   off-axis 
========= ====== ==========
  imc     628M     649M   
========= ====== ==========


========= ========= ==========
--               layer        
--------- --------------------
dataset     None    off-axis 
========= ========= ==========
  imc     305±0ms   306±0ms  
========= ========= ==========

And with main:

========= ======= ==========
--              layer       
--------- ------------------
dataset    None   off-axis 
========= ======= ==========
  imc     1.65G    1.46G   
========= ======= ==========

========= ========= ==========
--               layer        
--------- --------------------
dataset     None    off-axis 
========= ========= ==========
  imc     12.2±0s   12.2±0s  
========= ========= ==========

selmanozleyen avatar Jul 10 '25 15:07 selmanozleyen

@Intron7 I cleaned up the code a bit in this commit https://github.com/selmanozleyen/squidpy/commit/27a6d70eb3c406e570151483f998b8bde191853b. From what I understand, using an in-place algorithm wasn't worth it, at least for my benchmark dataset (imc). My guess is that creating a float64 count array and performing the increment operations is also costly compared to filling a float array from scratch, but I might be misunderstanding something. Here are the results for this commit:


 ========= ====== ==========
--              layer      
--------- -----------------
dataset   None   off-axis 
========= ====== ==========
  imc     672M     670M   
========= ====== ==========


========= ========= ==========
--               layer        
--------- --------------------
dataset     None    off-axis 
========= ========= ==========
  imc     287±0ms   294±0ms  
========= ========= ==========

selmanozleyen avatar Jul 11 '25 11:07 selmanozleyen