`normalize_total` with numba

Open Intron7 opened this issue 1 year ago • 3 comments

What kind of feature would you like to request?

Other?

Please describe your wishes

This would speed up normalization

Intron7 avatar Jul 02 '24 12:07 Intron7

Waiting for potential work already done on this by the Intel folks (@ashish615, have you guys worked on this?)

ilan-gold avatar Aug 08 '24 13:08 ilan-gold

@Intron7 The code at the link below may only work for CSR matrices.
https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/blob/main/pipelines/single-cell-RNA-seq-analysis/notebooks/fastpp.py#L499-L522

ashish615 avatar Aug 08 '24 13:08 ashish615

Implement with or without Intel PR

ilan-gold avatar Oct 17 '24 13:10 ilan-gold

So the implementation from Intel would replace the axis_mul_or_truediv function we use as the last step here: https://github.com/scverse/scanpy/blob/834159ae1e938a29b3e98b89366c62a40bbe1966/src/scanpy/preprocessing/_normalization.py#L28-L52

Everything done up to that point only serves to find the median in case no fixed target sum is given. So I would essentially have to use numba in axis_mul_or_truediv. Does that sound right, @ilan-gold @flying-sheep? If so, I will proceed with that.
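
For illustration, a minimal sketch of what a numba kernel for that last step might look like on a CSR matrix. This is not the Intel code or the scanpy implementation: the function name `csr_divide_rows_inplace` is hypothetical, the inputs are the raw `data` and `indptr` arrays of a CSR matrix, and treating all-zero rows by dividing them by 1 is an assumption of this sketch. The import falls back to plain Python if numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs without numba
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def csr_divide_rows_inplace(data, indptr, scale):
    """Divide every stored value of row i by scale[i], in place (hypothetical helper)."""
    n_rows = len(indptr) - 1
    for i in prange(n_rows):
        # Row i owns the slice data[indptr[i]:indptr[i + 1]].
        for j in range(indptr[i], indptr[i + 1]):
            data[j] = data[j] / scale[i]


# Toy CSR matrix with 3 rows; the last row has no stored values.
data = np.array([1.0, 1.0, 2.0, 3.0])
indptr = np.array([0, 3, 4, 4])
counts = np.array([4.0, 3.0, 0.0])  # per-row totals
target_sum = 1.0

scale = counts / target_sum
scale[scale == 0] = 1.0  # assumption: leave all-zero rows untouched
csr_divide_rows_inplace(data, indptr, scale)
```

Working on `data` and `indptr` directly avoids building an intermediate matrix, which is presumably where the speed-up would come from; this only covers CSR input.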

selmanozleyen avatar Apr 07 '25 13:04 selmanozleyen

@selmanozleyen That does sound correct, although I have not looked into it closely before just now, and there is not much of a bread crumb trail to pick at since the Intel code is so different. So I don't want to say "yes for sure" but it does seem that way. It's possible axis_sum could be another place to use numba, I'm really not sure. BTW a good place for something like "fast axis multiplication" or "axis sum" could be https://github.com/scverse/fast-array-utils (and in fact maybe should be there). But prototyping here and getting the actual numba function ready would make sense. Maybe @Intron7 wants to add something as to which part of normalize_total would be "numba-able" if not those mentioned?
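
As a rough sketch of the "axis sum" idea for CSR input, assuming the per-row totals are then used to pick a median target sum when none is given: the function name `csr_row_sums` is hypothetical, and excluding all-zero rows from the median is an assumption of this sketch, not a statement about what scanpy or fast-array-utils does.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs without numba
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def csr_row_sums(data, indptr):
    """Sum the stored values of each CSR row (hypothetical helper)."""
    n_rows = len(indptr) - 1
    sums = np.zeros(n_rows, dtype=np.float64)
    for i in prange(n_rows):
        total = 0.0
        for j in range(indptr[i], indptr[i + 1]):
            total += data[j]
        sums[i] = total
    return sums


# Toy CSR matrix with 3 rows; the last row has no stored values.
data = np.array([1.0, 1.0, 2.0, 3.0])
indptr = np.array([0, 3, 4, 4])

counts = csr_row_sums(data, indptr)
# Assumption: the median is taken over rows with non-zero totals only.
target_sum = np.median(counts[counts > 0])
```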

ilan-gold avatar Apr 07 '25 13:04 ilan-gold

I removed the axis from the name, since extending the functions to allow axis=None wasn’t that hard, and yes, sum is there.

See here for my thoughts about the scope of fast-array-utils: https://github.com/scverse/scanpy/issues/3449

flying-sheep avatar Apr 08 '25 09:04 flying-sheep