Possible enhancement: multithreaded (via numba) Mann-Whitney tests

Open jamestwebber opened this issue 4 years ago • 3 comments

  • [x] Additional function parameters / changed functionality / changed defaults?

I recently wrote up a parallelized implementation of the Mann-Whitney U test for my own use (gist is here). For the types of tests we tend to do in scRNAseq (lots of different features, 2D arrays), it basically scales with the number of cores you can throw at it. When you're doing a lot of tests, this is very nice!
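To make the idea concrete, here is a minimal sketch of a row-parallel Mann-Whitney U test with numba. It is not the code from the gist. The names `mannwhitneyu_rows`, `_mwu_kernel` and `_rank_with_ties` are hypothetical, and the sketch assumes one test per row (feature), a two-sided normal approximation with tie correction, and no continuity correction. If numba is not installed, it falls back to plain Python so the logic can still be checked.

```python
import math

import numpy as np

try:
    from numba import njit, prange
except ImportError:  # assumption: fall back to (slow) pure Python without numba
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit()
def _rank_with_ties(values):
    """Return 1-based average ranks and the tie term sum(t**3 - t)."""
    n = values.size
    order = np.argsort(values)
    ranks = np.empty(n, dtype=np.float64)
    tie_term = 0.0
    i = 0
    while i < n:
        # find the end j of the run of equal values starting at sorted position i
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t * t * t - t
        i = j + 1
    return ranks, tie_term


@njit(parallel=True)
def _mwu_kernel(x, y):
    n_tests, n1 = x.shape
    n2 = y.shape[1]
    n = n1 + n2
    u = np.empty(n_tests, dtype=np.float64)
    z = np.empty(n_tests, dtype=np.float64)
    p = np.empty(n_tests, dtype=np.float64)
    for i in prange(n_tests):  # rows are independent, so each one can run on its own thread
        pooled = np.empty(n, dtype=np.float64)
        pooled[:n1] = x[i]
        pooled[n1:] = y[i]
        ranks, tie_term = _rank_with_ties(pooled)
        u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
        mean_u = n1 * n2 / 2.0
        var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1.0)))
        u[i] = u1
        if var_u > 0.0:
            z[i] = (u1 - mean_u) / math.sqrt(var_u)
        else:
            z[i] = 0.0  # all values tied: no evidence either way
        p[i] = math.erfc(abs(z[i]) / math.sqrt(2.0))  # two-sided normal approximation
    return u, z, p


def mannwhitneyu_rows(x, y):
    """Test row i of x against row i of y; returns (U, z, p) arrays."""
    x = np.ascontiguousarray(x, dtype=np.float64)
    y = np.ascontiguousarray(y, dtype=np.float64)
    if x.ndim != 2 or y.ndim != 2 or x.shape[0] != y.shape[0]:
        raise ValueError("x and y must be 2D with the same number of rows")
    return _mwu_kernel(x, y)
```

Calling `mannwhitneyu_rows(x, y)` with two `(n_features, n_cells)` arrays returns one statistic, z-score and p-value per feature. Because there is no continuity correction, p-values will differ slightly from the defaults of `scipy.stats.mannwhitneyu`.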

Given that scanpy already has a dependency on numba this would be a pretty simple thing to add, if you want to do so. Thought I would just point it out!

- James

jamestwebber, Nov 26 '21 23:11

We're always up for improved performance! Would love to see improvements here. (Btw, I think I've already got your gist bookmarked on twitter)

Do you have any benchmarks of performance here? Especially against our current implementation.

ivirshup, Nov 29 '21 14:11

I haven't benchmarked against scanpy, only against scipy.stats.mannwhitneyu (which can now handle arrays; I know it couldn't before). On my laptop (an 8-core Intel MacBook Pro) it's about a 10x speedup, and with more cores the speedup can be much larger.

Even without parallelization, you can get some improvement by just using numba.njit on some of the internal bits (e.g. tiecorrect).
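As an illustration of the single-threaded case, here is a minimal sketch of a JIT-compiled tie-correction factor. It assumes "tiecorrect" refers to the quantity computed by `scipy.stats.tiecorrect`, that is `1 - sum(t**3 - t) / (n**3 - n)` over groups of tied ranks. The name `tiecorrect_jit` is hypothetical, and this is neither scanpy's nor the gist's implementation.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python without numba
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit()
def tiecorrect_jit(rankvals):
    """Tie-correction factor for a 1D array of ranks."""
    arr = np.sort(rankvals)
    n = arr.size
    if n < 2:
        return 1.0
    tie_sum = 0.0
    run = 1  # length of the current run of equal ranks
    for k in range(1, n):
        if arr[k] == arr[k - 1]:
            run += 1
        else:
            tie_sum += run * run * run - run
            run = 1
    tie_sum += run * run * run - run  # close the final run
    size = float(n)
    return 1.0 - tie_sum / (size * size * size - size)
```

For example, the ranks `[1.0, 2.5, 2.5, 4.0]` contain one pair of ties, giving `1 - 6/60 = 0.9`.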

Of course, your code has a lot of options that I didn't bother with, because I didn't need them. Some of them might be harder to JIT than others.

jamestwebber, Nov 29 '21 15:11

Your changes made it into rank_genes_groups' wilcoxon flavor via #3529.

Scanpy doesn’t currently have mannwhitneyu, but if you want to contribute it, feel free!

flying-sheep, Mar 28 '25 15:03