
Parallelization for PrepSCTFindMarkers

Open rbutleriii opened this issue 2 years ago • 2 comments

Hi all,

Looking through PrepSCTFindMarkers, it doesn't look like it uses future to speed up its lapply calls, and it is notably slow for me: for ~136k cells x 50k features it takes roughly 4-6 hours to complete. Small thing, but this is currently the longest step in my pipelines.

Thanks

rbutleriii avatar Jun 25 '22 22:06 rbutleriii

Thanks for the request - it can definitely be parallelized. I will look into it.

saketkc avatar Jun 26 '22 15:06 saketkc

I am crying. I have got 98 datasets and 2M cells to run. I am desperate now.

realzehuali avatar Sep 05 '22 07:09 realzehuali

Thanks everyone for your patience. https://github.com/satijalab/seurat/commit/e5171f7a6b2b1145753b0b1606d16e30d6b0984d should speed up PrepSCTFindMarkers by default. It also supports parallelization, though that might not always lead to a speedup, since splitting the data can itself be a bottleneck.

For V4, you can test the changes by installing the latest develop branch:

devtools::install_github("satijalab/seurat", ref="develop")

saketkc avatar Jul 14 '23 19:07 saketkc
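
For anyone who wants to try the parallel path mentioned above, here is a minimal sketch. It assumes the updated PrepSCTFindMarkers picks up the active future plan the way other Seurat functions do; the thread does not spell that out, so treat it as an assumption. The helper name run_prep_with_plan and its workers and max_globals_gb arguments are hypothetical and not part of Seurat.

```r
# Hypothetical helper (not part of Seurat): run PrepSCTFindMarkers under a
# multisession future plan, time it, and restore the previous settings.
# Assumes the future and Seurat packages are installed.
run_prep_with_plan <- function(object, workers = 4, max_globals_gb = 8) {
  # Large objects must be exported to the workers; the default limit for
  # future.globals.maxSize is 500 MiB, which big Seurat objects exceed.
  old_opts <- options(future.globals.maxSize = max_globals_gb * 1024^3)
  on.exit(options(old_opts), add = TRUE)

  # plan() invisibly returns the previous plan, so it can be restored on exit.
  old_plan <- future::plan("multisession", workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)

  # Time the call so the parallel run can be compared with a sequential one.
  elapsed <- system.time(
    object <- Seurat::PrepSCTFindMarkers(object, assay = "SCT", verbose = TRUE)
  )[["elapsed"]]
  message(sprintf("PrepSCTFindMarkers took %.1f s with %d workers",
                  elapsed, workers))
  object
}
```

Calling run_prep_with_plan(obj, workers = 1) and then again with more workers on the same object is a quick way to check whether parallelization actually helps for a given dataset, given the caveat above that data splitting can be the bottleneck.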