cDNA_Cupcake Parallel processing in subsample.py

Open SichongP opened this issue 3 years ago • 0 comments

This PR parallelizes the subsampling in subsample.py, because the script currently takes too long to run.

I tested the new script with 10,000 total reads, a step size of 100 reads, and 100 iterations:

With original script:

35.3 s ± 70.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With parallel script (5 threads):

12.8 s ± 171 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
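
For reference, here is the arithmetic behind those two timings. The variable names are mine, and the efficiency figure assumes all 5 workers were busy for the whole run:

```python
# Mean wall-clock times reported above, in seconds.
serial_s = 35.3
parallel_s = 12.8
workers = 5

# Speedup is the ratio of serial to parallel run time.
speedup = serial_s / parallel_s
# Parallel efficiency is the speedup divided by the number of workers.
efficiency = speedup / workers

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 2.76x, efficiency: 55%"
```

So the small test case gets roughly 2.8x from 5 workers; the rest goes to process startup and data transfer.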

The improvement should be more pronounced on real samples, where the multiprocessing overhead becomes negligible relative to the work done per subsample.
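
To illustrate the general approach, here is a minimal sketch of how subsampling can be spread across worker processes with `multiprocessing.Pool`. It is not the code in this PR: `count_unique_at_size`, `subsample_curve`, the `n_workers` parameter, and the list-of-labels input are hypothetical names and assumptions made for the example.

```python
import random
from multiprocessing import Pool


def count_unique_at_size(task):
    """Draw `size` reads `iterations` times; return (size, unique-label counts)."""
    labels, size, iterations, seed = task
    rng = random.Random(seed)  # per-task seed keeps results reproducible
    counts = [len(set(rng.sample(labels, size))) for _ in range(iterations)]
    return size, counts


def subsample_curve(labels, step, iterations, n_workers=1, seed=0):
    """Compute unique-label counts for sizes step, 2*step, ... up to len(labels)."""
    sizes = range(step, len(labels) + 1, step)
    # One independent task per subsample size (hypothetical task layout).
    tasks = [(labels, size, iterations, seed + i) for i, size in enumerate(sizes)]
    if n_workers <= 1:
        # Serial path: same results, no process overhead.
        return [count_unique_at_size(t) for t in tasks]
    with Pool(processes=n_workers) as pool:
        # pool.map preserves task order, so results stay sorted by size.
        return pool.map(count_unique_at_size, tasks)


if __name__ == "__main__":
    # Small made-up input: 2,000 reads assigned to 50 gene labels.
    reads = [f"gene{i % 50}" for i in range(2_000)]
    curve = subsample_curve(reads, step=500, iterations=5, n_workers=5)
    for size, counts in curve:
        print(size, sum(counts) / len(counts))
```

Because each subsample size is an independent task, the fixed cost of starting processes and pickling the input is paid once per task, which is why it matters less as the per-task work grows.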

Feb 24 '22 14:02 SichongP