cDNA_Cupcake
Parallel processing in subsample.py
This PR parallelizes the subsampling in subsample.py, because the script currently takes too long to run.
I tested the new script with 10,000 total reads, a step size of 100 reads, and 100 iterations:
With original script:
35.3 s ± 70.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With parallel script (5 threads):
12.8 s ± 171 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
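For reference, the two timings above work out to roughly the following speedup. This is plain arithmetic on the reported means; the per-worker efficiency figure assumes all 5 workers were kept busy, which the benchmark itself does not confirm.

```python
# Mean wall-clock times reported above, in seconds.
serial_s = 35.3
parallel_s = 12.8
workers = 5  # worker count used for the parallel run

speedup = serial_s / parallel_s   # how many times faster the parallel run is
efficiency = speedup / workers    # fraction of ideal linear scaling

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 2.76x, efficiency: 55%"
```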
The improvement should be more pronounced on real samples, where the multiprocessing overhead becomes negligible relative to the work done per subsample.
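For readers who want the general shape of the change, here is a minimal sketch of parallel subsampling with `multiprocessing.Pool`. It is not the code in this PR: the function names (`subsample_once`, `subsample_parallel`), the list-of-labels input, and the seeding scheme are all assumptions made for illustration.

```python
import random
from multiprocessing import Pool


def subsample_once(args):
    """Draw `size` reads without replacement and count the distinct labels."""
    labels, size, seed = args
    rng = random.Random(seed)  # per-task seed keeps each draw reproducible
    return size, len(set(rng.sample(labels, size)))


def subsample_parallel(labels, step, iterations, workers=5):
    """Return {size: mean number of distinct labels} over `iterations` draws."""
    # One task per (subsample size, iteration); every task gets a unique seed.
    tasks = [
        (labels, size, size * iterations + i)
        for size in range(step, len(labels) + 1, step)
        for i in range(iterations)
    ]
    with Pool(processes=workers) as pool:
        results = pool.map(subsample_once, tasks)

    # Average the distinct-label counts for each subsample size.
    totals = {}
    for size, n_distinct in results:
        totals[size] = totals.get(size, 0) + n_distinct
    return {size: total / iterations for size, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical input: 1,000 reads spread evenly over 50 gene labels.
    reads = [f"gene{i % 50}" for i in range(1000)]
    curve = subsample_parallel(reads, step=100, iterations=10, workers=2)
    print(curve[1000])  # prints 50.0: sampling every read recovers all 50 labels
```

Note that this sketch sends the full label list with every task, which adds pickling overhead of its own; for large inputs, loading the data once per worker (for example through the `initializer` argument of `Pool`) would likely be preferable.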