Accelerate single threaded jobs using multiple parallel processes

Open hguturu opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe. Currently, FragPipe parallelizes a job only if the tool supports multiple threads. Single-threaded jobs such as initializing Philosopher, annotating the database, filtering, and report generation all run one after the other. Some of these jobs, like filtering, can take ~1 min per job, which adds up when you have lots of injections to analyze.

Describe the solution you'd like: Ideally, FragPipe could also use the available CPU cores to run multiple single-threaded jobs in parallel, similar to GNU Parallel. For example, on a 12-core machine, either run 12 single-threaded jobs in parallel or one job with 12 threads. The optimal strategy will depend on the algorithm and how well it is parallelized.

Describe alternatives you've considered: This may need some care and throttling, since running too many read/write-heavy jobs in parallel could overload the filesystem.

Additional context: I have been running some of the stages, such as filter, manually to speed things up. With ~1000 injections, even on a 128-core system only a single core is used, which results in nearly a day of waiting. The same jobs finish in under an hour when many filter jobs are launched in parallel.
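To illustrate the request, here is a minimal sketch of running one single-threaded command per experiment directory with a bounded number of concurrent processes. It is not FragPipe's implementation. The names `run_in_dir`, `run_parallel`, and `max_workers` are hypothetical, and the command itself is left as a parameter because the exact command line for each step is not given in this thread.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_in_dir(cmd, workdir):
    """Run one single-threaded command inside workdir; return (workdir, exit code)."""
    result = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    return workdir, result.returncode


def run_parallel(cmd, workdirs, max_workers=4):
    """Run cmd once per directory, with at most max_workers processes at a time.

    Threads are enough here because each worker only waits on a child
    process. max_workers is the throttle: keep it below the core count
    if the jobs are I/O heavy and the filesystem becomes the bottleneck.
    Results come back in the same order as workdirs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda d: run_in_dir(cmd, d), workdirs))
```

Calling `run_parallel(cmd, experiment_dirs, max_workers=12)` on a 12-core machine would correspond to the "12 jobs in parallel, each with a single thread" case, where `cmd` is the argument list of whichever single-threaded step is being run. A smaller `max_workers` is the throttling mentioned under the alternatives.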

hguturu avatar Apr 17 '22 22:04 hguturu

Thanks for your suggestion.

For the Philosopher workspace clean, init, and filter commands, we should indeed parallelize them. We will implement this in the future.

For Philosopher database annotation, ideally it only needs to run once, since all workspaces use the same database. With N experiments, FragPipe initializes N+1 workspaces: N in the experiment directories and 1 in the outermost directory. Philosopher just needs to annotate the database in the outermost workspace and let the others use it. Felipe @prvst, can you implement this idea in Philosopher? I can change FragPipe after it is ready.

Best,

Fengchao

fcyu avatar Apr 17 '22 23:04 fcyu

The way I solved this within the Philosopher pipeline was to run the annotation once and then copy the db.bin file to the meta folders of the other data sets. We can add something similar to FragPipe.
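A rough sketch of that copy step, for illustration only: it assumes the annotation has already been run in the outermost workspace and that each workspace keeps its files in a meta folder named `.meta`. The folder name is an assumption here, and the function `share_annotation` and its parameters are hypothetical, not part of Philosopher or FragPipe.

```python
import shutil
from pathlib import Path


def share_annotation(root, experiment_dirs, meta_name=".meta", filename="db.bin"):
    """Copy the annotated database file from the root workspace to each experiment.

    meta_name is an assumed folder name; adjust it to match the actual
    workspace layout. Returns the list of files that were written.
    """
    source = Path(root) / meta_name / filename
    if not source.is_file():
        raise FileNotFoundError(f"annotation file not found: {source}")

    copied = []
    for exp in experiment_dirs:
        target_dir = Path(exp) / meta_name
        # Create the meta folder if the workspace has not been initialized yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / filename
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

With N experiments this replaces N annotation runs with one run plus N file copies.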

prvst avatar Apr 18 '22 14:04 prvst

Yes. I think we added the same logic for the razor.bin. We can do it again for the db.bin.

Best,

Fengchao

fcyu avatar Apr 18 '22 14:04 fcyu