Performance issue converting fast5 -> pod5 with multiple threads
I am running pod5 convert fast5 on a sample with about 5000 fast5 files (all from the same sample, ~4000 reads each), writing to a single pod5 file per sample.
- When I run it on the 5000 files with -t 1, I get a throughput of ~800 reads/s.
- When I run it on the 5000 files with -t 2, throughput drops dramatically to ~200-300 reads/s, and the processes keep going into D state (uninterruptible sleep, usually waiting on I/O). Increasing the thread count further does not help, and throughput sometimes drops to ~50 reads/s.
So I made subsets of the files and compared the performance:
- When I run it on 200 files with -t 1, I also get ~800 reads/s, and increasing the thread count increases reads/s, as expected. As I increased the number of files, however, the gains from multithreading shrank. On my system the sweet spot was around 300 files.
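A minimal harness for this kind of subset comparison could look like the Python sketch below. It assumes ~4000 reads per file (as stated above), reuses the -t option from this report, and assumes the `--output` flag of recent pod5 releases; please check `pod5 convert fast5 --help` for your version. The function names are hypothetical, and each run writes to its own output file so runs do not clash.

```python
import subprocess
import time
from pathlib import Path

# Assumption taken from the issue text: each fast5 file holds ~4000 reads.
READS_PER_FILE = 4000


def convert_cmd(files, output, threads):
    """Build a `pod5 convert fast5` command line (--output assumed; -t as used in the issue)."""
    return ["pod5", "convert", "fast5", *map(str, files),
            "--output", str(output), "-t", str(threads)]


def benchmark(files, n_files, threads, workdir, runner=subprocess.run):
    """Convert the first n_files with the given thread count and return estimated reads/s."""
    subset = sorted(files)[:n_files]
    # A distinct output name per (file count, thread count) avoids clashing with earlier runs.
    output = Path(workdir) / f"bench_{n_files}f_{threads}t.pod5"
    start = time.perf_counter()
    runner(convert_cmd(subset, output, threads), check=True)
    elapsed = time.perf_counter() - start
    # Wall-clock estimate: includes process start-up, not just conversion time.
    return len(subset) * READS_PER_FILE / elapsed
```

For example, `benchmark(files, 300, 2, "bench_out")` converts the first 300 files with two threads. Because every run re-reads the same leading files, the OS page cache can make later runs look faster than a cold run would be; using disjoint subsets per run gives fairer numbers.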
I thought this could be a bottleneck caused by writing to the same file, but if I run two samples in the background simultaneously (thus writing to two different pod5 files), I see the same drop in performance as when using multiple threads on many files, and the processes keep going into D state. My system should have enough memory to handle the job, though.
For now I'm thinking of processing the files in batches and merging the resulting pod5 files, but I was curious whether this is a known issue and what you would recommend to improve performance when running multiple samples at the same time or with multiple threads.
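The batch-then-merge workaround could be sketched in Python roughly as follows. This is a sketch under assumptions, not an official recipe: the batch size of 300 comes from the sweet spot observed above, the `--output` flag and the `pod5 merge` subcommand are assumed from the pod5 documentation (verify with `--help`), and the helper names and `part_XXXX.pod5` naming are hypothetical. Batches run one after another rather than concurrently, since concurrent jobs are what triggered the slowdown here.

```python
import shlex
import subprocess
from pathlib import Path


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_batched_conversion(fast5_files, out_dir, batch_size=300, threads=1):
    """Return one `pod5 convert fast5` command per batch, followed by a final `pod5 merge`."""
    out_dir = Path(out_dir)
    commands, parts = [], []
    for i, chunk in enumerate(batches(sorted(map(str, fast5_files)), batch_size)):
        part = out_dir / f"part_{i:04d}.pod5"  # hypothetical intermediate file name
        parts.append(part)
        commands.append(["pod5", "convert", "fast5", *chunk,
                         "--output", str(part), "-t", str(threads)])
    merged = out_dir / "merged.pod5"
    commands.append(["pod5", "merge", *map(str, parts), "--output", str(merged)])
    return commands


def run(commands, dry_run=True):
    """Print the commands (dry run) or execute them sequentially, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

With 5000 files and `batch_size=300`, this plans 17 conversion commands (16 batches of 300 and one of 200) plus one merge; calling `run(commands)` first prints them for inspection before anything is executed.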