Preprocess annotate_raw_with_fastqs halting issue
Hiya!
I'm finding that tombo preprocess halts around halfway through any run. The first half runs relatively quickly (the predicted run time is about 2 hours), but progress then slows down steadily until it halts completely.
The command I am running is:
tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5s_single/ --fastq-filenames fast5s_guppy.fastq --sequencing-summary-filenames fast5s_guppy/sequencing_summary.txt --processes 32 --basecall-group Basecall_1D_000 --basecall-subgroup BaseCalled_template --overwrite
Here is a run after approx 18 hours:
[14:34:36] Getting read filenames.
[14:36:17] Parsing sequencing summary files.
[14:36:35] Annotating FAST5s with sequence from FASTQs.
35%|████████████████████████████████████████████▌ | 2644500/7596528 [2:39:39<4:58:58, 276.05it/s]
This is the output if I terminate the process:
^CTraceback (most recent call last):
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/bin/tombo", line 11, in <module>
    sys.exit(main())
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/site-packages/tombo/__main__.py", line 235, in main
    _preprocess.annotate_reads_with_fastq_main(args)
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/site-packages/tombo/_preprocess.py", line 526, in annotate_reads_with_fastq_main
    args.overwrite)
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/site-packages/tombo/_preprocess.py", line 283, in _annotate_with_fastqs
    fq_feed_p.join()
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
(Interleaved with this traceback, each worker printed a line of the form "Process Process-N:" — Process-2 through Process-34 — which I've moved out of the stack above for readability.)
What I've tried
- The issue occurs with both the pip and conda installations of 1.5.1. I've tried reinstalling, but no success.
- The issue doesn't occur on a smaller run (using the dataset here: https://github.com/PengNi/deepsignal-plant).
- I've checked that sequencing_summary.txt contains the same number of reads as there are FAST5 files in the directory.
Other info
- htop shows that the run, which starts out multi-process, eventually drops to a single process using ~20% CPU.
- The reads are from a size-selected ONT run; the N50 is approx. 30 kbp, with some reads >100 kbp.
- This is running on a Rocky Linux system under a SLURM scheduler. The issue occurs in both interactive and batch jobs.
Any support would be greatly appreciated. I've checked through the previous issues but couldn't find a solution!
Thanks in advance :)