Modkit pileup issue with --include-bed
Hello all, we are trying to run modkit on an adaptive sampling run, and we started having issues with our normal pipeline after the most recent updates to MinKNOW.
Generally, we use samtools cat -> sambamba sort -> samtools index, followed by running modkit with the --include-bed parameter to call all-context 5mC methylation on the output. This always worked until recently. Now, when the bed file includes more than a couple MB, modkit pileup with --include-bed hangs, stuck running in a loop in certain regions of the genome.
The error it is giving is:
write(2, "fetching sequence failed, ", 26) = 26
write(2, "FASTA read interval was out of bounds", 37) = 37
write(2, "\n", 1) = 1
write(2, ">", 1) = 1
write(2, " ", 1) = 1
It gets to a certain region of the genome, then repeats this error multiple times per second and hangs, unable to proceed.
Interestingly, if we use the exact same bed file to subset our bam within samtools view, followed by running modkit pileup, everything works fine!
The command we are running is:
modkit pileup test1.sorted.bam barcode09_merged.sorted.methylbed --motif CG 0 --motif CHG 0 --motif CHH 0 --include-bed dbtest.bed --ref Zm-B73-REFERENCE-NAM-5.0.fa --filter-threshold 0.66 --with-header --ignore h
and the bed file is attached (but as a txt because GitHub didn't like me uploading a bed).
We have been running this pipeline for a while with no issues, so we are not sure whether it is related to a change in the bam files produced by MinKNOW or to something else.
Thanks
@jcolicchio-soundag,
I don't know how or why increasing the size of the --include-bed file should make a difference. None of this code has changed recently, but could you tell me which version of Modkit you're using? If possible, could you email me at art.rand[at]nanoporetech.com so we can arrange a way to reproduce the error? I can try some tests on my side in the meantime.
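While this is being reproduced, one thing that may be worth ruling out is whether any interval in the bed file falls outside the contigs of the reference, since the message reports a FASTA read interval that was out of bounds. This is only a guess at a possible cause, not a confirmed diagnosis. The sketch below is a minimal check under that assumption: it compares bed intervals against contig lengths taken from a FASTA index (.fai, as produced by samtools faidx). The function names read_fai and out_of_bounds are hypothetical helpers written for this illustration and are not part of modkit or samtools.

```python
import sys


def read_fai(lines):
    """Return {contig name: length} from the lines of a .fai index.

    Assumes the standard tab-separated .fai layout, where the first
    column is the contig name and the second is its length.
    """
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def out_of_bounds(bed_lines, lengths):
    """List bed records that do not fit inside the reference contigs.

    Each result is (line number, contig, start, end, reason).
    Bed coordinates are 0-based and half-open, so end may equal the
    contig length but not exceed it.
    """
    problems = []
    for number, line in enumerate(bed_lines, start=1):
        # Skip blank lines and common header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in lengths:
            problems.append((number, chrom, start, end, "contig not in reference"))
        elif start < 0 or start > end or end > lengths[chrom]:
            problems.append((number, chrom, start, end, "interval outside contig"))
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_bed.py reference.fa.fai regions.bed
    with open(sys.argv[1]) as fai, open(sys.argv[2]) as bed:
        for record in out_of_bounds(bed, read_fai(fai)):
            print(*record, sep="\t")
```

With the files from this thread, the script would be run on Zm-B73-REFERENCE-NAM-5.0.fa.fai and dbtest.bed. If it prints nothing, every interval fits inside the reference, and the cause is more likely somewhere else.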