CRAM file as an input to modkit
Dear Team, can I use "haplotagged.cram" (output of the wf human variation pipeline) as an input to "modkit entropy"?
Hello @reshu23,
CRAM format should work for pileup but is not currently supported in entropy. I can get to adding this capability.
Thank you. It is really important to add CRAM support for functions like entropy or extract call/full. One strange thing: I was experimenting with "modkit entropy" using a CRAM file as input. It appears to be running, but after 13 hours it is still going. Here is the info from the log file:
We have seen at least two different modes of intermittent failure for modkit pileup when using cram as input. In about 4/80 cases, it just segfaulted. In a number of other cases, it threw htslib errors for a small but significant fraction of reads (about 0.5% per modkit modbam check-tags). These errors disappear when the crams are converted back to bams and fed into modkit as input.
It would be greatly beneficial to have better support for cram input in modkit. We generally do not keep bam around for very long, and it adds significant overhead to large projects if we have to uncram before running modkit.
Hello @reshu23, sorry I missed this comment from earlier. I haven't tested entropy with CRAM input, so I can't speak to the performance regression you're observing. Is there a way you could isolate the records that Modkit is complaining are truncated?
Hello @oneillkza,
Understood. I'm actively working on some performance improvements, and improving CRAM handling is on the work list. Would it be possible for you to share a few CRAM files that cause segfaults or htslib errors with me? Please send me an email at art.rand[at]nanoporetech.com
Thanks @ArtRand, that's good to know!
Unfortunately these are clinical research samples that I cannot share, but let me see if I can reproduce this on some cell line data.