CRAM file as an input to modkit

Open reshu23 opened this issue 5 months ago • 6 comments

Dear Team, can I use "haplotagged.cram" (output of the wf-human-variation pipeline) as input to "modkit entropy"?

reshu23 avatar Jul 15 '25 08:07 reshu23

Hello @reshu23,

CRAM format should work for pileup but is not currently supported in entropy. I can get to adding this capability.

ArtRand avatar Jul 17 '25 04:07 ArtRand

Thank you. It is really important to add CRAM support for functions like entropy or extract calls/full. One strange thing: I tried running "modkit entropy" with a CRAM file as input. It starts and appears to run, but after 13 hours it still has not finished. Here is the info from the log file:

[Image: log file output]

reshu23 avatar Jul 17 '25 07:07 reshu23

We have seen at least two different modes of intermittent failure for modkit pileup when using CRAM as input. In about 4 of 80 cases, it simply segfaulted. In a number of other cases, it threw htslib errors for a small but significant fraction of reads (about 0.5%, as reported by modkit modbam check-tags). These errors disappear when the CRAMs are converted back to BAMs and fed into modkit as input.

It would be greatly beneficial to have better support for CRAM input in modkit. We generally do not keep BAM files around for very long, and it adds significant overhead to large projects if we have to convert CRAM back to BAM before running modkit.
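
For readers hitting the same problem, the CRAM-to-BAM workaround described above can be sketched roughly as follows. This is only an illustration, not a recommended pipeline: the function names, file paths, and thread count are hypothetical, modkit options beyond the input and output paths are omitted, and it assumes samtools and modkit are on the PATH and that the reference FASTA used to write the CRAM is available.

```python
import subprocess
from typing import List


def cram_workaround_commands(
    cram: str, reference: str, bam: str, bed: str, threads: int = 4
) -> List[List[str]]:
    """Build the command lines for: CRAM -> BAM, index the BAM, modkit pileup.

    All paths are placeholders chosen by the caller.
    """
    t = str(threads)
    return [
        # Decode the CRAM against its reference and write a BAM.
        ["samtools", "view", "-@", t, "-b", "-T", reference, "-o", bam, cram],
        # modkit pileup needs an indexed input.
        ["samtools", "index", "-@", t, bam],
        # Basic pileup call; see `modkit pileup --help` for further options.
        ["modkit", "pileup", bam, bed],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(cram_workaround_commands("sample.cram", "ref.fa", "sample.bam", "sample.bed"))` would execute the three steps; the temporary BAM and its index can be deleted afterwards, which is exactly the overhead mentioned above.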

oneillkza avatar Sep 25 '25 18:09 oneillkza

Hello @reshu23, sorry I missed this comment from earlier. I haven't tested entropy with CRAM input, so I can't speak to the slow run time you're observing. Is there a way you could isolate the records that Modkit is complaining are truncated?
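
One possible way to isolate such records, sketched here under assumptions: if the read names of the affected records can be recovered from the log, samtools can write just those reads to a small file. The helper names and paths below are hypothetical, and the sketch relies on the `-N` (read-name file) option of `samtools view`, which requires a reasonably recent samtools. Note that re-encoding the reads into a new CRAM may itself make the error disappear, which would also be useful to know.

```python
from typing import Iterable, List


def names_file_text(read_names: Iterable[str]) -> str:
    """Return the contents of a read-name file: one unique name per line."""
    seen = set()
    unique = []
    for name in read_names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            unique.append(name)
    return "\n".join(unique) + "\n" if unique else ""


def subset_command(
    cram: str, reference: str, names_file: str, out_cram: str
) -> List[str]:
    """Build a samtools command that keeps only the reads listed in names_file."""
    return [
        "samtools", "view",
        "-C",               # write CRAM output
        "-T", reference,    # reference FASTA needed to decode/encode CRAM
        "-N", names_file,   # keep only reads whose names are in this file
        "-o", out_cram,
        cram,
    ]
```

The text from `names_file_text` would be written to `names_file` before running the command, for example with `subprocess.run(subset_command(...), check=True)`.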

ArtRand avatar Sep 25 '25 21:09 ArtRand

Hello @oneillkza,

Understood. I'm actively working on some performance improvements, and improving CRAM handling is on the work list. Would it be possible for you to share a few CRAM files that cause segfaults or htslib errors with me? Please send me an email at art.rand[at]nanoporetech.com

ArtRand avatar Sep 25 '25 21:09 ArtRand

Thanks @ArtRand, that's good to know!

Unfortunately these are clinical research samples that I cannot share, but let me see if I can reproduce this on some cell line data.

oneillkza avatar Sep 27 '25 03:09 oneillkza