failed to read record from bam information, truncated record in SAM/BAM/CRAM file
Hello,
I am trying to use the modkit extract full command on my data, but i only get records failed during processing:
(sturgeon) p2@p2-Precision-5860-Tower:~/tools$ modkit extract full haplotagged.cram /output/HM.txt --log-filepath log.txt
> found BAM index, processing reads in 100000 base pair chunks
> 2952 ~records failed
> 0 ~records skipped
> 0 ~records used
> 0 rows written
|--------------------------------------- 1/10000 enqueued processed reads
[00:03:17] ####------------------------------------ 294656422/3299210039 genome positions ^C
And the log tells me:
[src/logging.rs::60][2025-02-17 12:52:47][DEBUG] command line: modkit_v0.4.3_u16_x86_64/dist_modkit_v0.4.3_d13b97d/modkit extract full /home/p2/epi2melabs/instances/wf-human-variation_01JKAX74XSDEBXWFFN086R6HTH/output/19H01535.haplotagged.cram /home/p2/epi2melabs/instances/wf-human-variation_01JKAX74XSDEBXWFFN086R6HTH/output/19H01535HM.txt --log-filepath /home/p2/epi2melabs/instances/wf-human-variation_01JKAX74XSDEBXWFFN086R6HTH/log.txt
[src/extract/util.rs::281][2025-02-17 12:52:47][INFO] found BAM index, processing reads in 100000 base pair chunks
[src/interval_chunks.rs::512][2025-02-17 12:52:47][DEBUG] there are 711 contig(s) to work on (711 parts)
[src/mod_bam.rs::112][2025-02-17 12:52:48][DEBUG] failed to read record from bam information, truncated record in SAM/BAM/CRAM file
[src/mod_bam.rs::112][2025-02-17 12:52:48][DEBUG] failed to read record from bam information, truncated record in SAM/BAM/CRAM file
...
Do you have any idea on what could cause this issue? Could it be that it needs a BAM?
Many thanks in advance!
Hello @ziphra,
Could you run modkit modbam check-tags docs here on the CRAM or a small subset of it and post the output here? It will tabulate the errors encountered whilst parsing the records. (You need 0.4.3 for this function).
CRAM should work fine, you can test the command using the CRAM file in the testing resources directory. Which program is generating the modBAM files? If it's still not clear what's going on, maybe you can attach a sampling of the file and I can take a look.
Hello, thank you for your reply.
modkit modbam check-tags gave me this:
modkit modbam check-tags '/home/p2/epi2melabs/instances/wf-human-variation_01JKAX74XSDEBXWFFN086R6HTH/output/19H01535.haplotagged.cram'
[00:00:14] ######################################## 3299210039/3299210039 genome positions > input modBAM contains 1314 (1.04%) failed records
> num PASS records: 126360 (100.00%)
> num records: 126360
> errors:
+----------------------------------------------------+-------+--------+
| error | count | pct |
+----------------------------------------------------+-------+--------+
| HtsLib-error-truncated record in SAM/BAM/CRAM file | 1314 | 100.00 |
| total | 1314 | 100 |
+----------------------------------------------------+-------+--------+
> valid record tag headers:
+------------+--------+
| tag_header | count |
+------------+--------+
| C+h? | 117320 |
| C+m? | 117320 |
+------------+--------+
> modified bases:
+--------+--------------+----------+------+
| strand | primary_base | mod_code | mode |
+--------+--------------+----------+------+
| + | C | h | ? |
| + | C | m | ? |
+--------+--------------+----------+------+
> Error! input modBAM contains 1314 (1.04%) failed records
My cram was generated with Epi2me, so it's the usual workflow.
It worked with the bam files generated during sequencing.
@ziphra
It looks like these are different files you're using? haplotagged.cram vs. 19H01535.haplotagged.cram? The first file looks to have almost entirely failed records, but the second one it is a small sub-population.
As you've seen in the first post, most modkit extract will simply drop failed records. Of course if they all fail, it's not much help. I can check with the Epi2Me team about how truncated records could be written.
Aside:
> num PASS records: 126360 (100.00%)
Looks to me like this counter is broken, so I'll fix that.
No, it's the same file, I changed names and removed path when posting for readability. I just tried the two commands again and got the same results. I wonder if this could be due to my reads being phased?
Hello @ziphra,
Ok. I don't think that having phased reads should matter, the phasing is just an tag in the record (same as the modified base probabilities). Would it be possible fo you to attach a small sample CRAM that exposes the issue? You can also email/share it with me at art.rand[at]nanoporetech.com.
I'm encountering an unexpected issue with this CRAM file. I try subsetting a region using the following commands, so I could share it with you:
samtools view -T hg38.fa --output-fmt cram -h -o chr7.cram' haplotagged.cram chr7
samtools index chr7.cram
Then, when I run modkit extract full on the subsetted CRAM, it works!
I'm not sure why this behaviour occurs. Any insights would be appreciated.
Hello @ziphra,
If you randomly subsample the CRAM (samtools view --output-fmt cram -h --subsample FRAC) does it work? If not, could you send me that one?
Hi, It also worked after subsampling.
I tried a couple of things, and it did work too after recompressing the original CRAM with samtools view -T hg38.fa -C -o new_haplotagged.cram haplotagged.cram.
For some reasons, the CRAM might be corrupted, even though samtools quickcheck -v haplotagged.cram returned no errors. There might be a problem with the EPi2me pipeline.
Thank you very much for your help !
One thing, is that my new compressed CRAM is 116GB when it was originally 32... Maybe some aggressive parameters to achieve higher compression corrupted the CRAM.
Hello @ziphra,
In my experience piping an HTS-file through samtools as you've done has the effect of cleaning it up - so I guess I'm not too surprised. I'm not sure how I can help troubleshoot the corrupted CRAM file, if you can reproduce the problem I would bring it to the attention of the Epi2Me team via their github.