paragraph
paragraph copied to clipboard
Error with idxdepth: "Assertion failed: _impl->header_contig_map.count(chr) != 0"
hi,
I have a BAM file downloaded from the GDC and the corresponding reference fasta (GRCh38.d1.vd1.fa). I have the following assertion which is triggered:
Assertion failed: _impl->header_contig_map.count(chr) != 0
After adding log messages, at least one case where it fails (I am not sure how multithreading might change results here) is with the sequence 'HCV-1'. I get the log line
Thread 140029280118528 estimating depth for HCV-1
but if I add a warning in BamReader::estimateDepth
I see that what is being worked on is 'HCV'. I believe that the problem resides in the separators used in StringUtil::parsePos
at line 159.
I should note that HCV-1 and HCV-2 were added to reference_chromosomes
map in estimateDepths and to both_chromosomes
in the same function.
I do not know what the correct path for correction is, as I expect you had your reasons to include '-' as a separator. I expect that if I modify the fasta and prepare a new header for my Bam in order to substitute the '-' with '_' this would correct the situation. Or would you plan an update to address this?
If I may also take the opportunity to hint at the warning: BAM header only has a subset of the reference chromosomes -- please make sure they match!
. With my input, many contigs are not included in the both_chromosomes
map. For example, among others, chr1_KI270706v1_random
, chr1_KI270707v1_random
, HTLV-1
, or HPV100
. Clearly, disregarding is on purpose by testing include_alt_contig
, but then the warning is misleading. (And I am not sure if later on, missing information in the both_chromosomes, might trigger more errors).
Regards,
Thanks for pointing this out! You're right this is because '-' is a separator. See the function at StringUtil.hh line 156. To make your pipeline more robust, I would recommend using underscore in chromosome names instead of dash. I am not if other tools have similar issues...