paragraph icon indicating copy to clipboard operation
paragraph copied to clipboard

Error with idxdepth: "Assertion failed: _impl->header_contig_map.count(chr) != 0"

Open cassisnaro opened this issue 3 years ago • 1 comments

hi,

I have a BAM file downloaded from the GDC and the corresponding reference fasta (GRCh38.d1.vd1.fa). I have the following assertion which is triggered: Assertion failed: _impl->header_contig_map.count(chr) != 0 After adding log messages, at least one case where it fails (I am not sure how multithreading might change results here) is with the sequence 'HCV-1'. I get the log line Thread 140029280118528 estimating depth for HCV-1 but if I add a warning in BamReader::estimateDepth I see that what is being worked on is 'HCV'. I believe that the problem resides in the separators used in StringUtil::parsePos at line 159. I should note that HCV-1 and HCV-2 were added to reference_chromosomes map in estimateDepths and to both_chromosomes in the same function.

I do not know what the correct path for correction is, as I expect you had your reasons to include '-' as a separator. I expect that if I modify the fasta and prepare a new header for my Bam in order to substitute the '-' with '_' this would correct the situation. Or would you plan an update to address this?

If I may also take the opportunity to hint at the warning: BAM header only has a subset of the reference chromosomes -- please make sure they match!. With my input, many contigs are not included in the both_chromosomes map. For example, among others, chr1_KI270706v1_random, chr1_KI270707v1_random, HTLV-1, or HPV100. Clearly, disregarding is on purpose by testing include_alt_contig, but then the warning is misleading. (And I am not sure if later on, missing information in the both_chromosomes, might trigger more errors).

Regards,

cassisnaro avatar May 04 '21 09:05 cassisnaro

Thanks for pointing this out! You're right this is because '-' is a separator. See the function at StringUtil.hh line 156. To make your pipeline more robust, I would recommend using underscore in chromosome names instead of dash. I am not if other tools have similar issues...

traxexx avatar May 05 '21 05:05 traxexx