KeyError('unknown reference MT') on GRCh38.d1.vd1(UCSC-style-name)
Hi!
When running aldy==4.8.2 on a BAM aligned to GRCh38.d1.vd1 (UCSC-style-name), the program crashes with:
Gene MT-RNR1
ERROR: gene= all, file= /inputs/xxxxxx_dnaseq-alignment_v1.5.0_bwa2.bam
KeyError('unknown reference MT')
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/aldy/__main__.py", line 126, in main
_genotype(args.gene, output, args)
File "/usr/local/lib/python3.10/dist-packages/aldy/__main__.py", line 447, in _genotype
run(None)
File "/usr/local/lib/python3.10/dist-packages/aldy/__main__.py", line 397, in run
_ = genotype(
File "/usr/local/lib/python3.10/dist-packages/aldy/genotype.py", line 120, in genotype
**genotype(
File "/usr/local/lib/python3.10/dist-packages/aldy/genotype.py", line 180, in genotype
sample = sam.Sample(gene, profile, sam_path, reference, debug)
File "/usr/local/lib/python3.10/dist-packages/aldy/sam.py", line 117, in __init__
norm, muts = self._load_sam(path, reference, debug)
File "/usr/local/lib/python3.10/dist-packages/aldy/sam.py", line 171, in _load_sam
self._realign_indels(tmp, sam, reference)
File "/usr/local/lib/python3.10/dist-packages/aldy/sam.py", line 463, in _realign_indels
sz = sam.get_reference_length(rname)
File "pysam/libcalignmentfile.pyx", line 1916, in pysam.libcalignmentfile.AlignmentFile.get_reference_length
File "pysam/libcalignmentfile.pyx", line 511, in pysam.libcalignmentfile.AlignmentHeader.get_reference_length
KeyError: 'unknown reference MT'
Cause: Aldy expects MT or chrMT, GRCh38.d1.vd1 (UCSC-style) uses chrM.
Possibly, the chr_prefix function is the cause. It correctly adds the chr prefix for chromosomes (1,2,…,X,Y), but it does not handle the special case of the mitochondrial chromosome: in hg19 it is usually called MT, while in GRCh38.d1.vd1 it is chrM. When chr is added to MT, it becomes chrMT, which does not exist in the BAM file, leading to the error.
It seems that Aldy’s test BAM files don't contain the mitochondrial chromosome at all, so this problem doesn't show up in your tests. For example:
samtools view -h /inputs/HG03166.pb.bam | grep "@SQ"
@SQ SN:chr1 LN:248956422
...
@SQ SN:chr21 LN:46709983
@SQ SN:chr22 LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
Could you please add support for chrM? That would make Aldy work smoothly with GRCh38.d1.vd1 (UCSC-style) -aligned BAMs
Thanks a lot for your work on this project!
Hi @Anastasiabum ,
Thank you for the bug report!
Can you please try #100 and see if that fixes the problem? If so, I will merge this.
Hello @inumanag, we are expiriencing same problem and your fix is not working for us.
According to my understanding of code you are checking for prefix for each gene separately, not globally: https://github.com/0xTCG/aldy/blob/fix-chrm/aldy/sam.py#L562-L566
self._prefix = chr_prefix(
self.gene.chr, [x["SN"] for x in sam.header["SQ"]]
)
if self.gene.chr == "MT" and self._prefix == "chr":
self.gene.chr = "M" # UCSC fix
And we are checking by adding prefix chr to chrs list:
https://github.com/0xTCG/aldy/blob/fix-chrm/aldy/common.py#L236-L243
def chr_prefix(ch: str, chrs: List[str]) -> str:
"""
Check if a chromosome needs "chr" prefix given the available chromosomes.
:returns: Chromosome prefix if the chromosome does not have it.
"""
if ch not in chrs and "chr" + ch in chrs:
return "chr"
return ""
But if ch is MT - there are no chrMT in BAM-header and prefix chr is empty for MT - so your fix is not changing self.gene.chr value.
Dirty way to fix is to add chr prefix if chrM is present and after change self.gene.chr to M as you implemented. But I think it will be better to remove prefix logic and return correct chr name instead (I understand that it will need some time)
@Stikus MT without the prefix should work (but to does not :) ).
Could you please send me a SAM header with a single read from chrM[T] so that I can do a quick test? Thanks!
SAM/BAM header for chrM:
@SQ SN:chrM LN:16569 AS:GRCh38.d1.vd1 SP:Homo sapiens
Read from chrM:
A00921:12:HNF33DSXX:1:1103:15275:28635 163 chrM 1 60 20S131M = 174 321 TTAAATAAGACATCACGATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:136M3I12M MD:Z:131 RG:Z:10_1_S19_L001_R NM:i:0 AS:i:131 XS:i:86
Dear @inumanag, any news?