CrossMap
MAF liftover resulting in same reference and alternate allele
Hello,
Thanks for this useful utility. I have data in MAF format (NCBI build 37) and am trying to lift it over to hg38 using the following CrossMap (v0.6.6) command:
CrossMap maf b37ToHg38.over.chain \
PAAD_atlas.tmp.maf \
/work/genomes/Hsapiens/hg38/seq/hg38.fa hg38 \
/explore/liftover_maf/PAAD_atlas.liftover.maf \
--chromid l
The chain file was downloaded from https://github.com/broadinstitute/gatk/blob/083aac832cb64515fd0456008bf847dd22f6c234/scripts/funcotator/data_sources/gnomAD/b37ToHg38.over.chain
The command runs successfully, with the following output:
2024-02-26 10:29:50 [INFO] Read the chain file "/g/data3/gx8/extras/liftover_chains/b37ToHg38.over.chain"
2024-02-26 10:29:51 [INFO] Lifting over ...
2024-02-26 10:33:58 [INFO] Total entries: 6630811
2024-02-26 10:33:58 [INFO] Failed to map: 1372
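As a quick sanity check on the log above, the failure rate can be computed from the two counters. `failure_rate` below is a hypothetical helper (not part of CrossMap) that only assumes the summary lines are worded exactly as shown. It works out to roughly 0.02% of entries, so unmapped records alone cannot explain the allele problem.

```python
import re


def failure_rate(log_text: str) -> tuple[int, int, float]:
    """Parse CrossMap's summary lines and return (failed, total, failed / total)."""
    # Assumes the "Total entries: N" / "Failed to map: M" wording shown in the log above.
    total = int(re.search(r"Total entries: (\d+)", log_text).group(1))
    failed = int(re.search(r"Failed to map: (\d+)", log_text).group(1))
    return failed, total, failed / total


log = """2024-02-26 10:33:58 [INFO] Total entries: 6630811
2024-02-26 10:33:58 [INFO] Failed to map: 1372"""
failed, total, rate = failure_rate(log)
print(f"{failed}/{total} = {rate:.4%}")  # prints "1372/6630811 = 0.0207%"
```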
However, after inspecting the output, I realised that Reference_Allele and Tumor_Seq_Allele2 are identical in the lifted-over MAF file. For example, the head of the output looks like this:
$ head PAAD_atlas.liftover.maf
#liftOver: Program=CrossMapv0.6.6, Time=February26,2024, ChainFile=/g/data3/gx8/extras/liftover_chains/b37ToHg38.over.chain, NewRefGenome=/g/data3/gx8/local/development/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa
Hugo_Symbol sample_id Hugo_Symbol NCBI_Build Chromosome Start_Position End_Position Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele2 Tumor_Sample_Barcode HGVSp_Short aa_mutation
1 Avner-primary_tissue_subset FAM231B hg38 chr1 16539492 16539492 Missense_Mutation SNP C C p010_tumor-52fccd-somatic.pcgr.vcf p.R143C NA
2 Avner-primary_tissue_subset ZMYM4 hg38 chr1 35389029 35389029 Nonsense_Mutation SNP G G p010_tumor-52fccd-somatic.pcgr.vcf p.E795* NA
3 Avner-primary_tissue_subset COL8A2 hg38 chr1 36099236 36099236 Nonsense_Mutation SNP G G p010_tumor-52fccd-somatic.pcgr.vcf p.R149* NA
4 Avner-primary_tissue_subset PTGER3 hg38 chr1 70953763 70953763 Missense_Mutation SNP T T p010_tumor-52fccd-somatic.pcgr.vcf p.Q368H NA
5 Avner-primary_tissue_subset C1orf52 hg38 chr1 85259561 85259561 Missense_Mutation SNP C C p010_tumor-52fccd-somatic.pcgr.vcf p.E25K NA
6 Avner-primary_tissue_subset AMY2A hg38 chr1 103617550 103617550 Missense_Mutation SNP T T p010_tumor-52fccd-somatic.pcgr.vcf p.V37D NA
7 Avner-primary_tissue_subset TNR hg38 chr1 175391305 175391305 Missense_Mutation SNP G G p010_tumor-52fccd-somatic.pcgr.vcf p.S497L NA
8 Avner-primary_tissue_subset LAMC2 hg38 chr1 183218424 183218424 Missense_Mutation SNP G G p010_tumor-52fccd-somatic.pcgr.vcf p.A147T NA
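To measure how widespread the problem is, rather than relying on `head`, one could count the rows whose `Reference_Allele` equals `Tumor_Seq_Allele2`. `identical_allele_rows` is a hypothetical helper, not part of CrossMap. It assumes the file is tab-separated, that comment lines such as the `#liftOver:` line start with `#`, and that the first column is the row id shown above. Columns are looked up by header name, so the check does not depend on column order.

```python
import csv
from typing import Iterable


def identical_allele_rows(lines: Iterable[str]) -> list[str]:
    """Return the first-column ids of rows where Reference_Allele == Tumor_Seq_Allele2."""
    # Skip blank lines and comment lines such as the "#liftOver: ..." line CrossMap writes.
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.reader(data, delimiter="\t")
    header = next(reader)
    ref_i = header.index("Reference_Allele")
    alt_i = header.index("Tumor_Seq_Allele2")
    return [row[0] for row in reader if row[ref_i] == row[alt_i]]


# Usage (path taken from the command above):
# with open("PAAD_atlas.liftover.maf", newline="") as fh:
#     bad = identical_allele_rows(fh)
#     print(len(bad), "rows with identical alleles")
```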
For comparison, the head of the original (build 37) file is:
$ head PAAD_atlas.tmp.maf
Hugo_Symbol sample_id Hugo_Symbol NCBI_Build Chromosome Start_Position End_Position Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele2 Tumor_Sample_Barcode HGVSp_Short aa_mutation
1 Avner-primary_tissue_subset FAM231B 37 1 16865987 16865987 Missense_Mutation SNP C T p010_tumor-52fccd-somatic.pcgr.vcf p.R143C NA
2 Avner-primary_tissue_subset ZMYM4 37 1 35854630 35854630 Nonsense_Mutation SNP G T p010_tumor-52fccd-somatic.pcgr.vcf p.E795* NA
3 Avner-primary_tissue_subset COL8A2 37 1 36564837 36564837 Nonsense_Mutation SNP G A p010_tumor-52fccd-somatic.pcgr.vcf p.R149* NA
4 Avner-primary_tissue_subset PTGER3 37 1 71419446 71419446 Missense_Mutation SNP T G p010_tumor-52fccd-somatic.pcgr.vcf p.Q368H NA
5 Avner-primary_tissue_subset C1orf52 37 1 85725244 85725244 Missense_Mutation SNP C T p010_tumor-52fccd-somatic.pcgr.vcf p.E25K NA
6 Avner-primary_tissue_subset AMY2A 37 1 104160172 104160172 Missense_Mutation SNP T A p010_tumor-52fccd-somatic.pcgr.vcf p.V37D NA
7 Avner-primary_tissue_subset TNR 37 1 175360441 175360441 Missense_Mutation SNP G A p010_tumor-52fccd-somatic.pcgr.vcf p.S497L NA
8 Avner-primary_tissue_subset LAMC2 37 1 183187559 183187559 Missense_Mutation SNP G A p010_tumor-52fccd-somatic.pcgr.vcf p.A147T NA
9 Avner-primary_tissue_subset OBSCN 37 1 228434396 228434396 Missense_Mutation SNP G A p010_tumor-52fccd-somatic.pcgr.vcf p.A1401T NA
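One hypothesis worth checking is an assumption, not something confirmed from CrossMap's source here. If `CrossMap maf` locates fields by position according to the standard MAF column order rather than by header name, this file's non-standard layout would matter. In the standard layout, `Reference_Allele` is column 11. Here, column 11 is `Tumor_Seq_Allele2`, so a positional write of the new reference base would overwrite the alternate allele, which matches the symptom. The hypothetical `compare_layout` helper below lists where the header departs from the first 13 standard MAF columns.

```python
from typing import Optional

# First 13 columns of the standard MAF layout (GDC MAF specification), in order.
STANDARD_MAF_PREFIX = [
    "Hugo_Symbol", "Entrez_Gene_Id", "Center", "NCBI_Build", "Chromosome",
    "Start_Position", "End_Position", "Strand", "Variant_Classification",
    "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2",
]


def compare_layout(header: list[str]) -> list[tuple[int, str, Optional[str]]]:
    """Return (1-based column, expected name, found name) for each mismatching column."""
    mismatches = []
    for i, expected in enumerate(STANDARD_MAF_PREFIX):
        found = header[i] if i < len(header) else None
        if found != expected:
            mismatches.append((i + 1, expected, found))
    return mismatches


# Header of PAAD_atlas.tmp.maf as shown above.
header = ("Hugo_Symbol sample_id Hugo_Symbol NCBI_Build Chromosome Start_Position "
          "End_Position Variant_Classification Variant_Type Reference_Allele "
          "Tumor_Seq_Allele2 Tumor_Sample_Barcode HGVSp_Short aa_mutation").split()
for col, expected, found in compare_layout(header):
    print(f"column {col}: standard MAF expects {expected!r}, file has {found!r}")
```

Columns 4 to 7 (build, chromosome, start, end) line up with the standard layout, which would explain why the coordinates were lifted correctly while the allele columns were not.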
It seems the program is updating both the reference and the alternate alleles: in every lifted row above, Tumor_Seq_Allele2 ends up identical to Reference_Allele. Can you please help me debug this issue? Thanks.
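Until the cause is fixed, one possible workaround is to copy `Tumor_Seq_Allele2` back from the input file, joining the two files on the row id in the first column. This is a hypothetical sketch, not a CrossMap feature. It assumes the following:

* The first column is a unique row id preserved by CrossMap, as in the heads above.
* Both files use the same column positions (`REF_COL`/`ALT_COL` are illustrative).
* The corrupted `Tumor_Seq_Allele2` holds the hg38 reference base, as observed above.

Rows where that hg38 base differs from the original reference (for example, a strand flip between builds) are flagged instead of patched, because their alleles may need complementing.

```python
import csv
from typing import Iterable

REF_COL, ALT_COL = 9, 10  # 0-based positions of Reference_Allele / Tumor_Seq_Allele2 in this file


def read_maf(path: str) -> tuple[list[str], list[list[str]]]:
    """Read a tab-separated MAF, skipping '#' comment lines; return (header, rows)."""
    with open(path, newline="") as fh:
        data = (ln for ln in fh if ln.strip() and not ln.startswith("#"))
        reader = csv.reader(data, delimiter="\t")
        header = next(reader)
        return header, list(reader)


def restore_alt_alleles(original: Iterable[list[str]], lifted: Iterable[list[str]]):
    """Copy the original alternate allele into lifted rows, joined on the row id in column 0.

    A row is flagged (left unpatched) when it has no original counterpart, or when its
    current Tumor_Seq_Allele2 (the hg38 base, per the symptom) differs from the
    original Reference_Allele.
    """
    orig_by_id = {row[0]: row for row in original}
    patched, flagged = [], []
    for row in lifted:
        src = orig_by_id.get(row[0])
        new = list(row)  # copy so the input rows are not modified
        if src is None or row[ALT_COL] != src[REF_COL]:
            flagged.append(row[0])
        else:
            new[ALT_COL] = src[ALT_COL]
        patched.append(new)
    return patched, flagged


# Usage (paths taken from the command above):
# _, orig_rows = read_maf("PAAD_atlas.tmp.maf")
# header, lifted_rows = read_maf("PAAD_atlas.liftover.maf")
# patched, flagged = restore_alt_alleles(orig_rows, lifted_rows)
```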