
MAF liftover resulting in same reference and alternate allele

Open skanwal opened this issue 4 months ago • 7 comments

Hello,

Thanks for this useful utility. I have data in MAF format (NCBI build 37) and am trying to lift it over to hg38 using the following CrossMap (v0.6.6) command:

CrossMap maf b37ToHg38.over.chain \
PAAD_atlas.tmp.maf \
/work/genomes/Hsapiens/hg38/seq/hg38.fa hg38 \
/explore/liftover_maf/PAAD_atlas.liftover.maf \
--chromid l

The chain file was downloaded from https://github.com/broadinstitute/gatk/blob/083aac832cb64515fd0456008bf847dd22f6c234/scripts/funcotator/data_sources/gnomAD/b37ToHg38.over.chain

The command runs successfully with the following output:

2024-02-26 10:29:50 [INFO]  Read the chain file "/g/data3/gx8/extras/liftover_chains/b37ToHg38.over.chain"
2024-02-26 10:29:51 [INFO]  Lifting over ...
2024-02-26 10:33:58 [INFO]  Total entries: 6630811
2024-02-26 10:33:58 [INFO]  Failed to map: 1372

However, after inspecting the output I realised that Reference_Allele and Tumor_Seq_Allele2 are identical in the lifted-over MAF file. For example, the head of the output looks like:

$ head PAAD_atlas.liftover.maf
#liftOver: Program=CrossMapv0.6.6, Time=February26,2024, ChainFile=/g/data3/gx8/extras/liftover_chains/b37ToHg38.over.chain, NewRefGenome=/g/data3/gx8/local/development/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa
Hugo_Symbol	sample_id	Hugo_Symbol	NCBI_Build	Chromosome	Start_Position	End_Position	Variant_Classification	Variant_Type	Reference_Allele	Tumor_Seq_Allele2	Tumor_Sample_Barcode	HGVSp_Short	aa_mutation
1	Avner-primary_tissue_subset	FAM231B	hg38	chr1	16539492	16539492	Missense_Mutation	SNP	C	C	p010_tumor-52fccd-somatic.pcgr.vcf	p.R143C	NA
2	Avner-primary_tissue_subset	ZMYM4	hg38	chr1	35389029	35389029	Nonsense_Mutation	SNP	G	G	p010_tumor-52fccd-somatic.pcgr.vcf	p.E795*	NA
3	Avner-primary_tissue_subset	COL8A2	hg38	chr1	36099236	36099236	Nonsense_Mutation	SNP	G	G	p010_tumor-52fccd-somatic.pcgr.vcf	p.R149*	NA
4	Avner-primary_tissue_subset	PTGER3	hg38	chr1	70953763	70953763	Missense_Mutation	SNP	T	T	p010_tumor-52fccd-somatic.pcgr.vcf	p.Q368H	NA
5	Avner-primary_tissue_subset	C1orf52	hg38	chr1	85259561	85259561	Missense_Mutation	SNP	C	C	p010_tumor-52fccd-somatic.pcgr.vcf	p.E25K	NA
6	Avner-primary_tissue_subset	AMY2A	hg38	chr1	103617550	103617550	Missense_Mutation	SNP	T	T	p010_tumor-52fccd-somatic.pcgr.vcf	p.V37D	NA
7	Avner-primary_tissue_subset	TNR	hg38	chr1	175391305	175391305	Missense_Mutation	SNP	G	G	p010_tumor-52fccd-somatic.pcgr.vcf	p.S497L	NA
8	Avner-primary_tissue_subset	LAMC2	hg38	chr1	183218424	183218424	Missense_Mutation	SNP	G	G	p010_tumor-52fccd-somatic.pcgr.vcf	p.A147T	NA
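To check how widespread this is beyond the first few rows, the rows where the two allele columns match can be counted. The following is a minimal sketch, not part of CrossMap: the function name `count_identical_alleles` is hypothetical, the inline sample uses a reduced set of columns for brevity, and it assumes a tab-separated MAF whose comment lines start with `#`.

```python
import csv
import io


def count_identical_alleles(handle):
    """Return (total_rows, rows_where_ref_equals_alt) for a tab-separated MAF stream."""
    # Skip '#' comment lines such as the '#liftOver:' line at the top of the output.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    ref_idx = header.index("Reference_Allele")
    alt_idx = header.index("Tumor_Seq_Allele2")
    total = same = 0
    for row in rows:
        # Ignore blank or truncated lines.
        if len(row) <= max(ref_idx, alt_idx):
            continue
        total += 1
        if row[ref_idx] == row[alt_idx]:
            same += 1
    return total, same


# Reduced illustration built from the first two lifted-over rows above.
sample = "\n".join(
    "\t".join(fields)
    for fields in [
        ["#liftOver: Program=CrossMapv0.6.6"],
        ["Chromosome", "Start_Position", "Reference_Allele", "Tumor_Seq_Allele2"],
        ["chr1", "16539492", "C", "C"],
        ["chr1", "35389029", "G", "G"],
    ]
) + "\n"

total, same = count_identical_alleles(io.StringIO(sample))
print(f"{same} of {total} rows have identical reference and alternate alleles")
```

For the real file, the same function could be called on `open("PAAD_atlas.liftover.maf", newline="")` instead of the in-memory sample.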

In comparison, the head of the original (genome build 37) file is:

$ head PAAD_atlas.tmp.maf
Hugo_Symbol	sample_id	Hugo_Symbol	NCBI_Build	Chromosome	Start_Position	End_Position	Variant_Classification	Variant_Type	Reference_Allele	Tumor_Seq_Allele2	Tumor_Sample_Barcode	HGVSp_Short	aa_mutation
1	Avner-primary_tissue_subset	FAM231B	37	1	16865987	16865987	Missense_Mutation	SNP	C	T	p010_tumor-52fccd-somatic.pcgr.vcf	p.R143C	NA
2	Avner-primary_tissue_subset	ZMYM4	37	1	35854630	35854630	Nonsense_Mutation	SNP	G	T	p010_tumor-52fccd-somatic.pcgr.vcf	p.E795*	NA
3	Avner-primary_tissue_subset	COL8A2	37	1	36564837	36564837	Nonsense_Mutation	SNP	G	A	p010_tumor-52fccd-somatic.pcgr.vcf	p.R149*	NA
4	Avner-primary_tissue_subset	PTGER3	37	1	71419446	71419446	Missense_Mutation	SNP	T	G	p010_tumor-52fccd-somatic.pcgr.vcf	p.Q368H	NA
5	Avner-primary_tissue_subset	C1orf52	37	1	85725244	85725244	Missense_Mutation	SNP	C	T	p010_tumor-52fccd-somatic.pcgr.vcf	p.E25K	NA
6	Avner-primary_tissue_subset	AMY2A	37	1	104160172	104160172	Missense_Mutation	SNP	T	A	p010_tumor-52fccd-somatic.pcgr.vcf	p.V37D	NA
7	Avner-primary_tissue_subset	TNR	37	1	175360441	175360441	Missense_Mutation	SNP	G	A	p010_tumor-52fccd-somatic.pcgr.vcf	p.S497L	NA
8	Avner-primary_tissue_subset	LAMC2	37	1	183187559	183187559	Missense_Mutation	SNP	G	A	p010_tumor-52fccd-somatic.pcgr.vcf	p.A147T	NA
9	Avner-primary_tissue_subset	OBSCN	37	1	228434396	228434396	Missense_Mutation	SNP	G	A	p010_tumor-52fccd-somatic.pcgr.vcf	p.A1401T	NA

It seems the program is updating both the reference and alternate alleles, so the original alternate allele is lost. Can you please help me debug the issue? Thanks.
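One way to list exactly which records lost their alternate allele is to pair rows from the two files and compare Tumor_Seq_Allele2. This is a rough sketch under stated assumptions: the helper names `read_alt_alleles` and `changed_alt_alleles` are hypothetical, the first column is assumed to be a row ID shared by both files (as it appears to be in the listings above), and the inline samples use a reduced set of columns.

```python
import csv
import io


def read_alt_alleles(handle, id_col=0):
    """Map row ID -> Tumor_Seq_Allele2 for a tab-separated MAF stream."""
    # Skip '#' comment lines before parsing the header.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    alt_idx = header.index("Tumor_Seq_Allele2")
    return {row[id_col]: row[alt_idx] for row in rows if len(row) > alt_idx}


def changed_alt_alleles(original, lifted):
    """Return (row_id, original_alt, lifted_alt) for rows whose alternate allele differs."""
    before = read_alt_alleles(original)
    after = read_alt_alleles(lifted)
    return [
        (row_id, before[row_id], alt)
        for row_id, alt in after.items()
        if row_id in before and before[row_id] != alt
    ]


def as_tsv(table):
    """Join a list of rows into tab-separated text."""
    return "\n".join("\t".join(fields) for fields in table) + "\n"


# Reduced illustration built from the first two rows of each listing above.
header = ["row_id", "Chromosome", "Start_Position", "Reference_Allele", "Tumor_Seq_Allele2"]
original_sample = as_tsv([
    header,
    ["1", "1", "16865987", "C", "T"],
    ["2", "1", "35854630", "G", "T"],
])
lifted_sample = as_tsv([
    header,
    ["1", "chr1", "16539492", "C", "C"],
    ["2", "chr1", "35389029", "G", "G"],
])

changes = changed_alt_alleles(io.StringIO(original_sample), io.StringIO(lifted_sample))
for row_id, old_alt, new_alt in changes:
    print(f"row {row_id}: Tumor_Seq_Allele2 {old_alt} -> {new_alt}")
```

Rows that failed to map are absent from the lifted file and are simply skipped by this comparison.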

skanwal, Feb 26 '24 00:02