CrossMap
CrossMap copied to clipboard
Wrong ALT when lifting over ALT=T,* like variants
Hi, I noticed that when lifting over variants of the type ALT=T,*
those are converted into variants of the type ATL=T,T
, independently of the base in the first ALT that in this example is T
.
The error can be reproduced when trying to liftover the HG002 CMRG truth for CHM13v1.0 to CHM13v1.1 with the following files:
- VCF file:
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/CMRG_v1.00/CHM13v1.0/SmallVariant/HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.vcf.gz
- Chain file:
https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/changes/v1.0_to_v1.1/v1.0_to_v1.1_rdna_merged.chain
- Reference file:
https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY.fa.gz
Querying with bcftools -H -r chr19:52194653
the original VCF file the resulting variant (omitting a non-relevant entry):
chr19 52194653 . TTTTCTTTC T,* 30 . . GT:AD 2|1:0,1,1
Querying with bcftools -H -r chr19:52194656
the lifted VCF you the resulting variant (omitting a non-relevant entry):
chr19 52194656 . TTTTCTTTC T,T 30 . . GT:AD 2|1:0,1,1
While, the correct liftover should be
chr19 52194656 . TTTTCTTTC T,* 30 . . GT:AD 2|1:0,1,1
The version of CrossMap that I am using is 0.6.4
downloaded using pip
on python version 3.8.12
.
This is fixed in v0.6.5
$ python3 ../../CrossMap-0.2.9.git/bin/CrossMap.py vcf v1.0_to_v1.1_rdna_merged.chain HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.vcf.gz chm13v2.0_maskedY.fa.gz out.vcf 2022-12-01 07:16:10 [INFO] Read the chain file "v1.0_to_v1.1_rdna_merged.chain" 2022-12-01 07:16:10 [INFO] Filter out variants [reference_allele == alternative_allele] ... 2022-12-01 07:16:10 [INFO] Creating index for: chm13v2.0_maskedY.fa.gz 2022-12-01 07:16:28 [INFO] Updating contig field ... 2022-12-01 07:16:28 [INFO] Lifting over ... 2022-12-01 07:17:56 [INFO] Total entries: 5966335 2022-12-01 07:17:56 [INFO] Failed to map: 87168
$ awk '$2=="52194656"' out.vcf chr19 52194656 . TTTTCTTTC T,* 30 . . GT:AD 2|1:0,1,1
Please let me know if you have other questions.
Thank you very much for the rapid fix.