CrossMap icon indicating copy to clipboard operation
CrossMap copied to clipboard

Wrong ALT when lifting over ALT=T,* like variants

Open maxrossi91 opened this issue 1 year ago • 2 comments

Hi, I noticed that when lifting over variants of the type ALT=T,* those are converted into variants of the type ATL=T,T, independently of the base in the first ALT that in this example is T.

The error can be reproduced when trying to liftover the HG002 CMRG truth for CHM13v1.0 to CHM13v1.1 with the following files:

  • VCF file: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/CMRG_v1.00/CHM13v1.0/SmallVariant/HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.vcf.gz
  • Chain file: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/changes/v1.0_to_v1.1/v1.0_to_v1.1_rdna_merged.chain
  • Reference file: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY.fa.gz

Querying with bcftools -H -r chr19:52194653 the original VCF file the resulting variant (omitting a non-relevant entry):

chr19    52194653    .    TTTTCTTTC    T,*    30    .    .    GT:AD    2|1:0,1,1

Querying with bcftools -H -r chr19:52194656 the lifted VCF you the resulting variant (omitting a non-relevant entry):

chr19    52194656    .    TTTTCTTTC    T,T    30    .    .    GT:AD    2|1:0,1,1

While, the correct liftover should be

chr19    52194656    .    TTTTCTTTC    T,*    30    .    .    GT:AD    2|1:0,1,1

The version of CrossMap that I am using is 0.6.4 downloaded using pip on python version 3.8.12.

maxrossi91 avatar Dec 01 '22 14:12 maxrossi91

This is fixed in v0.6.5

$ python3 ../../CrossMap-0.2.9.git/bin/CrossMap.py vcf v1.0_to_v1.1_rdna_merged.chain HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.vcf.gz chm13v2.0_maskedY.fa.gz out.vcf 2022-12-01 07:16:10 [INFO] Read the chain file "v1.0_to_v1.1_rdna_merged.chain" 2022-12-01 07:16:10 [INFO] Filter out variants [reference_allele == alternative_allele] ... 2022-12-01 07:16:10 [INFO] Creating index for: chm13v2.0_maskedY.fa.gz 2022-12-01 07:16:28 [INFO] Updating contig field ... 2022-12-01 07:16:28 [INFO] Lifting over ... 2022-12-01 07:17:56 [INFO] Total entries: 5966335 2022-12-01 07:17:56 [INFO] Failed to map: 87168

$ awk '$2=="52194656"' out.vcf chr19 52194656 . TTTTCTTTC T,* 30 . . GT:AD 2|1:0,1,1

Please let me know if you have other questions.

liguowang avatar Dec 02 '22 01:12 liguowang

Thank you very much for the rapid fix.

maxrossi91 avatar Dec 02 '22 12:12 maxrossi91