gtc2vcf

Is it possible to resolve multi-nucleotide variants (MNV) to biallelic records?

Open rajwanir opened this issue 4 months ago • 7 comments

Hi @freeseek

I notice that some MNVs are often assayed using Infinium arrays. Each of these MNVs is represented by a single record in the manifest with a single allele, but gets translated into a multi-allelic record by gtc2vcf. My understanding is that a single probe can only interrogate a single allele. Could you please help me understand why and how these MNV records get translated into multi-allelic records?

For example:

CSV manifest record:

IlmnID,Name,IlmnStrand,SNP,AddressA_ID,AlleleA_ProbeSeq,AddressB_ID,AlleleB_ProbeSeq,GenomeBuild,Chr,MapInfo,Ploidy,Species,Source,SourceVersion,SourceStrand,SourceSeq,TopGenomicSeq,BeadSetID,Exp_Clusters,Intensity_Only,RefStrand
5:112838101_MNV-0_B_R_2716756713,5:112838101_MNV,BOT,[T/C],37787959,CTCTCCAAACTTCTATCTTTTTCAGAACGAGAACTATCTAAGCTTCCTCT,,,38,5,112838101,diploid,Homo sapiens,clinvar,0,BOT,TCCGCGTTCTCTCTCCAAACTTCTATCTTTTTCAGAACGAGAACTATCTAAGCTTCCTCT[T/C]NNNAGGAGCTGGGTAACACTGTAGTATTCAAATATGGTGAAAGGACAGTCATGTTGCCAG,CTGGCAACATGACTGTCCTTTCACCATATTTGAATACTACAGTGTTACCCAGCTCCTNNN[A/G]AGAGGAAGCTTAGATAGTTCTCGTTCTGAAAAAGATAGAAGTTTGGAGAGAGAACGCGGA,1984,3,0,-

VCF record:

#CHROM POS ID REF ALT QUAL FILTER INFO
chr5 112838101 5:112838101_MNV C A,G . . GC=0.4125;ALLELE_A=1;ALLELE_B=2;FRAC_A=0.310924;FRAC_C=0.193277;FRAC_G=0.235294;FRAC_T=0.260504;NORM_ID=9;BEADSET_ID=1984;ASSAY_TYPE=0
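
To make the mismatch concrete, here is a minimal Python sketch of one way the records above could fit together. It assumes that the manifest `SNP` alleles are complemented when `RefStrand` is `-`, and that `ALLELE_A`/`ALLELE_B` are indexes into the VCF allele list (0 = REF, 1 = first ALT, 2 = second ALT). The function names `forward_alleles` and `to_vcf_alleles` are hypothetical and used only for illustration; this is not gtc2vcf's actual implementation.

```python
# Hypothetical illustration only; not taken from the gtc2vcf source.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def forward_alleles(snp, ref_strand):
    """Return manifest alleles (A, B) on the reference forward strand.

    Assumption: a RefStrand of '-' means the alleles must be complemented.
    """
    allele_a, allele_b = snp.strip("[]").split("/")
    if ref_strand == "-":
        allele_a = allele_a.translate(COMPLEMENT)
        allele_b = allele_b.translate(COMPLEMENT)
    return allele_a, allele_b


def to_vcf_alleles(ref_base, allele_a, allele_b):
    """Build REF, the ALT list, and the A/B allele indexes (0 = REF)."""
    alts = []
    for allele in (allele_a, allele_b):
        if allele != ref_base and allele not in alts:
            alts.append(allele)

    def index(allele):
        return 0 if allele == ref_base else alts.index(allele) + 1

    return ref_base, alts, index(allele_a), index(allele_b)


# Values from the manifest and VCF records above: SNP [T/C], RefStrand -, REF C.
a, b = forward_alleles("[T/C]", "-")
ref, alts, idx_a, idx_b = to_vcf_alleles("C", a, b)
print(ref, ",".join(alts), idx_a, idx_b)  # prints: C A,G 1 2
```

Under these assumptions, neither manifest allele matches the reference base C at this position, so both end up as ALT alleles, which would be consistent with `ALT=A,G`, `ALLELE_A=1` and `ALLELE_B=2` in the VCF record.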

From the chip: ftp://[email protected]/Public_Docs/Genotyping_Array_Support_Files/Global%20Screening%20Array/Global%20Screening%20Array%20v3/GSAv3%2BConfluence/NCI_custom_booster_20032937X371431_A1.csv

The behaviour of Illumina Dragen array gtc-to-vcf is identical (i.e. it also outputs a multi-allelic variant in this scenario).

Thank you.

rajwanir Oct 24 '24 14:10