
Drop marker if SourceSeq maps to different loci equally well.

Open rajwanir opened this issue 6 months ago • 3 comments

gtc2vcf typically does a fantastic job of inferring marker coordinates with the SourceSeq mapping workflow. However, in some cases where the SourceSeq maps to different loci equally well, gtc2vcf picks one of the mappings instead of dropping the marker.

Here is an example for rs10435524 which maps to chr8:30404042. The SourceSeq is below:

TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATC[A/G]GACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT

The bwa alignment of the SourceSeq looks like this:

rs10435524-138_B_R_2276283903:1 0 chr8 30403982 0 121M * 0 0 TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATCAGACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT * NM:i:1 MD:Z:60G60 AS:i:116 XS:i:116 XA:Z:chrX,+37095221,121M,1;

rs10435524-138_B_R_2276283903:2 0 chrX 37095221 0 121M * 0 0 TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATCGGACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT * NM:i:0 MD:Z:121 AS:i:121 XS:i:121 XA:Z:chr8,+30403982,121M,0;
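As a sanity check on the coordinates, the marker position should be the alignment start plus the length of the left flank (60 bases before the bracket here). Below is a minimal Python sketch of that arithmetic. It assumes a forward-strand alignment with no indels or clipping (CIGAR 121M, as in both records above); `marker_position` is a hypothetical helper of mine, not a gtc2vcf function.

```python
def marker_position(source_seq: str, aln_pos: int) -> int:
    """Return the 1-based reference coordinate of the bracketed SNP.

    Assumption: forward-strand, gap-free, unclipped alignment (e.g. 121M),
    so the offset into the read equals the offset on the reference.
    """
    left_flank_len = source_seq.index("[")  # number of bases before [A/G]
    return aln_pos + left_flank_len


source_seq = (
    "TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATC"
    "[A/G]"
    "GACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT"
)

chr8_pos = marker_position(source_seq, 30403982)  # 30404042
chrx_pos = marker_position(source_seq, 37095221)  # 37095281
print(chr8_pos, chrx_pos)
```

So the chr8 hit gives the expected chr8:30404042 and the chrX hit gives chrX:37095281.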

gtc2vcf outputs the mapping as chrX:37095281. The SourceSeq in this case does not seem to map uniquely to the reference, so I think the marker can be filtered out. I see that something along these lines is already implemented in gtc2vcf; however, I still see markers like this slip through. Do you know why?
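To make the kind of filter I have in mind concrete, here is a rough Python sketch that flags a SAM record as ambiguous when its mapping quality is 0 or when the suboptimal score (XS) is at least as high as the best score (AS). This is only my assumption about a reasonable rule, not how gtc2vcf actually decides; `is_ambiguous` is a hypothetical name, and the third record with MAPQ 60 is invented purely for contrast.

```python
def is_ambiguous(sam_line: str) -> bool:
    """Flag a SAM record whose best hit is not unique.

    Assumed rule (not gtc2vcf's): MAPQ == 0, or the XS tag is >= the AS tag.
    """
    fields = sam_line.split()
    mapq = int(fields[4])  # 5th mandatory SAM column
    tags = {}
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    tied = "AS" in tags and "XS" in tags and int(tags["XS"]) >= int(tags["AS"])
    return mapq == 0 or tied


left = "TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATC"
right = "GACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT"

# The two records from this issue (allele A and allele G).
rec_a = (
    f"rs10435524-138_B_R_2276283903:1 0 chr8 30403982 0 121M * 0 0 {left}A{right} * "
    "NM:i:1 MD:Z:60G60 AS:i:116 XS:i:116 XA:Z:chrX,+37095221,121M,1;"
)
rec_g = (
    f"rs10435524-138_B_R_2276283903:2 0 chrX 37095221 0 121M * 0 0 {left}G{right} * "
    "NM:i:0 MD:Z:121 AS:i:121 XS:i:121 XA:Z:chr8,+30403982,121M,0;"
)
# Invented record for contrast: unique hit with a much lower XS.
rec_unique = f"example:1 0 chr1 1000 60 121M * 0 0 {left}G{right} * NM:i:0 MD:Z:121 AS:i:121 XS:i:40"

for rec in (rec_a, rec_g, rec_unique):
    print(rec.split()[0], is_ambiguous(rec))
```

Under this rule both rs10435524 records would be flagged, since each has MAPQ 0 and AS equal to XS.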

I have additional examples like this. They are a small percentage of all markers, of course, but the reported coordinates still appear inaccurate, and I wanted to see if we can fix this.

Thanks.

rajwanir, Aug 07 '24 20:08