gtc2vcf
Drop marker if SourceSeq maps to different loci equally well.
I see that gtc2vcf typically does a fantastic job of inferring the coordinates with the SourceSeq mapping workflow. However, in some cases where the SourceSeq maps to different loci equally well, gtc2vcf picks one of the mappings instead of dropping the marker.
Here is an example for rs10435524, which maps to chr8:30404042. The SourceSeq is below:
TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATC[A/G]GACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT
The bwa alignment of the SourceSeq looks like this:
rs10435524-138_B_R_2276283903:1 0 chr8 30403982 0 121M * 0 0 TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATCAGACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT * NM:i:1 MD:Z:60G60 AS:i:116 XS:i:116 XA:Z:chrX,+37095221,121M,1;
rs10435524-138_B_R_2276283903:2 0 chrX 37095221 0 121M * 0 0 TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATCGGACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT * NM:i:0 MD:Z:121 AS:i:121 XS:i:121 XA:Z:chr8,+30403982,121M,0;
gtc2vcf outputs the mapping as chrX:37095281. The SourceSeq in this case does not map uniquely to the reference, so I think the marker can be filtered out. I see that something along these lines is already implemented in gtc2vcf; however, markers like this still slip through. Do you know why?
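To make the request concrete, here is a minimal Python sketch of the kind of check I have in mind. It is not gtc2vcf's actual logic: the helper names `parse_sam_record` and `is_ambiguous` are hypothetical, and the rule "flag a record when the suboptimal score XS is at least the primary score AS" is only my assumption about what "maps equally well" should mean. The two records are the bwa alignments above, with the read rebuilt from the SourceSeq flanks to keep the lines short.

```python
# Flanks of the SourceSeq for rs10435524, taken from the issue text.
LEFT = "TTCACTGGATACAAAAATGCTTAGCAGATAATTTCTGGGGGTGTCATTCTTGAGTTTATC"
RIGHT = "GACACCGTGAAGTGTGTTGCTTTTTGTGTGTTAGGTGCTTGCTATATTTTTCTGGCTATT"

# The two bwa records from the issue, one per allele, tab-separated as in SAM.
SAM_LINES = [
    "rs10435524-138_B_R_2276283903:1\t0\tchr8\t30403982\t0\t121M\t*\t0\t0\t"
    f"{LEFT}A{RIGHT}\t*\t"
    "NM:i:1\tMD:Z:60G60\tAS:i:116\tXS:i:116\tXA:Z:chrX,+37095221,121M,1;",
    "rs10435524-138_B_R_2276283903:2\t0\tchrX\t37095221\t0\t121M\t*\t0\t0\t"
    f"{LEFT}G{RIGHT}\t*\t"
    "NM:i:0\tMD:Z:121\tAS:i:121\tXS:i:121\tXA:Z:chr8,+30403982,121M,0;",
]


def parse_sam_record(line):
    """Extract the fields needed for the check from one SAM line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for tag in fields[11:]:
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    return {
        "qname": fields[0],
        "chrom": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost aligned position
        "mapq": int(fields[4]),
        "AS": int(tags["AS"]) if "AS" in tags else None,
        "XS": int(tags["XS"]) if "XS" in tags else None,
    }


def is_ambiguous(rec):
    """Assumed rule: another locus scores at least as well as the reported one."""
    return rec["AS"] is not None and rec["XS"] is not None and rec["XS"] >= rec["AS"]


results = []
for line in SAM_LINES:
    rec = parse_sam_record(line)
    # The variant sits right after the left flank, so offset the alignment start.
    marker_pos = rec["pos"] + len(LEFT)
    results.append((rec["chrom"], marker_pos, is_ambiguous(rec)))
    print(rec["qname"], f'{rec["chrom"]}:{marker_pos}', "ambiguous" if results[-1][2] else "unique")
```

Under this assumed rule both allele records are flagged, since AS equals XS and MAPQ is 0 in each, so the marker would be dropped rather than assigned to either chr8 or chrX.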
I have additional examples like this. They are a small percentage of all markers, of course, but the result still appears inaccurate and I wanted to see if we can fix it.
Thanks.