TRTools
mergeSTR: More than one value found for END
I'm trying to merge three GangSTR output files into one merged file. While doing so, I get the following error:
More than one value found for END
It ran through about 89k lines and stopped after this line:
(bharath) []$ tail -1 test.vcf.vcf
chr10 17809113 . CCTCCCCTCCCCTCCCCTCCCCTCC . . . END=17809163;PERIOD=5;RU=cctcc;REF=5.0;STUTTERUP=0.05;STUTTERDOWN=0.05;STUTTERP=0.9;EXPTHRESH=-1 GT:DP:Q:REPCN:REPCI:RC:ML:INS:STDERR:ENCLREADS:FLNKREADS:QEXP 0/0:49:1.0:5,5:5-5,5-5:29,20,0,0:286.292:419.097,96.3636:0.0,0.0:5,29:NULL:-1.0,-1.0,-1.0 0/0:28:0.999683:5,5:5-5,5-5:16,12,0,0:167.282:416.998,95.8552:0.0,0.0:5,16:NULL:-1.0,-1.0,-1.0 0/0:42:1.0:5,5:5-5,5-5:25,17,0,0:251.281:415.347,94.2128:0.0,0.0:5,25:NULL:-1.0,-1.0,-1.0
I couldn't figure out the cause from the VCF files. The following lines are the next records in each file, and from them I found that multiple END values are given for the same location. How do I resolve this?
(bharath) []$ zcat *.vcf.gz | grep -w "17813632"
chr10 17813632 . TATA . . . END=17813701;EXPTHRESH=-1;GRID=1,5;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05 GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP 0/0:27:0.970717:2,2:2-2,2-2:8,19,0,0:2,8:NULL:187.217:415.347,94.2128:0,0:-1,-1,-1
chr10 17813632 . TATA . . . END=17813703;EXPTHRESH=-1;GRID=1,5;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05 GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP 0/0:27:0.962654:2,2:2-2,2-2:8,19,0,0:2,8:NULL:187.622:415.347,94.2128:0,0:-1,-1,-1
chr10 17813632 . TATA . . . END=17813703;EXPTHRESH=-1;GRID=1,5;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05 GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP 0/0:20:0.315071:2,2:1-3,1-3:2,18,0,0:2,2:NULL:160.457:416.998,95.8552:0.466294,0.466294:-1,-1,-1
chr10 17813632 . TATA TATATA . . END=17813701;EXPTHRESH=-1;GRID=1,6;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05 GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP 0/1:24:0.33128:2,3:2-3,2-6:2,19,0,3:2,2:2,2|3,1:194.26:416.998,95.8552:0.500759,0.76488:-1,-1,-1
chr10 17813632 . TATA . . . END=17813701;EXPTHRESH=-1;GRID=1,103;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05 GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP 0/0:14:0.00196308:2,2:1-23,1-23:0,14,0,0:NULL:NULL:115.062:419.097,96.3636:7.19692,7.19692:-1,-1,-1
chr10 17813632 . TATA TATATA . . END=17813703;EXPTHRESH=-1;GRID=1,103;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05 GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP 1/1:14:0.00181527:3,3:1-24,1-24:0,14,0,0:NULL:NULL:115.131:419.097,96.3636:6.13539,6.13539:-1,-1,-1
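Rather than grepping one position at a time, the comparison above can be automated. The following is a minimal Python sketch, not part of TRTools: the script name and the helper functions (`read_vcf_lines`, `info_value`, `ends_by_locus`, `conflicting_ends`) are hypothetical, and it assumes tab-separated VCF columns with END stored in the INFO column. It groups every record by CHROM and POS and reports the positions where more than one END value appears. Because each record is counted separately, it also flags a single file that contains two records at the same POS with different END values.

```python
import gzip
import sys
from collections import defaultdict


def read_vcf_lines(path):
    """Yield the lines of a plain or gzipped VCF as text."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def info_value(info, key):
    """Return the value of KEY in a 'K1=V1;K2=V2' INFO string, or None."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def ends_by_locus(sources):
    """Map (CHROM, POS) -> {END: [source labels]} over all records.

    `sources` maps a label (e.g. a file name) to an iterable of VCF lines.
    Assumes tab-separated columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    A label is appended once per record, so a file holding two records at
    the same POS with different END values is listed under both values.
    """
    loci = defaultdict(lambda: defaultdict(list))
    for label, lines in sources.items():
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            cols = line.rstrip("\n").split("\t")
            end = info_value(cols[7], "END")
            loci[(cols[0], int(cols[1]))][end].append(label)
    return loci


def conflicting_ends(sources):
    """Keep only the loci where the records disagree on END."""
    return {
        locus: dict(ends)
        for locus, ends in ends_by_locus(sources).items()
        if len(ends) > 1
    }


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_end_conflicts.py a.vcf.gz b.vcf.gz
    paths = sys.argv[1:]
    report = conflicting_ends({p: read_vcf_lines(p) for p in paths})
    for (chrom, pos), ends in sorted(report.items()):
        print(chrom, pos, ends)
```

Grouping on (CHROM, POS) mirrors the single-position `grep` above, so the output can be compared directly with those lines.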
For the line where merging stopped, the POS (17809113) and the length of the REF allele (25 bp) imply that the last base of the REF allele is at 17809137 (17809113 + 25 - 1). But the END INFO field is 17809163. I assume it's erroring out because those don't match. I would check which of the POS, REF, or END fields was set incorrectly upstream.
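The consistency check described above can be run over a whole file instead of one record. This is a hedged Python sketch, not a TRTools feature: the function names and the script name are hypothetical, and it assumes the standard VCF convention that END is the 1-based position of the last reference base, so END should equal POS + len(REF) - 1. Whether GangSTR intends END to follow that convention for every record is not something this sketch establishes; it only lists the records where the two disagree, which narrows down where to look upstream.

```python
import gzip
import sys


def info_value(info, key):
    """Return the value of KEY in a 'K1=V1;K2=V2' INFO string, or None."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def ref_last_base(pos, ref):
    """1-based coordinate of the last REF base: POS + len(REF) - 1."""
    return pos + len(ref) - 1


def end_mismatches(lines):
    """Yield (CHROM, POS, last REF base, INFO/END) where the two disagree.

    Assumes tab-separated columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    Records without an END key are skipped.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref = cols[0], int(cols[1]), cols[3]
        end = info_value(cols[7], "END")
        if end is not None and int(end) != ref_last_base(pos, ref):
            yield chrom, pos, ref_last_base(pos, ref), int(end)


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_end.py sample1.vcf.gz ...
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            for chrom, pos, last, end in end_mismatches(handle):
                print(path, chrom, pos, f"REF ends at {last}", f"END={end}")
```

For the record where merging stopped, `ref_last_base(17809113, "CCTCCCCTCCCCTCCCCTCCCCTCC")` gives 17809137, while the INFO field says END=17809163, so that record would be reported.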