SnpSift
Incorrect filtering with !(ANN[*].ERRORS in SET[0])
Suppose we have the following entries in a VCF file:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT FOO
chr2 12345 123 AC A . PASS DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTA|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME,A|intron_variant|MODIFIER|ENSG|ENSG|transcript|ENSTB|protein_coding|3/3|c.*124+4493delG|||||| GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14
chr2 123456 1234 AC A . PASS DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTA|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14
chr2 1234567 12345 AC A . PASS DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|intron_variant|MODIFIER|ENSG|ENSG|transcript|ENSTA|protein_coding|3/3|c.*124+4493delG||||||,A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTB|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14
We also have a file (errorList.txt) of errors/warnings to be used as an exclusion set:
INFO_REALIGN_3_PRIME
Each variant line has this warning in at least one of its annotations, so excluding such lines with SnpSift filter "!( ANN[*].ERRORS in SET[0])" -s errorList.txt -f foo.vcf | grep -v "#" should leave no lines at all. However, we actually get the following:
chr2 12345 123 AC A . PASS DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTA|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME,A|intron_variant|MODIFIER|ENSG|ENSG|transcript|ENSTB|protein_coding|3/3|c.*124+4493delG|||||| GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14
chr2 1234567 12345 AC A . PASS DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|intron_variant|MODIFIER|ENSG|ENSG|transcript|ENSTA|protein_coding|3/3|c.*124+4493delG||||||,A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTB|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14
It seems that ANN[*].ERRORS is not working properly with set intersection. Amusingly, without the ! all lines are returned, which is correct, so perhaps only the negation is evaluated incorrectly.
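One possible explanation, which is only a guess and not confirmed against the SnpSift source: the two lines that survive are exactly the ones with a second annotation whose ERRORS field is empty. That would fit the negation being applied per annotation ("any annotation whose ERRORS is not in the set") rather than to the whole expression ("not any annotation whose ERRORS is in the set"). The Python sketch below compares the two readings on the ERRORS values from the example above; the function names are made up for illustration and are not part of SnpSift.

```python
# ERRORS field of each ANN entry, keyed by POS, copied from the example VCF.
variants = {
    12345: ["INFO_REALIGN_3_PRIME", ""],
    123456: ["INFO_REALIGN_3_PRIME"],
    1234567: ["", "INFO_REALIGN_3_PRIME"],
}
excluded = {"INFO_REALIGN_3_PRIME"}  # contents of errorList.txt


def keep_expected(errors, excluded):
    """Intended reading: NOT (any annotation has an excluded error)."""
    return not any(e in excluded for e in errors)


def keep_per_annotation(errors, excluded):
    """Assumed buggy reading: ANY annotation whose error is NOT excluded."""
    return any(e not in excluded for e in errors)


expected = [pos for pos, errs in variants.items() if keep_expected(errs, excluded)]
observed = [pos for pos, errs in variants.items() if keep_per_annotation(errs, excluded)]

print(expected)  # []
print(observed)  # [12345, 1234567]
```

The second reading reproduces the two positions that SnpSift returned, while the first gives the empty result that was expected.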