
Incorrect filtering with !(ANN[*].ERRORS in SET[0])

Open dpryan79 opened this issue 9 months ago • 0 comments

Suppose we have the following entries in a vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  FOO
chr2    12345   123     AC      A       .       PASS    DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTA|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME,A|intron_variant|MODIFIER|ENSG|ENSG|transcript|ENSTB|protein_coding|3/3|c.*124+4493delG||||||        GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14
chr2    123456  1234    AC      A       .       PASS    DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTA|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME      GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14
chr2    1234567 12345   AC      A       .       PASS    DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|intron_variant|MODIFIER|ENSG|ENSG|transcript|ENSTA|protein_coding|3/3|c.*124+4493delG||||||,A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTB|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME        GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB  0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14

And we also have a file with Errors/warnings that should be used as a set for exclusion:

INFO_REALIGN_3_PRIME

Every variant line carries this warning in at least one annotation, so excluding such lines with

SnpSift filter "!( ANN[*].ERRORS in SET[0])" -s errorList.txt -f foo.vcf | grep -v "#"

should leave no variant lines. Instead, we get the following:

chr2    12345   123     AC      A       .       PASS    DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTA|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME,A|intron_variant|MODIFIER|ENSG|ENSG|transcript|ENSTB|protein_coding|3/3|c.*124+4493delG||||||                                                                                                         GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB   0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14
chr2    1234567 12345   AC      A       .       PASS    DP=699;MQ=250.00;FractionInformativeReads=0.953;SoftClipRatio=0.03;ANN=A|intron_variant|MODIFIER|ENSG|ENSG|transcript|ENSTA|protein_coding|3/3|c.*124+4493delG||||||,A|frameshift_variant|HIGH|ENSG|ENSG|transcript|ENSTB|protein_coding|5/10|c.3267delC|p.Phe1090fs|3339/4233|3267/4089|1089/1362||INFO_REALIGN_3_PRIME                                                                                                         GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB   0/1:7.84:646,20:0.0300:293,7:353,13:666:358,288,11,9:339,307,6,14

It seems that ANN[*].ERRORS does not work properly with set membership tests. Oddly, without the ! all lines are returned, which is correct, so perhaps only the negation is handled incorrectly.
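One reading that would be consistent with the output above: the two surviving records are exactly the ones that have a second annotation with an empty ERRORS field. The sketch below is only a toy model of that hypothesis, not SnpSift code, and I have not checked it against the SnpSift source. The dictionary simply mirrors the ERRORS value of each annotation in the three example records, and the function names are made up for illustration.

```python
# Hypothetical model of two ways "!(ANN[*].ERRORS in SET[0])" could be evaluated.
# This is NOT SnpSift's implementation; it only reproduces the pattern in the example.

ERROR_SET = {"INFO_REALIGN_3_PRIME"}  # contents of errorList.txt

# ERRORS value (last ANN subfield) of each annotation, keyed by POS.
records = {
    12345: ["INFO_REALIGN_3_PRIME", ""],    # frameshift has the warning, intron does not
    123456: ["INFO_REALIGN_3_PRIME"],       # single annotation, has the warning
    1234567: ["", "INFO_REALIGN_3_PRIME"],  # intron has no warning, frameshift does
}

def keep_negate_whole_expression(errors):
    """Expected: keep the record only if NO annotation's ERRORS is in the set."""
    return not any(e in ERROR_SET for e in errors)

def keep_negate_per_annotation(errors):
    """Assumed buggy reading: keep the record if ANY annotation's ERRORS is not in the set."""
    return any(e not in ERROR_SET for e in errors)

expected = [pos for pos, errs in records.items() if keep_negate_whole_expression(errs)]
per_annotation = [pos for pos, errs in records.items() if keep_negate_per_annotation(errs)]

print("expected:", expected)
print("negation applied per annotation:", per_annotation)
```

Under the per-annotation reading, the positions kept are 12345 and 1234567, which matches what SnpSift returns here, while negating the whole expression keeps nothing, which is what I expected.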

dpryan79 avatar Apr 16 '25 11:04 dpryan79