bcftools norm --multiallelics -both v1.8 does not split FORMAT/AS_FilterStatus correctly
Hello,
After splitting multiallelic sites in my VCF, the INFO annotation was split incorrectly. Before splitting:
chrM 16189 . T C,A . PASS AS_FilterStatus=SITE|weak_evidence,base_qual,strand_bias,possible_numt; GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:3,2016,22:0.995,4.028e-03:2041:1,814,6:2,1101,3:0,3,659,1379
After splitting with `bcftools norm -m -both`:
MT 16189 . T C . PASS AS_FilterStatus=SITE|weak_evidence; GT:AD:AF:DP:F1R2:F2R1:SB 0/1/0:3,2016:0.995:2041:1,814:2,1101:0,3,659,1379
MT 16189 . T A . PASS AS_FilterStatus=base_qual; GT:AD:AF:DP:F1R2:F2R1:SB 0/0/1:3,22:0.004028:2041:1,6:2,3:0,3,659,1379
Obviously, `AS_FilterStatus` should be split on `|`, but it has been split on `,`. Is there a solution?
@pd3 I would appreciate your answer. Thanks!
How is the `AS_FilterStatus` annotation defined in the header? The program will always split on `,` in some way, because that is how the VCF specification represents vector fields.
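To illustrate why the output looks the way it does, here is a minimal Python sketch of comma-based per-allele splitting for a `Number=A` tag. It is not the bcftools implementation; the function name `split_number_a` is hypothetical, and the logic is only an assumption about the general behaviour described above (the i-th comma-separated value goes to the i-th ALT allele).

```python
def split_number_a(alts, value):
    """Assign the i-th comma-separated value to the i-th ALT allele.

    Hypothetical helper for illustration only; it is not bcftools code.
    Alleles without a matching value get the VCF missing marker ".".
    """
    values = value.split(",")
    return {
        alt: values[i] if i < len(values) else "."
        for i, alt in enumerate(alts)
    }


# ALT alleles and INFO value taken from the record in this issue.
record_alts = ["C", "A"]
as_filter_status = "SITE|weak_evidence,base_qual,strand_bias,possible_numt"

per_allele = split_number_a(record_alts, as_filter_status)
print(per_allele)
# {'C': 'SITE|weak_evidence', 'A': 'base_qual'}
```

Under this assumption the first allele receives `SITE|weak_evidence` and the second receives `base_qual`, which matches the split records shown above; the remaining two values have no allele to go to.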
`AS_FilterStatus` was defined as below.
##INFO=<ID=AS_FilterStatus,Number=A,Type=String,Description="Filter status for each allele, as assessed by ApplyRecalibration. Note that the VCF filter field will reflect the most lenient/sensitive status across all alleles.">
Is there any solution to this problem?
Thanks!
The number of values in the VCF record does not match the header definition. The header defines the tag as `Number=A`, so for the two alternate alleles `C,A` there should be two comma-separated strings. However, there are four. I am not sure where you got the notion that the fields should be separated with `|`, but that assumption is incorrect.
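The mismatch can be checked by hand. The following Python sketch counts the comma-separated values of a `Number=A` tag against the number of ALT alleles; the function name `check_number_a` is hypothetical and the check is deliberately simplistic (it only looks at one tag of one record).

```python
def check_number_a(alt_field, info_value):
    """Return (n_alt, n_values, consistent) for a Number=A INFO tag.

    Hypothetical helper for illustration: a Number=A tag should carry
    exactly one comma-separated value per ALT allele.
    """
    n_alt = len(alt_field.split(","))
    n_values = len(info_value.split(","))
    return n_alt, n_values, n_alt == n_values


# ALT column and AS_FilterStatus value from the record in this issue.
result = check_number_a(
    "C,A",
    "SITE|weak_evidence,base_qual,strand_bias,possible_numt",
)
print(result)
# (2, 4, False)
```

Two ALT alleles but four values means the record does not agree with its own header, regardless of how the `|` characters are meant to be read.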