bcftools norm --multiallelics -both v1.8 does not split FORMAT/AS_FilterStatus correctly

Open johjoon opened this issue 3 years ago • 4 comments

Hello,

After splitting multiallelic sites in my VCF, the AS_FilterStatus INFO annotation was split incorrectly. Before splitting:

chrM	16189	.	T	C,A	.	PASS	AS_FilterStatus=SITE|weak_evidence,base_qual,strand_bias,possible_numt;	 GT:AD:AF:DP:F1R2:F2R1:SB	0/1/2:3,2016,22:0.995,4.028e-03:2041:1,814,6:2,1101,3:0,3,659,1379

After splitting with bcftools norm -m -both:

MT	16189	.	T	C	.	PASS	AS_FilterStatus=SITE|weak_evidence;	GT:AD:AF:DP:F1R2:F2R1:SB 0/1/0:3,2016:0.995:2041:1,814:2,1101:0,3,659,1379

MT	16189	.	T	A	.	PASS	AS_FilterStatus=base_qual;	GT:AD:AF:DP:F1R2:F2R1:SB	0/0/1:3,22:0.004028:2041:1,6:2,3:0,3,659,1379

Obviously, 'AS_FilterStatus' should be split on '|', but it has been split on ','. Is there any solution?

johjoon avatar Sep 04 '21 05:09 johjoon

@pd3 I would appreciate your answer. Thanks!

johjoon avatar Sep 04 '21 05:09 johjoon

How is the AS_FilterStatus annotation defined in the header? The program would always split by , in some way, because that's how the VCF specification represents vector fields.

pd3 avatar Sep 07 '21 14:09 pd3
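
To illustrate the comma-based splitting described above, here is a minimal Python sketch. It is not bcftools' actual code; the function name `split_number_a` and the rule of giving the i-th comma-separated element to the i-th ALT allele are assumptions made for illustration. Applied to the record from the first post, it reproduces the values seen after splitting.

```python
def split_number_a(value: str, n_alt: int) -> list[str]:
    """Hypothetical helper: give each ALT allele the comma-separated
    element at its index, as a Number=A tag would be interpreted."""
    elements = value.split(",")
    # Elements beyond n_alt are dropped; missing ones become "." (VCF missing value).
    return [elements[i] if i < len(elements) else "." for i in range(n_alt)]


# INFO value and ALT alleles copied from the record in the first post.
value = "SITE|weak_evidence,base_qual,strand_bias,possible_numt"
for alt, status in zip(["C", "A"], split_number_a(value, 2)):
    print(f"ALT={alt}\tAS_FilterStatus={status}")
```

The `|` character has no special meaning in this sketch, so `SITE|weak_evidence` is treated as a single element belonging to the first ALT allele.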

AS_FilterStatus was defined as below.

##INFO=<ID=AS_FilterStatus,Number=A,Type=String,Description="Filter status for each allele, as assessed by ApplyRecalibration. Note that the VCF filter field will reflect the most lenient/sensitive status across all alleles.">

Is there any solution to this problem?

Thanks!

johjoon avatar Sep 28 '22 13:09 johjoon

The number of values in the VCF record does not match the header definition. The header defines the tag as Number=A, therefore for two alternate alleles C,A there should be two comma-separated strings. However, there are four. I am not sure where you got the notion that the fields should be separated with |, but that assumption is incorrect.

pd3 avatar Sep 28 '22 14:09 pd3
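
The mismatch described in the last comment can be checked with a short Python sketch. The helper name `count_number_a_values` is hypothetical, and the check assumes a Number=A tag must carry exactly one comma-separated value per ALT allele; it is a plain string comparison, not a substitute for a real VCF validator.

```python
def count_number_a_values(alt_field: str, info_value: str) -> tuple[int, int]:
    """Hypothetical helper: return (number of ALT alleles,
    number of comma-separated values in a Number=A tag)."""
    n_alt = len(alt_field.split(","))
    n_values = len(info_value.split(","))
    return n_alt, n_values


# ALT and AS_FilterStatus copied from the unsplit record in the first post.
expected, found = count_number_a_values(
    "C,A", "SITE|weak_evidence,base_qual,strand_bias,possible_numt"
)
if expected != found:
    print(f"Number=A mismatch: {expected} ALT alleles but {found} values")
```

For the record in this thread the check reports 2 ALT alleles against 4 values, which is the inconsistency pointed out above.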