BGT issue with Multi Allelic Variant Sites
Hello. As a test, I ran BGT on chrX of 1KGP3, which is available from the following FTP link:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chrX.phase3_shapeit2_mvncall_integrated_v1c.20130502.genotypes.vcf.gz
I first converted the vcf.gz file above to BCF with bcftools to save space, then ran the following commands:
bgt import chrX.bcf out/chrX
bgt view -b out/chrX.bgt > out/chrX.bcf
When I compared the output file with the input file using bcftools view, I found that the first variant had been split in a strange way.
Expected (only first sample GT shown because of size...) :
X 60020 . T TA,TAAC 100 PASS AC=10,92;AF=0.00199681,0.0183706;AN=5008;NS=2504;DP=11848;AMR_AF=0.0029,0.0086;AFR_AF=0.0008,0.0635;EUR_AF=0,0.002;SAS_AF=0.0031,0;EAS_AF=0.004,0;VT=INDEL;MULTI_ALLELIC GT 0|0 ...
And I got two variants :
X 60020 . T TA,<M> 0 . . GT 0/0 ...
X 60020 . T TAAC,<M> 0 . . GT 0/0 ...
I understand that the multi-allelic site has been split. What is strange, however, is that in both lines I get the genotype values 0, 1, and 2.
So how should I interpret it when, for example, 2/0 occurs in the first line or in the second line?
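To make the observation easier to reproduce, here is a minimal Python sketch that tallies the allele indices in the GT columns of a single VCF text record. The function name `allele_index_counts` is hypothetical, and the example record is illustrative only: its site columns follow the first split line above, but the three genotypes are made up and are not taken from the real output.

```python
from collections import Counter
import re


def allele_index_counts(vcf_line: str) -> Counter:
    """Count how often each allele index appears in the GT columns of one record."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; locate GT within it.
    gt_pos = fields[8].split(":").index("GT")
    counts = Counter()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        # Genotypes are separated by "/" (unphased) or "|" (phased).
        for allele in re.split(r"[/|]", gt):
            counts[allele] += 1
    return counts


# Illustrative record: genotypes here are invented, not real BGT output.
line = "X\t60020\t.\tT\tTA,<M>\t0\t.\t.\tGT\t0/0\t2/0\t1/0"
print(allele_index_counts(line))
```

Running this on each of the two split lines (for example, on text produced by bcftools view) would show how many times index 2, which points at `<M>`, occurs in each line.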
Thanks. Best Rick