"-nan" GP and genotype coding in males
This is not a bug, but I wonder whether there is an option that lets users choose how these fields are coded. For instance, a male has GT:PL:DP:AD:GP:GQ = 1:180,0:8:0,8:-nan:127 on chrX.
- GP is "-nan", although it seems it should be a definite value such as "0,1".
- Downstream analyses (e.g. Beagle imputation) would be more convenient to run if GT were coded as "0/0" or "1/1".
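Until such an option exists, the recoding asked for above could be done as a post-processing step. The following is only a minimal sketch, not anything bcftools provides: the function names `diploidize_gt` and `recode_sample` are hypothetical, and it assumes GT is the first key in FORMAT, as in the example record.

```python
def diploidize_gt(gt):
    """Recode a haploid GT (e.g. "1") as homozygous diploid ("1/1").

    Calls that already contain a separator ("/" or "|") are returned unchanged.
    """
    if "/" in gt or "|" in gt:
        return gt
    return f"{gt}/{gt}"


def recode_sample(fmt, sample):
    """Rewrite only the GT subfield of one sample column.

    Assumption: GT is present in FORMAT; all other subfields are left as-is.
    """
    keys = fmt.split(":")
    values = sample.split(":")
    gt_index = keys.index("GT")
    values[gt_index] = diploidize_gt(values[gt_index])
    return ":".join(values)


recoded = recode_sample("GT:PL:DP:AD:GP:GQ", "1:180,0:8:0,8:-nan:127")
print(recoded)
```

Note that this changes GT only. PL and GP keep their haploid length (two values), so whether a downstream tool accepts the result would still need to be checked.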
The following GT:PL:DP:AD:GP:GQ values are observed in a male on chrX:
- 1:230,235:38:22,16:-nan:127
- 0:20,38:5:3,2:0.940634,0.0593659:12
- 0:35,64:13:9,4:0.99192,0.00808017:20
Notes:
1) PL is not normalized;
2) Both REF and ALT alleles exist, e.g. AD = 22,16;
3) GP is observed to be -nan.
I wanted to make sure variant calling is working correctly for males on chrX.
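To find every affected record rather than spotting them by eye, the sample fields can be checked programmatically. This is a rough sketch: `parse_sample` and `check_sample` are hypothetical helper names, and "normalized" is taken to mean that the smallest PL value is 0, which is an assumption about what the notes above intend.

```python
import math


def parse_sample(fmt, sample):
    """Pair FORMAT keys with the values of one sample column."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def check_sample(fmt, sample):
    """Flag the two symptoms reported here: non-normalized PL and NaN GP."""
    fields = parse_sample(fmt, sample)
    pl = [int(x) for x in fields["PL"].split(",")]
    # Python's float() accepts "nan" and "-nan" and returns NaN for both.
    gp = [float(x) for x in fields["GP"].split(",")]
    return {
        "pl_normalized": min(pl) == 0,  # assumed meaning of "normalized"
        "gp_is_nan": any(math.isnan(x) for x in gp),
    }


FMT = "GT:PL:DP:AD:GP:GQ"
for column in (
    "1:230,235:38:22,16:-nan:127",
    "0:20,38:5:3,2:0.940634,0.0593659:12",
    "0:35,64:13:9,4:0.99192,0.00808017:20",
):
    print(column, check_sample(FMT, column))
```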
What program was used to generate the VCF?
The VCF file was generated by bcftools call.
By the way, bcftools call -S worked well with a samples-file. Unfortunately, bcftools convert -S and bcftools view -S failed when the same samples-file was supplied. For instance, bcftools view -S printed the error message "Sample name mismatch: sample #1 not found in the header".
The first line of the samples-file is as follows:
4512-JFI-0335 M
The part of the header line that contains the sample names in the VCF file is as follows:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 4512-JFI-0335 4512-JFI-0336 4512-JFI-0502 4512-JFI-0503 4512-JFI-0504 4512-JFI-0505 4512-JFI-0506 4512-JFI-0507 4512-JFI-0508 4512-JFI-0579 4512-JFI-0580 4512-JFI-0581 4512-JFI-0582 95-35734 95-38313 95-38314 95-39664 95-39665
We can see that the first sample in the VCF file is the sample on the first line of the samples-file.
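One way to narrow down the mismatch is to compare the samples-file with the header outside of bcftools. The sketch below uses hypothetical helpers (`header_samples`, `unmatched`) and checks each line two ways: as a whole line, and by its first whitespace-separated column only. It is an assumption, not something confirmed in this thread, that the second column ("M") is what prevents the match in view and convert.

```python
def header_samples(header_line):
    """Sample names are the tab-separated columns after the nine fixed VCF columns."""
    return header_line.rstrip("\n").split("\t")[9:]


def unmatched(samples_file_lines, header_line, first_column_only):
    """Return samples-file entries that are not found in the VCF header."""
    names = set(header_samples(header_line))
    missing = []
    for line in samples_file_lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        key = entry.split()[0] if first_column_only else entry
        if key not in names:
            missing.append(key)
    return missing


# Shortened stand-in for the real header and samples-file shown above.
header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
     "4512-JFI-0335", "4512-JFI-0336"]
)
lines = ["4512-JFI-0335 M\n"]

print(unmatched(lines, header, first_column_only=False))
print(unmatched(lines, header, first_column_only=True))
```

If only the whole-line comparison reports a missing entry, a names-only copy of the samples-file (first column only) may be worth trying with view and convert.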
Can you please provide a small test case to reproduce both problems? Ideally, can you open a new issue for the other one, as it is unrelated to this one?
Code (testBcftools.s) and data are available at https://palmerlab.s3.sdsc.edu/minio/debug/. Both the "-nan" GP and the "bcftools view -S" error can be reproduced with them.