hap.py
Is it possible to output 0/0 and ./. positions in VCF?
Hi, when I input a VCF file that contains positions with ./. and 0/0 genotypes, pre.py removes them from the final VCF file. I understand that there are no variants there, but is it possible to normalize only the variants and still print those rows as well? Thank you very much
How do you normalize a 0/0 or ./. variant?
I mean normalizing the rest of the positions (variants with a GT of 0/1 or 1/1), but leaving untouched those lines that have info and a GT of ./. or 0/0, because no variant was detected there. This would be similar to normalizing a GVCF file while keeping the positions without variants untouched. Thank you
In general this is not possible, because normalizing the non-REF genotypes can make it invalid to retain other sites that were reported as REF or as ./. calls. A simple example is a homopolymer run where the sample has a 0/0 call at one end and, say, a 1/0 call for an insertion at the other end. Normalizing the insertion may shift it onto the position where the sample had the 0/0 call, so it would no longer be consistent to retain that 0/0 call.
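To make the homopolymer example concrete, here is a minimal Python sketch. It is not how pre.py is implemented; the function `left_align_insertion`, the toy reference, and the `ref_calls` dictionary are all made up for illustration. It left-shifts a single insertion and then checks whether the shifted record lands on a position that had a 0/0 call.

```python
def left_align_insertion(ref, pos, inserted):
    """Left-shift an insertion (hypothetical helper, not part of pre.py).

    ref      -- reference sequence as a string
    pos      -- 1-based position of the anchor base (VCF POS)
    inserted -- bases inserted immediately after the anchor base
    """
    # While the anchor base equals the last inserted base, the insertion
    # can be rotated one base to the left without changing the haplotype.
    while pos > 1 and ref[pos - 1] == inserted[-1]:
        inserted = ref[pos - 1] + inserted[:-1]
        pos -= 1
    return pos, inserted


# Toy reference with a homopolymer run of A at positions 3..7.
reference = "GCAAAAATC"

# Assumed input calls: 0/0 at position 2, and a 1/0 insertion of "A"
# reported at the right end of the run (POS=7, REF=A, ALT=AA).
ref_calls = {2: "0/0"}
new_pos, new_ins = left_align_insertion(reference, 7, "A")

# The normalized insertion is now POS=2, REF=C, ALT=CA.
conflict = new_pos in ref_calls
print(new_pos, new_ins, conflict)  # 2 A True
```

After normalization the insertion and the 0/0 call both sit at position 2, so keeping the 0/0 row unchanged would contradict the insertion record.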
Thank you. So is there any way of normalizing a GVCF file with pre.py? And a multi-VCF file? Without converting them into VCF files?
At this point, pre.py cannot do this.
Depending on the interpretation of 0/0 this can be quite a tricky problem: if we assume 0/0 to mean "reference with no SNP at this location" then it would be possible to do something by limiting how far variants may be shifted, but if you include the notion of "reference with no insertion/deletion", things get tricky without looking at the reads.
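As a rough illustration of what "limiting how far variants may be shifted" could look like, here is a Python sketch. This is an assumption about one possible approach, not something pre.py offers; `left_align_bounded` and the `blocked` set are hypothetical names. The shift stops before the insertion's anchor would land on a position that carries a retained 0/0 call.

```python
def left_align_bounded(ref, pos, inserted, blocked):
    """Left-shift an insertion, but never onto a blocked position.

    ref      -- reference sequence as a string
    pos      -- 1-based position of the anchor base (VCF POS)
    inserted -- bases inserted immediately after the anchor base
    blocked  -- set of 1-based positions with retained 0/0 calls
    """
    while (
        pos > 1
        and ref[pos - 1] == inserted[-1]
        and (pos - 1) not in blocked  # stop before reaching a 0/0 site
    ):
        inserted = ref[pos - 1] + inserted[:-1]
        pos -= 1
    return pos, inserted


reference = "GCAAAAATC"

# Unbounded, the insertion would shift to POS=2; with a 0/0 call
# retained at position 2, the shift stops one base earlier.
bounded = left_align_bounded(reference, 7, "A", {2})
print(bounded)  # (3, 'A')
```

The result is only partially normalized, so the same insertion could still be represented differently in another file. The sketch also says nothing about the harder "reference with no insertion/deletion" interpretation, which would need the reads.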
Also the multi-VCF merging problem can be tricky -- there is a tool to do this here: https://github.com/Illumina/gvcfgenotyper
Okay, thank you!