
Is it possible to output 0/0 and ./. positions in VCF?

Open afzm opened this issue 6 years ago • 6 comments

Hi. When I give pre.py a VCF file that contains positions with ./. and 0/0 genotypes, it removes them from the final VCF file. I understand that there are no variants at those positions, but is it possible to normalize only the variants and still print those rows as well? Thank you very much.

afzm avatar Apr 11 '19 09:04 afzm

How do you normalize a 0/0 or ./. variant?

Lenbok avatar Apr 11 '19 21:04 Lenbok

I mean normalizing the rest of the positions (variants with a GT of 0/1 or 1/1) while leaving untouched those lines that carry info and a GT of ./. or 0/0, because no variant was detected there. This is similar to normalizing a GVCF file while keeping the positions without variants untouched. Thank you.

afzm avatar Apr 12 '19 10:04 afzm

In general this is not possible, because normalizing variants with non-REF genotypes can make it invalid to retain other sites that were reported as REF or ./. genotypes. A simple example occurs at the two ends of a homopolymer run, where the sample may have a 0/0 call at one end and, say, 1/0 for an insertion at the other end. Normalizing the insertion may shift it onto the position where the sample had a call of 0/0, so it would no longer be consistent to retain the 0/0 call.
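To make the homopolymer case concrete, here is a minimal sketch of generic left-alignment of an indel against a reference string. It is not pre.py's actual code; the function name `left_align`, the toy reference `GCAAAAAT`, and the coordinates are all made up for illustration.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left-align one REF/ALT pair (hypothetical helper, not part of pre.py).

    ref_seq is the reference as a string; pos is the 1-based position of
    the first base of `ref`. Assumes the variant is not at position 1.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot shift left of the first reference base")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference: G C A A A A A T (positions 1..8), with an A homopolymer at 3..7.
reference = "GCAAAAAT"

# Insertion of one A reported at the right end of the run (position 7).
normalized = left_align(reference, 7, "A", "AA")
print(normalized)
```

In this toy case the insertion moves from position 7 to position 2, with the C before the run as its anchor base. If the input also had a 0/0 record at position 2, the output would contain both a reference call and a non-reference call at the same site, which is the inconsistency described above.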

Lenbok avatar Apr 14 '19 21:04 Lenbok

Thank you. Is there then any way of normalizing a GVCF file with pre.py? And a multi-VCF file? Without turning them into VCF files?

afzm avatar Apr 15 '19 14:04 afzm

At this point, pre.py cannot do this.

Depending on the interpretation of 0/0 this can be quite a tricky problem: if we assume 0/0 to mean "reference with no SNP at this location" then it would be possible to do something by limiting how far variants may be shifted, but if you include the notion of "reference with no insertion/deletion", things get tricky without looking at the reads.

Also the multi-VCF merging problem can be tricky -- there is a tool to do this here: https://github.com/Illumina/gvcfgenotyper

pkrusche avatar Apr 16 '19 11:04 pkrusche

Okay, thank you!

afzm avatar Apr 16 '19 16:04 afzm