vcfanno doesn't annotate sites that are polymorphic in query vcf but fixed for reference allele in annotation vcf

Open AaronRuben opened this issue 2 years ago • 2 comments

Hi Brent,

I was trying to annotate 1KGP VCFs with genotype information of archaic hominins (e.g., Altai Neanderthal). These individuals have a lot of sites that are homozygous for the reference allele, for example:

20 60343 . G .

while this site is polymorphic in 1KGP:

20 60343 . G A

These sites match but are currently not annotated unless the --permissive-overlap flag is set, which isn't ideal. I know this is an edge case, and I can't simply merge the VCFs because the inclusion of archaic hominins would mess up downstream steps.

Would it be possible to handle such cases in the future?

Thanks, Aaron

AaronRuben avatar Feb 09 '23 15:02 AaronRuben

Hi Aaron, the only way to do this is with --permissive-overlap, as you note. I think that's the correct behavior, as "G ." should not match with "G A". If they are homozygous reference only, then the more correct representation would be "G G".

brentp avatar Feb 09 '23 16:02 brentp

Hi Brent,

Thanks for the quick response.

If it were "G G", it would still not match with "G A". I also think "G ." makes more sense, as there is no alternative allele.

In either case, it would be great to allow matches of polymorphic and monomorphic sites (whether denoted by "G ." or "G G") when the reference alleles match.

AaronRuben avatar Feb 09 '23 16:02 AaronRuben
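
To make the request concrete, here is a minimal Python sketch of the matching rule being asked for. It is not vcfanno's code and does not reflect how vcfanno is implemented; `Site`, `strict_match`, and `relaxed_match` are hypothetical names used only for illustration. It assumes a monomorphic annotation site is written with ALT as "." or with ALT equal to REF.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """A simplified VCF record (hypothetical type, illustration only)."""
    chrom: str
    pos: int
    ref: str
    alt: str  # "." when no alternative allele is given


def is_monomorphic(site: Site) -> bool:
    # Assumption: fixed-for-reference sites are written "G ." or "G G".
    return site.alt == "." or site.alt == site.ref


def strict_match(query: Site, annotation: Site) -> bool:
    # Position, REF and ALT must all be identical.
    return query == annotation


def relaxed_match(query: Site, annotation: Site) -> bool:
    # Position and REF must agree; ALT must either agree or the
    # annotation site must be monomorphic for the reference allele.
    if (query.chrom, query.pos, query.ref) != (
        annotation.chrom, annotation.pos, annotation.ref
    ):
        return False
    return query.alt == annotation.alt or is_monomorphic(annotation)


query = Site("20", 60343, "G", "A")    # polymorphic in 1KGP
archaic = Site("20", 60343, "G", ".")  # homozygous reference in the archaic VCF

print(strict_match(query, archaic))   # False
print(relaxed_match(query, archaic))  # True
```

Unlike a purely positional overlap, this rule still rejects an annotation record whose REF differs from the query's, or whose ALT is a different non-reference allele.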