vcfanno
vcfanno doesn't annotate sites that are polymorphic in query vcf but fixed for reference allele in annotation vcf
Hi Brent,
I was trying to annotate 1KGP VCFs with genotype information of archaic hominins (e.g., Altai Neanderthal). These individuals have a lot of sites that are homozygous for the reference allele, for example:
20 60343 . G .
while this site is polymorphic in 1KGP:
20 60343 . G A
These sites match but are currently not annotated unless the --permissive-overlap flag is set, which isn't ideal. I know this is an edge case, and I can't simply merge the VCFs because the inclusion of archaic hominins would mess up downstream steps.
Would it be possible to handle such cases in the future?
Thanks, Aaron
Hi Aaron, the only way to do this is with --permissive-overlap, as you note.
I think that's the correct behavior, as "G ." should not match "G A". If they are homozygous reference only, then the more correct representation would be "G G".
Hi Brent,
Thanks for the quick response.
Even if it were "G G", it would still not match "G A". I also think "G ." makes more sense, as there is no alternative allele.
In either case, it would be great to allow matches of polymorphic and monomorphic sites (whether denoted by "G ." or "G G") when the reference alleles match.
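To make the request concrete, here is a minimal sketch of the matching rule I have in mind. It is not vcfanno's code; the function names and the tuple layout are hypothetical, and it assumes a monomorphic site is written with ALT equal to "." or to the reference allele.

```python
# Hypothetical sketch of the requested matching rule, not vcfanno's implementation.
# A site is a (chrom, pos, ref, alt) tuple; alt may be "." or a comma-separated list.

def is_monomorphic(ref, alt):
    """True if the record carries no alternative allele ("G ." or "G G")."""
    return alt == "." or alt == ref


def sites_match(query, annotation):
    """Match when CHROM, POS and REF agree and either the ALT alleles
    overlap or one of the two records is monomorphic."""
    q_chrom, q_pos, q_ref, q_alt = query
    a_chrom, a_pos, a_ref, a_alt = annotation

    # The reference allele must always agree.
    if (q_chrom, q_pos, q_ref) != (a_chrom, a_pos, a_ref):
        return False

    # Proposed relaxation: a monomorphic record matches any ALT.
    if is_monomorphic(q_ref, q_alt) or is_monomorphic(a_ref, a_alt):
        return True

    # Otherwise require at least one shared ALT allele.
    return bool(set(q_alt.split(",")) & set(a_alt.split(",")))


print(sites_match(("20", 60343, "G", "A"), ("20", 60343, "G", ".")))
```

With this rule the 1KGP record "G A" would pick up the annotation from the archaic record "G .", while two records with different alternative alleles at the same position would still be kept apart.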