Homozygous variant called as heterozygous

Open marcotoffoli opened this issue 4 years ago • 4 comments

Dear Jared,

For one variant, Nanopolish is calling a homozygous position as heterozygous. The VCF record, including the INFO field, is below:

155205331 | . | T | C | 1794 | PASS | BaseCalledReadsWithVariant=188;BaseCalledFraction=0.817391;TotalReads=222;AlleleCount=1;SupportFraction=0.900699 | GT | 0/1

Any idea of why this might be happening?

Thank you!

marcotoffoli avatar Feb 21 '20 15:02 marcotoffoli

I found a closed issue (#423) that reported a similar problem, but as far as I can see it was never resolved. In my run, other variants with a lower SupportFraction are called homozygous, so I don't understand why this one is called heterozygous.

I would also like to add that Nanopolish is calling variants that I believe are homozygous as heterozygous at multiple positions, not just the one reported above.

marcotoffoli avatar Mar 03 '20 10:03 marcotoffoli

I'd like to add another finding. Take the variant below:

155208183 | . | T | C | 30230.1 | PASS | BaseCalledReadsWithVariant=3422;BaseCalledFraction=0.651313;TotalReads=5059;AlleleCount=2;SupportFraction=0.613717 | GT | 1/1

I think the support fraction is a bit misleading in this case: the BAM file has 1267 deletions at that position, so the number of reads actually covering the base is around 3900, rather than 5059 (the total number of reads, including those with the deletion). However, Nanopolish uses 5059 when calculating the support fraction, which gives 0.61. Although this is not a bug, I feel it might be misleading.
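To make the denominator effect concrete, here is a small sketch using only the counts quoted above. It uses BaseCalledReadsWithVariant as the numerator purely for illustration; it does not reproduce how nanopolish derives SupportFraction or BaseCalledFraction, and the variable names are made up for this example.

```python
# Counts taken from the record and comment above.
total_reads = 5059          # TotalReads in the INFO field
reads_with_variant = 3422   # BaseCalledReadsWithVariant in the INFO field
deletions = 1267            # deletions seen in the BAM at this position

# Reads that actually cover the base once the deletions are excluded.
spanning_reads = total_reads - deletions

# The same numerator over the two candidate denominators.
frac_over_all = reads_with_variant / total_reads
frac_over_spanning = reads_with_variant / spanning_reads

print(spanning_reads)                 # 3792
print(round(frac_over_all, 3))        # 0.676
print(round(frac_over_spanning, 3))   # 0.902
```

The subtraction gives 3792 spanning reads, close to the rough figure quoted above, and the fraction rises from about 0.68 to about 0.90 when the deletions are left out of the denominator.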

marcotoffoli avatar Mar 05 '20 13:03 marcotoffoli

Hi @marcotoffoli,

Sorry for the slow response. The genotypes are set based on the likelihoods that nanopolish's HMM calculates, rather than on SupportFraction. This can lead to (hopefully rare) cases where the genotype and SupportFraction seem inconsistent, as you show here. In the first case, there is probably a small population of reads that (erroneously) strongly support the reference allele, which pushes the best genotype to 0/1.
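The effect described above can be illustrated with a toy diploid likelihood model. This is a generic textbook-style sketch, not nanopolish's actual implementation: the function names and all per-read log-likelihood values are hypothetical, chosen only to show how a few reads that strongly favour the reference can make 0/1 the best-scoring genotype even when about 90% of reads favour the alternate allele.

```python
import math

def genotype_log_likelihoods(read_logliks):
    """Sum per-read log-likelihoods under three diploid genotypes.

    read_logliks: list of (log P(read | ref), log P(read | alt)) pairs.
    """
    totals = {"0/0": 0.0, "0/1": 0.0, "1/1": 0.0}
    for ll_ref, ll_alt in read_logliks:
        totals["0/0"] += ll_ref
        totals["1/1"] += ll_alt
        # Heterozygous: the read comes from either haplotype with probability 0.5.
        # Computed with the log-sum-exp trick for numerical stability.
        m = max(ll_ref, ll_alt)
        totals["0/1"] += m + math.log(
            0.5 * math.exp(ll_ref - m) + 0.5 * math.exp(ll_alt - m)
        )
    return totals

def best_genotype(read_logliks):
    totals = genotype_log_likelihoods(read_logliks)
    return max(totals, key=totals.get)

# Hypothetical values: 200 reads mildly favour alt, 22 reads favour ref.
alt_reads = [(-2.0, 0.0)] * 200
strong_ref_reads = [(0.0, -30.0)] * 22   # strongly favour ref
weak_ref_reads = [(0.0, -1.0)] * 22      # only weakly favour ref

print(best_genotype(alt_reads + strong_ref_reads))  # 0/1
print(best_genotype(alt_reads + weak_ref_reads))    # 1/1
```

In both cases 200 of 222 reads favour the alternate allele; only the strength of the minority's support for the reference changes, and that alone flips the call between 0/1 and 1/1.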

Can I see an IGV screenshot of the second case? Is the deletion real, or an alignment artifact? Nanopolish doesn't take the alignment into consideration, so the fact that some reads show a deletion doesn't matter; it re-scores all reads against the candidate sequences with its HMM.

jts avatar Mar 05 '20 14:03 jts

Dear Jared, Thank you for your reply!

Regarding the first example (a supposedly homozygous variant called as heterozygous), I found this happening for 4 intronic variants in 4 samples (16 times in total) in a batch of 37 samples. I'm fairly sure these are homozygous, as they are part of a common haplotype and the other variants of the haplotype are homozygous. Moreover, another variant caller confirms they are homozygous. Is there anything I can do to prevent this from happening? I was thinking of using a script to change all heterozygous variants with a support fraction above 0.75 to homozygous. Do you think this would make sense?
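A minimal sketch of the kind of script proposed here, assuming a tab-separated, single-sample VCF with SupportFraction in the INFO column and GT in the FORMAT column. The function name `promote_het_to_hom` and the `chrN` placeholder are hypothetical. Note that this overrides the caller's likelihood-based genotype and leaves INFO fields such as AlleleCount untouched, so it should be treated as a workaround to validate, not a recommended fix.

```python
def promote_het_to_hom(vcf_line, min_support=0.75):
    """Rewrite GT 0/1 to 1/1 when SupportFraction exceeds min_support.

    Header lines and records without the expected fields are returned unchanged.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    # Standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    if len(fields) < 10:
        return vcf_line
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    if "SupportFraction" not in info:
        return vcf_line
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return vcf_line
    gt_index = format_keys.index("GT")
    sample = fields[9].split(":")
    if sample[gt_index] == "0/1" and float(info["SupportFraction"]) > min_support:
        sample[gt_index] = "1/1"
        fields[9] = ":".join(sample)
    return "\t".join(fields) + newline

# Example using the first record from this thread; "chrN" is a placeholder
# because the chromosome is not shown in the original post.
record = "\t".join([
    "chrN", "155205331", ".", "T", "C", "1794", "PASS",
    "BaseCalledReadsWithVariant=188;BaseCalledFraction=0.817391;"
    "TotalReads=222;AlleleCount=1;SupportFraction=0.900699",
    "GT", "0/1",
])
print(promote_het_to_hom(record).split("\t")[9])  # 1/1
```

The threshold is a parameter so that the effect of different cut-offs can be compared against calls from an independent variant caller before committing to one.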

Regarding the second example (a homozygous variant called correctly, but with a low support fraction), I think the deletions are an artifact, as the alt allele creates a poly-C sequence and all samples show a high number of deletions at this same position. This is less of a problem, as I believe Nanopolish is calling the variant correctly. A screenshot of the variant is below.

[Image: Screenshot from 2020-03-05 15-38-15]

Thank you!

marcotoffoli avatar Mar 05 '20 15:03 marcotoffoli