
Incorrect haplotype polishing?

Open GuillaumeHolley opened this issue 2 years ago • 1 comments

Hi,

This issue is about the --polish-haplotypes subcommand of the --polish-target command in the Flye version used by HapDup. For a given region, I have the following haplotypes from HapDup (from top to bottom: haplotype 1, haplotype 2, and the haploid assembly straight from Flye):

[screenshot: flye_hap_polish1]

The screenshot is centered on a SNP on haplotype 2 (middle track) that is incorrect. I went back to the (Margin) phased reads mapped to the haploid assembly:

[screenshot: flye_hap_polish2]

This screenshot is centered on the same SNP. There is no doubt that something went wrong with the phasing here, but putting that aside, the blue reads are the ones used to polish the haploid assembly for haplotype 2. Coverage over this region is 3 reads, and each has a different base at this location, so only one of them supports the SNP output in haplotype 2. Any idea what is happening here?

Thanks, Guillaume

GuillaumeHolley avatar Apr 25 '22 16:04 GuillaumeHolley

Hi,

I think the reason is that hapdup actually polishes with unphased reads in addition to the reads from either haplotype.

This is an old strategy we went with because we expect some regions to be unphased. But now you can try using only phased reads for polishing, and regions with no phased reads should just be left untouched.

If you are able to modify hapdup code, here is the line to change (remove "0," from --polish-haplotypes): https://github.com/fenderglass/hapdup/blob/main/hapdup/main.py#L222
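For anyone following along, the suggested change can be sketched roughly as below. This is only an illustration, not the actual hapdup code: the helper name `haplotype_tags` is hypothetical, and it assumes the value passed to --polish-haplotypes is a comma-separated list of haplotype tags in which 0 stands for unphased reads, as the reply above implies.

```python
def haplotype_tags(haplotype: int, include_unphased: bool) -> str:
    """Build a comma-separated tag list for polishing one haplotype.

    Assumption (not confirmed by the hapdup source): tag 0 means
    "unphased reads", and tags 1, 2, ... are the phased haplotypes.
    """
    if haplotype < 1:
        raise ValueError("haplotype must be >= 1; 0 is treated as unphased here")
    tags = [haplotype]
    if include_unphased:
        # Old strategy: unphased reads are polished in with the haplotype reads.
        tags.insert(0, 0)
    return ",".join(str(tag) for tag in tags)


# Old behaviour described in the reply: unphased + haplotype 2 reads.
print(haplotype_tags(2, include_unphased=True))
# Suggested behaviour: phased reads only (the "0," prefix is gone).
print(haplotype_tags(2, include_unphased=False))
```

With `include_unphased=False`, a region covered only by unphased reads contributes no reads at all, which matches the expectation above that such regions are left untouched.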

Let me know if that makes your assemblies better in general. If it does, it might eventually make sense to make this the default.

Best, Mikhail

mikolmogorov avatar May 03 '22 14:05 mikolmogorov

Just realizing I left this issue open without an answer, sorry about that. Thanks for the feedback!

GuillaumeHolley avatar Nov 02 '22 11:11 GuillaumeHolley