Flye
Incorrect haplotype polishing?
Hi,
This issue is related to the --polish-haplotypes option of the --polish-target mode in the Flye version used by HapDup. For a given region, HapDup produces the following haplotypes (top to bottom: haplotype 1, haplotype 2, and the haploid assembly straight from Flye):
The screenshot is centered on a single SNP in haplotype 2 (middle track), which is incorrect. I went back to the (Margin-)phased reads mapped to the haploid assembly:
This screenshot is again centered on the same SNP. Something clearly went wrong with the phasing, but setting that aside, the blue reads are the ones used to polish the haploid assembly into haplotype 2. Coverage over this region is only 3 reads, each carrying a different base at this position, and only one of them supports the SNP called in haplotype 2. Any idea what is happening here?
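To make the pileup argument concrete, here is a minimal sketch that counts allele support among reads tagged with a given haplotype at one position. The data is hypothetical (three haplotype-2 reads with three different bases, mirroring the description above), and the tag convention (0 = unphased, 1/2 = phased haplotypes) is an assumption, not taken from Margin or HapDup output.

```python
from collections import Counter

# Hypothetical pileup at the suspicious site: (read_name, haplotype_tag, base).
# Assumed convention: 0 = unphased, 1/2 = phased haplotypes.
PILEUP = [
    ("r1", 2, "A"),
    ("r2", 2, "C"),
    ("r3", 2, "G"),
]

def allele_support(pileup, haplotype):
    """Count the bases observed at the site among reads tagged `haplotype`."""
    return Counter(base for _, hp, base in pileup if hp == haplotype)

def has_majority(counts):
    """True if a single allele is supported by more than half of the reads."""
    total = sum(counts.values())
    return total > 0 and counts.most_common(1)[0][1] * 2 > total

hp2 = allele_support(PILEUP, 2)
print(dict(hp2), has_majority(hp2))
```

With three reads and three different bases, no allele reaches a majority, so the haplotype-2 reads alone do not justify calling the SNP.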
Thanks, Guillaume
Hi,
I think the reason is that hapdup polishes each haplotype using the unphased reads in addition to the reads phased to that haplotype.
We adopted this strategy early on because we expect some regions to remain unphased. You can now try polishing with phased reads only; regions without any phased reads should then be left untouched.
If you can modify the hapdup code, here is the line to change (remove "0," from the --polish-haplotypes argument): https://github.com/fenderglass/hapdup/blob/main/hapdup/main.py#L222
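The effect of dropping "0," can be illustrated with a toy model. This is a sketch, not Flye's polisher (which builds a full alignment-based consensus): it assumes, as the thread implies, that haplotype 0 denotes unphased reads, and uses a simple strict-majority vote per position. The helpers polish_base and parse_haplotypes and the read data are hypothetical.

```python
from collections import Counter

# Hypothetical reads covering one position: (haplotype_tag, base).
# Assumption: tag 0 = unphased reads, tag 2 = reads phased to haplotype 2.
READS = [(2, "A"), (2, "C"), (2, "G"), (0, "G"), (0, "G")]

def parse_haplotypes(arg):
    """Parse a comma-separated haplotype list such as "0,2" into a set."""
    return {int(x) for x in arg.split(",")}

def polish_base(reads, draft_base, haplotypes):
    """Toy polishing of one position by strict-majority vote.

    Only reads whose tag is in `haplotypes` are used. If no reads are
    selected, or no base has a strict majority, the draft base is kept.
    """
    counts = Counter(base for hp, base in reads if hp in haplotypes)
    if not counts:
        return draft_base  # no selected reads: leave the region untouched
    base, n = counts.most_common(1)[0]
    return base if n * 2 > sum(counts.values()) else draft_base

draft = "A"
with_unphased = polish_base(READS, draft, parse_haplotypes("0,2"))  # "G"
phased_only = polish_base(READS, draft, parse_haplotypes("2"))      # "A"
```

In this toy model, the unphased reads are what push the SNP into haplotype 2; with phased reads only, there is no majority and the draft base survives.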
Let me know whether that improves your assemblies in general; if it does, it might make sense to make this the default.
Best, Mikhail
I just realized I left this issue open without an answer; sorry about that. Thanks for the feedback!