
Homozygous diploid trio assembly assigning largely ambiguous nodes to only one haplotype

Open ASLeonard opened this issue 7 months ago • 10 comments

Hi,

I'm assembling a 3 Gb diploid mammal that we expect to be quite homozygous. The k-mer spectrum confirms this, and hifiasm "correctly" identifies the peaks as [M::ha_pt_gen] peak_hom: 46; peak_het: 24.

image

However, I noticed that the haplotype-resolved assemblies (using parental k-mers from yak) are sometimes quite unbalanced. Below is the dip.p_utg.gfa graph. For example, two large nodes (yellow and green) are assigned to both hap1 and hap2. The reads in these nodes are either split about evenly between maternal (m) and paternal (p), or are all ambiguous (a). The blue and red nodes, however, are present only in hap2/maternal and almost completely missing from hap1/paternal, even though they are the single entry/exit nodes. The yak k-mers suggest these nodes are slightly more maternal than paternal, but their reads are overwhelmingly ambiguous, and yet the nodes are not assigned to both haplotypes. Is this the expected behaviour, or should the blue and red nodes (with ambiguous reads >> maternal reads) be assigned to both haplotypes?

image
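
To make the per-node m/p/a counts reproducible, here is a minimal sketch of how they could be tallied from the graph file. It assumes that each read is listed on a tab-separated `A` line whose second field is the node name, and that the read's haplotype label is stored in an optional `HG:A:` tag (`a`, `m` or `p`). The function name and the example lines are hypothetical and only illustrate the idea; check them against your own GFA before relying on the numbers.

```python
from collections import Counter, defaultdict


def tally_read_labels(gfa_lines):
    """Count reads per node by haplotype label, using GFA 'A' lines.

    Assumption: the label is carried in an optional tag of the form
    HG:A:<label>, where <label> is a (ambiguous), m or p.
    Reads without such a tag are counted under '?'.
    """
    counts = defaultdict(Counter)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "A" or len(fields) < 2:
            continue  # skip S, L and any other record types
        node = fields[1]
        label = next(
            (f[len("HG:A:"):] for f in fields[2:] if f.startswith("HG:A:")),
            "?",
        )
        counts[node][label] += 1
    return counts


# Made-up example lines, not taken from the real assembly.
lines = [
    "S\tutg000001l\t*\tLN:i:100",
    "A\tutg000001l\t0\t+\tread1\t0\t50\tid:i:1\tHG:A:a",
    "A\tutg000001l\t10\t+\tread2\t0\t50\tid:i:2\tHG:A:m",
    "A\tutg000002l\t0\t-\tread3\t0\t50\tid:i:3\tHG:A:p",
]
counts = tally_read_labels(lines)
for node, c in sorted(counts.items()):
    print(node, dict(c))
```

On a real graph, the same function can be fed an open file handle instead of the example list, since it only iterates over lines.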

I'm not sure whether changing --hom-cov would help, since the peak is at 46, and because this is a trio assembly, -s/-l aren't on by default.

ASLeonard, Nov 22 '23 15:11