
Hifi + Parental data / Hap1 Hap2 size difference

Open cbirbes opened this issue 2 years ago • 3 comments

Hi, I'm working on a bovine genome and am facing a haplotype problem with hifiasm. I ran one assembly with hifiasm v0.15 using the following command: hifiasm -t 32 -o HN_Simple.asm -1 16665.yak -2 40913.yak files.fastq.gz and a second assembly with v0.16.1 using the exact same command.

With 0.15 I get 3.164.538.244 bp for Hap1 and 3.021.567.026 bp for Hap2, which is fine (about 3 Gb expected).

With 0.16.1 I get 3.206.779.001 bp for Hap1 and 2.444.098.755 bp for Hap2, which is a huge difference.
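To put the imbalance in numbers, here is a minimal Python sketch that computes the Hap2/Hap1 size ratio from the totals reported above. The helper names (`to_bp`, `hap_ratio`) are illustrative only, and the dots in the sizes are assumed to be thousands separators.

```python
# Haplotype sizes in bp as reported in this thread
# (dots are assumed to be thousands separators).
sizes = {
    "0.15": {"hap1": "3.164.538.244", "hap2": "3.021.567.026"},
    "0.16.1": {"hap1": "3.206.779.001", "hap2": "2.444.098.755"},
}


def to_bp(text):
    """Convert a dotted size such as '3.164.538.244' to an integer."""
    return int(text.replace(".", ""))


def hap_ratio(hap1, hap2):
    """Return the Hap2/Hap1 size ratio; 1.0 means perfectly balanced."""
    return to_bp(hap2) / to_bp(hap1)


ratios = {v: hap_ratio(s["hap1"], s["hap2"]) for v, s in sizes.items()}
for version, ratio in ratios.items():
    print(f"hifiasm {version}: Hap2/Hap1 = {ratio:.3f}")
```

A ratio close to 1 means the two haplotypes are of similar size; the 0.16.1 assembly falls well below that.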

I ran yak trioeval on both and got:

0.15: Hap1: W 1282821 7874993 0.162898 H 1235942 7881010 0.156825

Hap2: W 1828122 7642062 0.239218 H 2873652 7645788 0.375848

0.16.1: Hap1: W 1257062 7815772 0.160837 H 1169095 7820441 0.149492

Hap2: W 1437868 6204536 0.231745 H 2240101 6208883 0.360790
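For readers comparing these lines: in yak trioeval output, the W record reports the switch error and the H record the Hamming error, each as an error count, a total count, and the resulting rate. The sketch below parses summary lines as they are pasted in this thread and recomputes the rates. The function name `parse_trioeval` is hypothetical, and the parser assumes whitespace-separated fields in exactly this order; it is not meant for the full yak output.

```python
def parse_trioeval(text):
    """Extract switch (W) and Hamming (H) error stats from pasted summary lines.

    Assumes each record is: tag, error count, total count, rate.
    """
    labels = {"W": "switch", "H": "hamming"}
    tokens = text.split()
    stats = {}
    for i, tok in enumerate(tokens):
        # A record needs three more fields after the tag.
        if tok in labels and i + 3 < len(tokens):
            errors, total = int(tokens[i + 1]), int(tokens[i + 2])
            stats[labels[tok]] = {
                "errors": errors,
                "total": total,
                "rate": errors / total,  # recomputed, not read from the text
            }
    return stats


# Hap2 of the v0.15 assembly, copied from the post above.
hap2_v015 = parse_trioeval("W 1828122 7642062 0.239218 H 2873652 7645788 0.375848")
for name, s in hap2_v015.items():
    print(f"{name}: {s['errors']}/{s['total']} = {s['rate']:.6f}")
```

Recomputing the rate from the two counts is a quick sanity check that the columns were copied correctly.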

I see a high BUSCO duplication rate on Hap1 with 0.16.1 (13%), and I got similar results on goat genomes.

Do you have any tips for making good quality assemblies of similar size?

cbirbes, May 11 '22 13:05

What is the hamming error rate of trio-binning assemblies? Looks like the hamming error rate is very high. The unbalanced size issues are often caused by wrong hom/het peaks (see: https://hifiasm.readthedocs.io/en/latest/faq.html#how-can-i-tweak-parameters-to-improve-hi-c-integrated-assembly).

chhylp123, May 11 '22 16:05

These results are from trio assembly.

I don't have Hi-C data for my bovine genome, but on goat the Hi-C assembly seems better than the trio assembly (in terms of haplotype size):

hap1 HiC: 2.913.416.693 bp W 103733 3449742 0.030070 H 417700 3449938 0.121075

hap2 HiC: 2.764.934.144 bp W 202348 3798587 0.053269 H 987714 3798753 0.260010

hap1 Trio: 3.090.251.445 bp W 75148 3839676 0.019571 H 102934 3840084 0.026805

hap2 Trio: 2.576.405.207 bp W 227666 3385745 0.067243 H 1133417 3386144 0.334722

The Hamming error rate of both Hap2 assemblies is quite high, no?

cbirbes, May 12 '22 11:05

Yes, very high. Probably you should check if the parental data is correct.

chhylp123, May 12 '22 13:05