
Mis-identified homo peaks

Open cyr2017 opened this issue 2 years ago • 8 comments

Hi,

I am working on a plant species with an estimated genome size of 2.2 Gb and a heterozygosity rate of 0.5% (estimated by GenomeScope from the HiFi data). I ran hifiasm with HiFi data and valid Hi-C data. The k-mer histogram has three peaks: heter_peak (at 28), homo_peak (at 57), and repeat_peak (at 110?). After read correction, hifiasm mis-identified the repeat peak as the homo_peak and output a hap1 assembly of 316 Mb and a hap2 assembly of 2567 Mb. The log file is attached: hifiasm.log.

I also took the advice from #55 and manually set --purge-cov 73. This time hifiasm output a hap1 assembly of 2628 Mb and a hap2 assembly of 1956 Mb. It seems that about 400 Mb of hap2 sequence was mis-assigned to hap1. Could this be caused by the wrong homo_peak?

cyr2017 avatar Jul 19 '21 02:07 cyr2017

Could you please set "--purge-cov 57"? "--purge-cov" should be set to the hom peak. I'm also writing a detailed manual about this.

chhylp123 avatar Jul 19 '21 02:07 chhylp123
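For readers who want to check the hom peak themselves before setting the option, here is a minimal sketch. It is not hifiasm's own peak detection; it assumes a diploid genome, where the hom peak sits at roughly twice the coverage of the het peak (28 and 57 in the post above). The function names and the `tolerance` parameter are hypothetical, and a real k-mer histogram would usually need smoothing before local maxima are meaningful.

```python
def find_peaks(hist):
    """Return coverages that are local maxima of a k-mer histogram.

    hist maps coverage -> number of k-mers seen at that coverage.
    Illustrative only: real histograms are noisy and should be smoothed first.
    """
    covs = sorted(hist)
    peaks = []
    for prev, cur, nxt in zip(covs, covs[1:], covs[2:]):
        if hist[cur] > hist[prev] and hist[cur] >= hist[nxt]:
            peaks.append(cur)
    return peaks


def pick_hom_peak(peaks, het_peak, tolerance=0.15):
    """Pick the peak closest to 2 * het_peak (diploid assumption).

    Returns None if no peak lies within `tolerance` (a fraction) of that target.
    """
    target = 2 * het_peak
    candidates = [p for p in peaks if abs(p - target) <= tolerance * target]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p - target))


# Peak positions reported in this thread: het 28, hom 57, repeat about 110.
hom = pick_hom_peak([28, 57, 110], het_peak=28)
```

With the peaks from this thread, the sketch selects 57 and rejects the repeat peak near 110, which matches the value suggested above.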

I set --purge-cov 57. Now hap1 assembly is 2637 Mb and hap2 assembly is 2005 Mb.

cyr2017 avatar Jul 19 '21 10:07 cyr2017

I see. Then could you please set a smaller value for '-s'? The default value of '-s' is 0.55; you can try '-s 0.4' or even '-s 0.3'. Unbalanced haplotype assemblies are usually caused by a high heterozygosity rate.

chhylp123 avatar Jul 19 '21 12:07 chhylp123

BTW, does hifiasm prefer raw Hi-C reads or valid Hi-C contact reads (filtered by HiC-Pro with pre-assembled contigs)?

cyr2017 avatar Jul 20 '21 07:07 cyr2017

We have only tried raw reads. Using trio data as ground truth, the assemblies produced from raw Hi-C reads look good enough. Valid Hi-C contact reads might also help; we just haven't tried them.

chhylp123 avatar Jul 20 '21 11:07 chhylp123

One thing I forgot to mention: tuning '-s' doesn't affect the utg graph, so hifiasm can reuse the Hi-C bin files. However, tuning '--purge-cov' may affect the utg graph, so you need to delete 'hicbin' before rerunning.

chhylp123 avatar Jul 20 '21 13:07 chhylp123

Is --purge-cov in the latest release? I'm using 0.15.5-r350 and get an unknown option error. Can't see it in the code either.

Adamtaranto avatar Jul 29 '21 06:07 Adamtaranto

Yeah, we changed it to two separate options: --hom-cov and --purge-max. See https://hifiasm.readthedocs.io/en/latest/faq.html#how-can-i-tweak-parameters-to-improve-hi-c-integrated-assembly.

chhylp123 avatar Jul 29 '21 11:07 chhylp123
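To pull the suggestions in this thread together, the sketch below assembles a hifiasm command line with the hom peak and a smaller '-s' value. It only builds the argument list and does not run anything. The option names (-o, -t, --h1, --h2, --hom-cov, -s) are hifiasm options, with --hom-cov being the newer name mentioned in the last reply; the function name, file names, output prefix, and thread count are placeholders assumed for illustration.

```python
def build_hifiasm_cmd(hifi_reads, hic_r1, hic_r2, prefix="asm",
                      threads=32, hom_cov=None, s=None):
    """Build a Hi-C integrated hifiasm command as an argument list.

    hom_cov: coverage of the hom peak (57 in this thread), passed to --hom-cov.
    s: similarity threshold passed to -s (default 0.55 per the thread).
    All file names and the prefix are hypothetical placeholders.
    """
    cmd = ["hifiasm", "-o", prefix, "-t", str(threads),
           "--h1", hic_r1, "--h2", hic_r2]
    if hom_cov is not None:
        cmd += ["--hom-cov", str(hom_cov)]
    if s is not None:
        cmd += ["-s", str(s)]
    cmd.append(hifi_reads)
    return cmd


# Values taken from the thread: hom peak at 57, and the suggested '-s 0.4'.
cmd = build_hifiasm_cmd("hifi.fq.gz", "hic_R1.fq.gz", "hic_R2.fq.gz",
                        hom_cov=57, s=0.4)
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` if hifiasm is installed. As noted earlier in the thread, changing the coverage setting may require removing the existing Hi-C bin files before rerunning, while changing only '-s' does not.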