
haps are smaller than expected

Open m-jahani opened this issue 2 years ago • 7 comments

Hi, I assembled a diploid plant genome with the default Hi-C mode in hifiasm (0.15.5-r350). The expected genome size is 811M (based on flow cytometry). The results look good, but I would like to push the quality as far as I can.

Here is the result that I got:


Information for *asm.hic.hap1.p_ctg.gfa
total contigs length: 789715320
as % of genome: 96.54 %
N50: 5443612
BUSCO: C:96.9%[S:94.1%,D:2.8%],F:0.3%,M:2.8%,n:2326

Information for *asm.hic.hap2.p_ctg.gfa
total contigs length: 776385689
as % of genome: 94.91 %
N50: 4514560
BUSCO: C:97.6%[S:95.1%,D:2.5%],F:0.3%,M:2.1%,n:2326

Information for *asm.hic.p_ctg.gfa
total contigs length: 844829462
as % of genome: 103.28 %
N50: 12490608
BUSCO: C:98.0%[S:91.9%,D:6.1%],F:0.3%,M:1.7%,n:2326
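For anyone who wants to reproduce numbers like the ones above, here is a minimal Python sketch that computes total contig length, percentage of an expected genome size, and N50 from the segment (S) lines of a GFA file. The function names and the toy input are made up for illustration; it assumes each S-line either carries the sequence in its third column or uses `*` together with an `LN:i:` tag.

```python
import io


def contig_lengths(gfa_lines):
    """Yield the length of every segment (S-line) in a GFA stream."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip links, read-alignment lines, headers
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            yield len(seq)
            continue
        # Sequence omitted: fall back to the optional LN:i: tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                yield int(tag[5:])
                break


def n50(lengths):
    """Length of the contig at which the running sum reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize(gfa_lines, expected_genome_size):
    lengths = list(contig_lengths(gfa_lines))
    total = sum(lengths)
    return {
        "total": total,
        "pct_of_genome": 100 * total / expected_genome_size,
        "n50": n50(lengths),
    }


# Toy GFA with three contigs (10, 6 and 4 bp) and one link line.
demo = io.StringIO(
    "S\tc1\tACGTACGTAC\n"
    "S\tc2\tACGTAC\n"
    "S\tc3\t*\tLN:i:4\n"
    "L\tc1\t+\tc2\t+\t0M\n"
)
stats = summarize(demo, expected_genome_size=25)
print(stats)
```

On a real assembly one would pass an open file handle instead of the toy `StringIO`, along with the flow-cytometry estimate as `expected_genome_size`.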

log file:

hifiasm.log


Is it possible to improve my assembly size by tweaking settings? hap1 and hap2 are 789715320 and 776385689, respectively, but the expected genome size is 811000000.

hap1 and hap2 also have different sizes. Is there any way to balance the haplotypes?

Thanks

m-jahani avatar Aug 03 '21 22:08 m-jahani

May I ask what's the size of the *hic.p_ctg.gfa*? Does this sample have sex chromosomes?

chhylp123 avatar Aug 04 '21 02:08 chhylp123

The size of *asm.hic.p_ctg.gfa is 844829462. Yes, it does have sex chromosomes. The target genome is a female plant with XX sex chromosomes.

m-jahani avatar Aug 04 '21 02:08 m-jahani

I personally think your assemblies are already pretty good. It is very hard to make two haplotypes have equal size due to centromeric regions. As for the smaller size, I have no idea whether hifiasm really misses some regions or whether the two haplotypes should be this small. Could you please get the Hi-C heatmap or perform contig-to-contig alignment between the two haplotypes? Either of these may tell you if hifiasm misses some regions (although I don't think hifiasm would lose 20Mb of contigs for each haplotype).
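One way to read a contig-to-contig alignment is to ask how much of each hap1 contig is covered by alignments to hap2. The Python sketch below assumes the alignment is stored in PAF format (as produced by aligners such as minimap2; the thread does not name a tool) and uses the standard PAF columns: query name, query length, query start, query end, and mapping quality in column 12. The function name, contig names, and toy records are hypothetical.

```python
from collections import defaultdict


def query_coverage(paf_lines, min_mapq=0):
    """Return {query_name: fraction of query bases covered by >= 1 alignment}."""
    lengths = {}
    intervals = defaultdict(list)
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a complete PAF record
        name, qlen = f[0], int(f[1])
        qstart, qend, mapq = int(f[2]), int(f[3]), int(f[11])
        lengths[name] = qlen
        if mapq >= min_mapq:
            intervals[name].append((qstart, qend))

    coverage = {}
    for name, qlen in lengths.items():
        covered = 0
        cur_start = cur_end = None
        # Merge overlapping intervals so shared bases are counted once.
        for start, end in sorted(intervals[name]):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        if cur_end is not None:
            covered += cur_end - cur_start
        coverage[name] = covered / qlen if qlen else 0.0
    return coverage


# Toy PAF: two overlapping hits for one contig, one short hit for another.
demo_paf = [
    "h1ctg1\t100\t0\t40\t+\th2ctg1\t120\t5\t45\t40\t40\t60",
    "h1ctg1\t100\t30\t60\t+\th2ctg1\t120\t50\t80\t30\t30\t60",
    "h1ctg2\t50\t10\t20\t-\th2ctg3\t80\t0\t10\t10\t10\t60",
]
cov = query_coverage(demo_paf)
for contig, frac in sorted(cov.items()):
    print(f"{contig}\t{frac:.2f}")
```

Contigs with no alignment at all never appear in a PAF file, so they would have to be found by comparing against the full contig list. Large contigs with low coverage are candidates for regions present in one haplotype but missing from the other.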

chhylp123 avatar Aug 04 '21 12:08 chhylp123

Thanks for your reply. I will try Hi-C heatmap and/or contig-to-contig alignment.

Another question. When I decrease the -s parameter to 48, the hap sizes are much closer (more balanced), and the BUSCO results are better too:


Information for *asm.hic.hap1.p_ctg.gfa with -s48
total contigs length: 780788341
BUSCO: C:96.4%[S:93.8%,D:2.6%],F:0.4%,M:3.2%,n:2326

Information for *asm.hic.hap2.p_ctg.gfa with -s48
total contigs length: 781757043
BUSCO: C:97.8%[S:95.2%,D:2.6%],F:0.3%,M:1.9%,n:2326


Do you recommend using -s48? Wouldn't that change other aspects of assembly quality?

Thanks

m-jahani avatar Aug 05 '21 16:08 m-jahani

Are you using -s0.48 or -s48?

chhylp123 avatar Aug 05 '21 16:08 chhylp123

My bad, I meant --hom-cov 48. Would either -s or --hom-cov work in my case?

m-jahani avatar Aug 05 '21 17:08 m-jahani

Yeah, --hom-cov should be set to the homozygous coverage peak. You can try different values for -s to see if the results improve. Hifiasm is pretty fast once the bin files have been generated.
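A small Python sketch of how such a sweep over -s could be scripted. It only builds and prints the command lines. The file names, the prefix, the thread count, and the candidate -s values are placeholders, not recommendations. It assumes the same -o prefix is kept across runs so the existing bin files are found, which means outputs from one run need to be moved aside before the next.

```python
import shlex


def hifiasm_hic_cmd(prefix, hifi_reads, hic_r1, hic_r2,
                    hom_cov=None, sim=None, threads=16):
    """Build a hifiasm Hi-C mode command line as a list of arguments."""
    cmd = ["hifiasm", "-o", prefix, "-t", str(threads),
           "--h1", hic_r1, "--h2", hic_r2]
    if hom_cov is not None:
        cmd += ["--hom-cov", str(hom_cov)]  # homozygous coverage peak
    if sim is not None:
        cmd += ["-s", str(sim)]  # similarity threshold, a fraction such as 0.5
    cmd.append(hifi_reads)
    return cmd


# Hypothetical inputs; replace with real paths.
candidate_s = [0.45, 0.5, 0.55]
cmds = [
    hifiasm_hic_cmd("asm", "hifi.fq.gz", "hic_R1.fq.gz", "hic_R2.fq.gz",
                    hom_cov=48, sim=s)
    for s in candidate_s
]
for cmd in cmds:
    print(shlex.join(cmd))
```

Each printed line can be run by hand or passed to `subprocess.run(cmd, check=True)`. Comparing haplotype sizes and BUSCO scores after each run shows whether a given -s value helps.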

chhylp123 avatar Aug 05 '21 17:08 chhylp123