hifiasm

choosing l based on heterozygosity

Open dcopetti opened this issue 2 years ago • 1 comments

Hello, is there a way to choose the best -l value up front when working with genomes at different levels of heterozygosity? The genome I am handling (plant, ~340 Mb) is fairly heterozygous (see caus_hifasm_l3_stdout.txt). I assembled it with three values of -l (3, 2, and 1), and these are the stats of the assemblies:

| assembly | total_length | number | shortest | N50 | N50n | N70 | N70n | N90 | N90n |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| l3.bp.p | 361,374,947 | 67 | 13,234 | 37,233,309 | 5 | 32,123,771 | 7 | 25,947,092 | 9 |
| l3.bp.hap1.p | 345,491,540 | 113 | 15,222 | 32,337,488 | 5 | 28,814,549 | 7 | 8,655,938 | 11 |
| l3.bp.hap2.p | 341,046,168 | 91 | 15,736 | 30,077,187 | 5 | 17,614,102 | 8 | 6,036,483 | 14 |
| l2.bp.p | 369,679,714 | 71 | 13,234 | 37,233,309 | 5 | 32,123,771 | 7 | 13,530,942 | 10 |
| l2.bp.hap1.p | 359,961,961 | 110 | 15,222 | 32,337,488 | 5 | 20,789,015 | 8 | 10,450,913 | 12 |
| l2.bp.hap2.p | 327,077,977 | 94 | 15,736 | 30,077,187 | 5 | 28,814,546 | 7 | 8,551,917 | 12 |
| l1.bp.p | 403,965,254 | 83 | 13,234 | 36,517,093 | 5 | 28,814,546 | 8 | 8,959,895 | 13 |
| l1.bp.hap1.p | 355,422,637 | 93 | 15,222 | 32,337,488 | 5 | 20,789,015 | 8 | 10,450,913 | 12 |
| l1.bp.hap2.p | 321,495,862 | 45 | 18,696 | 32,764,075 | 5 | 19,928,080 | 7 | 8,549,279 | 12 |

Going from -l3 down to -l1, the following trends can be seen:

  • the total size of the p_ctg increases
  • the imbalance in total size between hap1 and hap2 increases
  • Nx values between hap1 and hap2 get closer.
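The first two trends can be checked directly from the total_length column. The following is only a minimal sketch: the numbers are copied from the stats above, while the dictionary layout and the helper name `hap_imbalance` are illustrative choices, not part of hifiasm.

```python
# Total lengths (bp) copied from the assembly stats, keyed by the -l value used.
TOTAL_LENGTH = {
    3: {"primary": 361_374_947, "hap1": 345_491_540, "hap2": 341_046_168},
    2: {"primary": 369_679_714, "hap1": 359_961_961, "hap2": 327_077_977},
    1: {"primary": 403_965_254, "hap1": 355_422_637, "hap2": 321_495_862},
}


def hap_imbalance(sizes):
    """Return (hap1 - hap2 in bp, hap1 / hap2) for one assembly run."""
    diff = sizes["hap1"] - sizes["hap2"]
    ratio = sizes["hap1"] / sizes["hap2"]
    return diff, ratio


# Print the runs from -l3 down to -l1 to make the trend easy to read.
for level in sorted(TOTAL_LENGTH, reverse=True):
    sizes = TOTAL_LENGTH[level]
    diff, ratio = hap_imbalance(sizes)
    print(
        f"-l{level}: primary={sizes['primary']:,} bp  "
        f"hap1-hap2={diff:,} bp  hap1/hap2={ratio:.3f}"
    )
```

With these numbers, the hap1/hap2 size difference grows from about 4.4 Mb at -l3 to about 33.9 Mb at -l1, while the primary assembly grows from about 361 Mb to about 404 Mb.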

These assemblies are already very good (2n=18, so most chromosomes are in 1-2 contigs already), but I wonder if there is a way to predict which -l value will be most appropriate without running several assemblies. Maybe based on the relative height of the two k-mer peaks? Thanks,

Dario

dcopetti, Oct 22 '21 23:10

-l1 and -l2 purge more conservatively, so if the heterozygosity rate is high, some homologous pairs cannot be identified. In that case, hap1 will be larger than hap2. My personal preference is to always use -l3. If the heterozygosity rate is extremely high, try a smaller -s. Purging is still a little bit messy...

chhylp123, Oct 25 '21 14:10
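As a rough illustration of the advice above (keep -l3, and lower -s only for very heterozygous genomes), the sketch below only builds and prints the command lines; it does not run hifiasm. The read file name, output prefixes, thread count, and the -s value of 0.45 are placeholders chosen for the example, not recommendations from this thread. The default for -s differs between hifiasm versions, so check `hifiasm -h` before picking a smaller value.

```python
import shlex


def build_hifiasm_cmd(reads, prefix, threads=16, purge_level=3, similarity=None):
    """Build a hifiasm argument list; -s is added only when a value is given."""
    cmd = ["hifiasm", "-o", prefix, "-t", str(threads), "-l", str(purge_level)]
    if similarity is not None:
        # Smaller -s, as suggested for extremely heterozygous genomes.
        cmd += ["-s", str(similarity)]
    cmd.append(reads)
    return cmd


# Hypothetical input and output names, for illustration only.
default_run = build_hifiasm_cmd("reads.hifi.fq.gz", "asm_l3")
lower_s_run = build_hifiasm_cmd("reads.hifi.fq.gz", "asm_l3_s0.45", similarity=0.45)

print(shlex.join(default_run))
print(shlex.join(lower_s_run))
```

The returned list can be passed to `subprocess.run` once the placeholders are replaced with real paths.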