hifiasm
choosing l based on heterozygosity
Hello,
I wonder if there is a way to choose the best -l value right away when working with genomes at different levels of heterozygosity.
The genome I am handling (plant, ~340 Mb) is fairly heterozygous (caus_hifasm_l3_stdout.txt), and I assembled it with three values of -l (1, 2 and 3). These are the stats of the assemblies:
assembly | total_length | number | shortest | N50 | N50n | N70 | N70n | N90 | N90n |
---|---|---|---|---|---|---|---|---|---|
l3.bp.p | 361,374,947 | 67 | 13,234 | 37,233,309 | 5 | 32,123,771 | 7 | 25,947,092 | 9 |
l3.bp.hap1.p | 345,491,540 | 113 | 15,222 | 32,337,488 | 5 | 28,814,549 | 7 | 8,655,938 | 11 |
l3.bp.hap2.p | 341,046,168 | 91 | 15,736 | 30,077,187 | 5 | 17,614,102 | 8 | 6,036,483 | 14 |
l2.bp.p | 369,679,714 | 71 | 13,234 | 37,233,309 | 5 | 32,123,771 | 7 | 13,530,942 | 10 |
l2.bp.hap1.p | 359,961,961 | 110 | 15,222 | 32,337,488 | 5 | 20,789,015 | 8 | 10,450,913 | 12 |
l2.bp.hap2.p | 327,077,977 | 94 | 15,736 | 30,077,187 | 5 | 28,814,546 | 7 | 8,551,917 | 12 |
l1.bp.p | 403,965,254 | 83 | 13,234 | 36,517,093 | 5 | 28,814,546 | 8 | 8,959,895 | 13 |
l1.bp.hap1.p | 355,422,637 | 93 | 15,222 | 32,337,488 | 5 | 20,789,015 | 8 | 10,450,913 | 12 |
l1.bp.hap2.p | 321,495,862 | 45 | 18,696 | 32,764,075 | 5 | 19,928,080 | 7 | 8,549,279 | 12 |
Decreasing from -l3 to -l1, the following trends can be seen:
- the total size of the p_ctg increases
- the imbalance in total size between hap1 and hap2 increases
- Nx values between hap1 and hap2 get closer.
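The first two trends can be checked directly from the total_length column. The following is only a small sketch: the numbers are copied from the table above, while the function names and the choice to express the imbalance as a percentage of hap2 are illustrative assumptions, not anything hifiasm reports.

```python
# Total lengths (bp) copied from the table above: (primary, hap1, hap2).
TOTALS = {
    "l3": (361_374_947, 345_491_540, 341_046_168),
    "l2": (369_679_714, 359_961_961, 327_077_977),
    "l1": (403_965_254, 355_422_637, 321_495_862),
}


def hap_imbalance(hap1, hap2):
    """Return (hap1 - hap2 in bp, that difference as a percentage of hap2)."""
    diff = hap1 - hap2
    return diff, 100.0 * diff / hap2


def summarize(totals=TOTALS):
    """Build one (level, primary_bp, diff_bp, diff_pct) row per -l value."""
    rows = []
    for level, (primary, hap1, hap2) in totals.items():
        diff, pct = hap_imbalance(hap1, hap2)
        rows.append((level, primary, diff, round(pct, 1)))
    return rows


if __name__ == "__main__":
    for level, primary, diff, pct in summarize():
        print(f"-{level}: primary={primary:,} bp  hap1-hap2={diff:,} bp ({pct}%)")
```

Both the primary size and the hap1/hap2 difference grow from -l3 to -l1, although most of the jump in imbalance already happens between -l3 and -l2.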
These assemblies are already very good (2n=18, so most chromosomes are in 1-2 contigs already), but I wonder if there is a way to predict which -l value will be more appropriate without running different assemblies. Maybe based on the relative height of the two k-mer peaks?
Thanks,
Dario
-l1 or -l2 does more conservative purging, so if the heterozygosity rate is higher, some homologous pairs cannot be identified. In this case, hap1 will be larger than hap2. My personal thought is to always use -l3. If the heterozygosity rate is extremely high, try a smaller -s. Purging is still a little bit messy...
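As a rough illustration of that advice, the sketch below only builds and prints hifiasm command lines that keep -l3 fixed and try a few smaller -s values. It does not run anything. The reads file name, output prefixes, thread count and the particular -s values are all placeholders chosen for the example, not recommendations from this thread.

```python
import shlex


def hifiasm_cmd(reads, prefix, s_value, l_level=3, threads=16):
    """Build one hifiasm command as an argument list (nothing is executed)."""
    return [
        "hifiasm",
        "-o", prefix,          # output prefix (placeholder naming)
        "-t", str(threads),    # thread count (placeholder)
        "-l", str(l_level),    # purge level, kept at 3 as suggested above
        "-s", str(s_value),    # similarity threshold to experiment with
        reads,
    ]


def sweep(reads, s_values):
    """One command per -s value, each with its own output prefix."""
    return [hifiasm_cmd(reads, f"asm.l3.s{s}", s) for s in s_values]


if __name__ == "__main__":
    # The -s values here are arbitrary examples, not tested settings.
    for cmd in sweep("reads.fq.gz", [0.55, 0.45, 0.35]):
        print(shlex.join(cmd))
```

Giving each run its own prefix keeps the resulting assemblies side by side so their hap1/hap2 sizes can be compared afterwards.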