hifiasm
High BUSCO duplication rate for low coverage HiFi data
Dear authors,
I am using hifiasm to assemble our HiFi read data (DNA from a single individual for each species). However, we noticed that the duplication rate (based on BUSCO) is negatively correlated with estimated read coverage. By comparing with a previously published short-read assembly, we further confirmed that the high duplication rate in our low-coverage assemblies is not species-specific (see the table below). I think the correlation arises because low sequencing depth makes heterozygous regions difficult to assemble.
We are now using Purge_Dups to remove potential false duplications, but are there any other hifiasm parameters that account for sequencing depth?
| Species | Sex | Estimated coverage | N50 | # contigs | Estimated genome size | Contig length | Completeness | Singleton | Duplicated | Fragment | Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Odontotermes | F | 28 | 5800000 | 1291 | 1194220328 | 1449477267 | 99.4% | 97.4% | 2% | 0.4% | 0.2% |
| Trinervitermes | U | 10 | 605000 | 6072 | 348834812 | 1726917765 | 98.1% | 83% | 15.1% | 1% | 0.9% |
| C. secundus | U | 13 | 1143825 | 3705 | 1182129932 | 1296819224 | 99.5% | 90.1% | 9.4% | 0.2% | 0.3% |
| C. secundus (short reads, reference) | Mix | | 1184893 | 55483 | 1182129932 | 1018932804 | 98.8% | 97% | 1.8% | 0.7% | 0.5% |
| Macrotermes bellicosus PNP | M | 12 | 564726 | 5671 | 1214356371 | 1411697024 | 99.3% | 89% | 10.3% | 0.4% | 0.3% |
| Macrotermes bellicosus (HiFi, reference) | | 21 | 11 MB | 428 | 1113805679 | 1341469195 | 99.7% | 96.5% | 3.2% | 0 | 0.3% |
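The negative correlation between coverage and duplication can be checked directly from the numbers in the table. The following is a minimal Python sketch, assuming the five HiFi assemblies with a coverage estimate are the relevant data points (the short-read reference has no coverage value and is left out). The variable and function names are illustrative only.

```python
import math

# (estimated coverage, BUSCO duplicated %) for the HiFi assemblies in the table
hifi_rows = {
    "Odontotermes": (28, 2.0),
    "Trinervitermes": (10, 15.1),
    "C. secundus": (13, 9.4),
    "Macrotermes bellicosus PNP": (12, 10.3),
    "Macrotermes bellicosus (HiFi, reference)": (21, 3.2),
}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

coverage = [cov for cov, _ in hifi_rows.values()]
duplicated = [dup for _, dup in hifi_rows.values()]
r = pearson(coverage, duplicated)
print(f"Pearson r (coverage vs. duplicated BUSCOs): {r:.2f}")
```

With only five assemblies this is a rough indication rather than a statistical test, but a strongly negative `r` is consistent with the trend described above.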
Regarding the coverage issue, I think so. The built-in purge_dups step of hifiasm can be tuned as described at https://hifiasm.readthedocs.io/en/latest/faq.html#p-large, which may work better in some cases.
Thank you for the reply. However, after setting --purge-max to 20 (twice the k-mer-based homozygous read coverage) and -s to 0.3, the BUSCO duplication rate remains very high (15%).
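For reference, the settings described above correspond roughly to the following invocation. This is a minimal Python sketch that only builds and prints the command line. The read file name, output prefix, and thread count are placeholders (assumptions, not taken from this thread), and the homozygous coverage of 10 is inferred from the stated `--purge-max` of 20.

```python
import shlex

def hifiasm_command(reads, prefix, purge_max, similarity, threads=16):
    """Build a hifiasm argument list with tuned purging parameters."""
    return [
        "hifiasm",
        "-o", prefix,                    # output prefix (placeholder)
        "-t", str(threads),              # number of threads (placeholder)
        "--purge-max", str(purge_max),   # coverage upper bound for purging
        "-s", str(similarity),           # similarity threshold for duplicate haplotigs
        reads,
    ]

homozygous_coverage = 10  # k-mer-based estimate implied by --purge-max 20
cmd = hifiasm_command(
    reads="reads.hifi.fq.gz",        # hypothetical input file
    prefix="trinervitermes.asm",     # hypothetical output prefix
    purge_max=2 * homozygous_coverage,
    similarity=0.3,
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to actually run the assembly.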
We also tested genome assembly with HiCanu. By setting correctedErrorRate=0.105 (https://canu.readthedocs.io/en/latest/parameter-reference.html#correctederrorrate), we got a BUSCO duplication rate of 3.9%, which seems more reasonable to us.
Are the assemblies of HiCanu more contiguous than those of hifiasm?
It seems so, but of course this is only for the species with very low sequencing coverage.
| Species | Method | N50 | # contigs | Estimated genome size | Contig length | Completeness | Singleton | Duplicated | Fragment | Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Trinervitermes 10x | Hifiasm | 605 KB | 6072 | X | 1726917765 | 98.1% | 83% | 15.1% | 1% | 0.9% |
| | Hifiasm + purge | 716 KB | 3610 | | 1457166292 | 96.9% | 94.8% | 2.1% | 1.5% | 1.6% |
| | Canu | 734 KB | 10555 | | 1593394689 | 98.6% | 94.7% | 3.9% | 1.1% | 0.3% |
Thanks a lot. That is reasonable, as we haven't optimized hifiasm for such low read coverage.