
Genome size still too large after using "-s" parameter

Open qdu-beep opened this issue 7 months ago • 0 comments

Hello, dear developers,

I encountered a strange problem, and I would appreciate any suggestions or ideas.

I assembled a diploid genome using HiFi and Hi-C data, and the assembly is much larger than the size estimated from k-mers (about 2.3Gb). According to the genome survey results, the heterozygosity rate of the genome is very low (0.127%) and the duplication rate is 52.5%. The primary assembly (3.7Gb) is similar in size to each of the two haplotype assemblies, and the results with default parameters and with "-s 0.45" are similar.

BUSCO results of primary contigs: C:98.6%[S:94.3%,D:4.3%],F:0.1%,M:1.3%,n:5950

Assembly information of primary contigs:
Number of contigs: 939
Total bases: 3,706,033,829 bp
Max length: 252,188,764 bp
Average: 3,946,787 bp
Contig N50: 96,520,916 bp

(Attached image: k21)
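As a rough sanity check on the figures above, the arithmetic can be sketched in Python. The variable names are illustrative only, and the k-mer estimate is entered as exactly 2.3 Gb even though the post only says "about 2.3Gb", so the ratio is approximate.

```python
# Figures copied from the assembly statistics above.
total_bases = 3_706_033_829      # "Total bases"
num_contigs = 939                # "Number of contigs"
# Assumption: "about 2.3Gb" is treated as exactly 2.3e9 bp.
kmer_estimate = 2_300_000_000

# Mean contig length (integer division, to compare with the reported "Average").
mean_contig = total_bases // num_contigs

# How far the primary assembly exceeds the k-mer based estimate.
excess_bases = total_bases - kmer_estimate
size_ratio = total_bases / kmer_estimate

print(f"mean contig length: {mean_contig:,} bp")
print(f"excess over k-mer estimate: {excess_bases:,} bp")
print(f"assembly / estimate: {size_ratio:.2f}x")
```

Under that assumption the primary assembly is roughly 1.6 times the estimated genome size, which is the discrepancy this issue is about.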

However, the KAT results were less than ideal: the assembly contains many duplicated k-mers.

(Attached image: kat_test-main mx spectra-cn)

Following the advice in some similar issues, I checked the log file, and I think hifiasm identified the peak positions correctly.

Some key information is as follows:

"peak_hom: 20; peak_het: -1"
"[M::purge_dups] homozygous read coverage threshold: 20"
"[M::purge_dups] purge duplication coverage threshold: 25"
"# heterozygous bases: 510816440; # homozygous bases: 3571296092"

(Attached log: nohup.txt)
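To compare these log values side by side, a small script along the following lines could pull the numbers out of the log text. This is only a sketch: the excerpt is retyped from the quoted lines (the exact line layout in nohup.txt is an assumption), `grab` is a hypothetical helper, and the script reports what the log labels say without interpreting how hifiasm derives them.

```python
import re

# Assumption: the quoted log lines, retyped one per line.
log_excerpt = """\
peak_hom: 20; peak_het: -1
[M::purge_dups] homozygous read coverage threshold: 20
[M::purge_dups] purge duplication coverage threshold: 25
# heterozygous bases: 510816440; # homozygous bases: 3571296092
"""


def grab(label, text):
    """Return the integer that follows 'label:' in text (hypothetical helper)."""
    match = re.search(re.escape(label) + r":\s*(-?\d+)", text)
    if match is None:
        raise ValueError(f"label not found: {label}")
    return int(match.group(1))


peak_hom = grab("peak_hom", log_excerpt)
peak_het = grab("peak_het", log_excerpt)
hom_threshold = grab("homozygous read coverage threshold", log_excerpt)
purge_threshold = grab("purge duplication coverage threshold", log_excerpt)
het_bases = grab("heterozygous bases", log_excerpt)
hom_bases = grab("homozygous bases", log_excerpt)

# Share of bases the log labels as heterozygous, out of the two reported counts.
het_share = het_bases / (het_bases + hom_bases)

print(f"peak_hom={peak_hom}, peak_het={peak_het}")
print(f"thresholds: hom={hom_threshold}, purge={purge_threshold}")
print(f"het + hom bases: {het_bases + hom_bases:,}")
print(f"heterozygous share of reported bases: {het_share:.1%}")
```

Running the same extraction on logs from the default run and the "-s 0.45" run would show whether the reported peaks and thresholds differ between the two.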

I have only recently started learning this field and currently have just a basic understanding of its concepts. I hope you can help me work through these problems and find a solution.

Originally posted by @qdu-beep in https://github.com/chhylp123/hifiasm/issues/548#issuecomment-1831687644

qdu-beep, Nov 30 '23 08:11