hifiasm
Trio mode: hybrid with HiFi reads + parents with Illumina reads, resulting in assemblies with high duplication levels
Hello @chhylp123
I used hifiasm in trio binning mode using Illumina reads from the parents and HiFi reads from the hybrid. The resulting assemblies (hap1 and hap2) have high duplication levels and genome sizes larger than estimated.
I know the trio mode doesn't purge the assemblies by default, so I was wondering: 1) Is there something wrong with my data or the way I ran hifiasm? 2) Should I run hifiasm again using purging options?
Command I used: hifiasm -o hybrid.asm -t 28 -1 pat.yak -2 mat.yak hybrid_hifi.fasta 2> hybrid.asm.trio.log
Estimated genome size (GenomeScope using hybrid HiFi reads K=31) 1090.932 MB
hybrid.asm.dip.hap1.p_ctg.fa
genome size: 1655.647 MB
BUSCO: C:97.3%[S:74.6%,D:22.7%],F:0.9%,M:1.8%,n:3640
trioeval:
W 132265 46811966 0.002825
H 105173 46813996 0.002247
N 38490093 8323903 0.177808

hybrid.asm.dip.hap2.p_ctg.fa
genome size: 1819.210 MB
BUSCO: C:96.2%[S:63.0%,D:33.2%],F:1.3%,M:2.5%,n:3640
trioeval:
W 167255 50679941 0.003300
H 128927 50682597 0.002544
N 10904569 39778028 0.215154
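To put the numbers above side by side, here is a minimal Python sketch that divides each assembly size by the GenomeScope estimate and pulls the duplicated-BUSCO percentage out of the summary string. The sizes and BUSCO strings are copied from this post; the function names and the regular expression are only illustrative and assume the usual BUSCO short-summary layout.

```python
import re

ESTIMATED_MB = 1090.932  # GenomeScope estimate quoted in the post

# Assembly sizes (MB) and BUSCO summaries copied from the post.
ASSEMBLIES = {
    "hap1": (1655.647, "C:97.3%[S:74.6%,D:22.7%],F:0.9%,M:1.8%,n:3640"),
    "hap2": (1819.210, "C:96.2%[S:63.0%,D:33.2%],F:1.3%,M:2.5%,n:3640"),
}

# Assumed layout of a BUSCO short summary line.
BUSCO_RE = re.compile(
    r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],F:([\d.]+)%,M:([\d.]+)%,n:(\d+)"
)


def parse_busco(summary):
    """Return the BUSCO percentages and gene count as a dict."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    c, s, d, f, m = (float(x) for x in match.groups()[:5])
    return {"C": c, "S": s, "D": d, "F": f, "M": m, "n": int(match.group(6))}


def size_ratio(assembly_mb, estimated_mb=ESTIMATED_MB):
    """Assembly size divided by the estimated haploid genome size."""
    return assembly_mb / estimated_mb


for name, (size_mb, busco) in ASSEMBLIES.items():
    stats = parse_busco(busco)
    print(f"{name}: {size_ratio(size_mb):.2f}x the estimate, "
          f"{stats['D']:.1f}% duplicated BUSCOs")
```

Both haplotypes come out at roughly 1.5 to 1.7 times the estimated size, which is in line with the elevated duplicated-BUSCO fractions.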
Thanks in advance
log file: hybrid.asm.trio.log
GenomeScope plot:
Have you tried to only use HiFi or short reads? Does non-hybrid parental yak index give assemblies without duplications?
Hi @chhylp123
I am a bit confused by your answer.
"Have you tried to only use HiFi or short reads?" Did you mean performing a HiFi-only assembly with hifiasm? So far I have only run the trio-mode assembly.
"Does non-hybrid parental yak index give assemblies without duplications?" How can I check that? I only ran yak count on the parental short reads.
Hi @chhylp123
I ran the trio binning mode in another fish hybrid species and got the same result: high duplication levels and larger genome size than estimated.
Command I used: hifiasm -o hybrid.asm -t 28 -1 pat.yak -2 mat.yak hybrid_hifi.fasta 2> hybrid.asm.trio.log
Estimated genome size (GenomeScope using hybrid HiFi reads K=31) 1.14 Gb
hybrid.asm.dip.hap1.p_ctg.fa
genome size: 1735.019 MB
GC content: 0.3951
Main genome scaffold total | 900
Main genome contig total | 900
Main genome scaffold sequence total | 1735.019 MB
Main genome contig sequence total | 1735.019 MB 0.000% gap
Main genome scaffold N/L50 | 51/8.544 MB
Main genome contig N/L50 | 51/8.544 MB
Main genome scaffold N/L90 | 346/894.454 KB
Main genome contig N/L90 | 346/894.454 KB
Max scaffold length | 43.415 MB
Max contig length | 43.415 MB
Number of scaffolds > 50 KB | 856
% main genome in scaffolds > 50 KB | 99.91%
BUSCO | C:98.3%[S:64.9%,D:33.4%],F:0.9%,M:0.8%,n:3640

hybrid.asm.dip.hap2.p_ctg.fa
genome size: 1588.876 MB
Main genome contig sequence total | 1588.876 MB 0.000% gap
Main genome scaffold N/L50 | 24/20.709 MB
Main genome contig N/L50 | 24/20.709 MB
Main genome scaffold N/L90 | 159/1.047 MB
Main genome contig N/L90 | 159/1.047 MB
Max scaffold length | 50.62 MB
Max contig length | 50.62 MB
Number of scaffolds > 50 KB | 572
% main genome in scaffolds > 50 KB | 99.85%
BUSCO | C:98.5%[S:80.2%,D:18.3%],F:0.7%,M:0.8%,n:3640
hifiasm log file: tambacu.asm.trio.log
GenomeScope files:
Right now I am running a HiFi-only assembly of the first hybrid.
@chhylp123 what do you think is going on?
Many thanks
Carolina
In practice, one potential issue is that the parental data may not be clean, so the trio-binning phasing may contain mistakes. This could cause the imbalance between the two hifiasm haplotypes. To double-check: 1) Could you please run a HiFi-only assembly and see whether the two haplotypes are balanced? 2) Hifiasm has an option called --trio-dual, which uses homology information to correct trio phasing errors. Could you please try both 1) and 2)?
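The trioeval lines posted at the top of the thread give a rough view of how mixed each haplotype is. The sketch below recomputes the reported rates from the raw counts. It assumes (this is an inference from the arithmetic, not something confirmed in the thread) that each W and H record is `errors total rate`, and that the N record holds the two parent-specific marker counts followed by the minority fraction. The helper names are illustrative.

```python
# trioeval records copied from the first post, one string per haplotype.
TRIOEVAL = {
    "hap1": "W 132265 46811966 0.002825 H 105173 46813996 0.002247 "
            "N 38490093 8323903 0.177808",
    "hap2": "W 167255 50679941 0.003300 H 128927 50682597 0.002544 "
            "N 10904569 39778028 0.215154",
}


def parse_trioeval(text):
    """Split 'TAG int int float' records into {tag: (int, int, float)}."""
    tokens = text.split()
    records = {}
    for i in range(0, len(tokens), 4):
        tag, a, b, rate = tokens[i:i + 4]
        records[tag] = (int(a), int(b), float(rate))
    return records


def minority_fraction(count_a, count_b):
    """Share of markers coming from the less represented parent (assumed meaning)."""
    return min(count_a, count_b) / (count_a + count_b)


for name, line in TRIOEVAL.items():
    rec = parse_trioeval(line)
    switch_errors, switch_total, _ = rec["W"]
    hamming_errors, hamming_total, _ = rec["H"]
    parent_a, parent_b, _ = rec["N"]
    print(f"{name}: switch {switch_errors / switch_total:.4%}, "
          f"hamming {hamming_errors / hamming_total:.4%}, "
          f"minority-parent markers {minority_fraction(parent_a, parent_b):.1%}")
```

Under that reading, the switch and Hamming error rates are low, yet about 18% (hap1) and 22% (hap2) of the parent-specific markers come from the other parent, which would fit the inflated haplotype sizes.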
Hi @chhylp123
I did what you suggested; however, the results didn't change much and the assemblies still show high duplication levels.
I am uploading a sheet with all the results I have so far. The HiFi-only and --trio-dual assemblies are in red. assemblies_stats.xlsx
If you have any questions about the sheet, please let me know.
Best regards
Carol
Sorry for the late reply; I was quite busy last month. Could you please try running purge_dups on the primary assembly? If the purged primary assembly is as large as expected, it might be a phasing issue in hifiasm itself.
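For reference, a dry-run sketch of what a purge_dups run on the primary assembly could look like. It only prints the commands; nothing is executed. The steps follow the purge_dups README as I remember it, so please check the flags against your installed version. The file name `hybrid.asm.bp.p_ctg.fa` is hypothetical (hifiasm writes GFA, so the primary contigs would first need converting to FASTA), and `purge_dups_commands` is just an illustrative helper.

```python
import shlex


def purge_dups_commands(primary_fa, hifi_reads, threads=28):
    """Return the shell commands of a typical purge_dups run (not executed)."""
    asm = shlex.quote(primary_fa)
    reads = shlex.quote(hifi_reads)
    split = shlex.quote(primary_fa + ".split")
    self_paf = shlex.quote(primary_fa + ".split.self.paf.gz")
    return [
        # 1. Map HiFi reads to the primary assembly to get read depth.
        f"minimap2 -x map-hifi -t {threads} {asm} {reads} | gzip -c - > reads.paf.gz",
        # 2. Compute coverage statistics and depth cutoffs.
        "pbcstat reads.paf.gz",
        "calcuts PB.stat > cutoffs 2> calcuts.log",
        # 3. Split the assembly and align it to itself.
        f"split_fa {asm} > {split}",
        f"minimap2 -x asm5 -DP -t {threads} {split} {split} | gzip -c - > {self_paf}",
        # 4. Call duplicated regions and write purged.fa / hap.fa.
        f"purge_dups -2 -T cutoffs -c PB.base.cov {self_paf} > dups.bed 2> purge_dups.log",
        f"get_seqs -e dups.bed {asm}",
    ]


# Hypothetical input names; adjust to your own files.
for command in purge_dups_commands("hybrid.asm.bp.p_ctg.fa", "hybrid_hifi.fasta"):
    print(command)
```

The size of the resulting purged.fa can then be compared with the GenomeScope estimate.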