Trio mode: hybrid in HiFi + parents in Illumina, resulting in assemblies with high duplication levels

Open carolhsb opened this issue 7 months ago • 6 comments

Hello @chhylp123

I used hifiasm in trio binning mode using Illumina reads from the parents and HiFi reads from the hybrid. The resulting assemblies (hap1 and hap2) have high duplication levels and genome sizes larger than estimated.

I know that trio mode doesn't purge the assemblies by default, so I was wondering:

1) Is there something wrong with my data or the way I ran hifiasm?
2) Should I run hifiasm again using purging options?

Command I used: hifiasm -o hybrid.asm -t 28 -1 pat.yak -2 mat.yak hybrid_hifi.fasta 2> hybrid.asm.trio.log
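
For reference, the command above assumes that pat.yak and mat.yak already exist. A minimal sketch of how such indexes are usually built from parental short reads with `yak count`, followed by the trio run from this issue, might look like the following. The FASTQ file names are hypothetical, and the `run`/`DRY_RUN` helper is only an illustration so the commands can be previewed without executing them.

```shell
#!/usr/bin/env bash
# Sketch only: paternal.fq.gz and maternal.fq.gz are hypothetical names.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_trio_inputs() {
  # k-mer index of each parent's short reads (single-file form).
  run yak count -b37 -t16 -o pat.yak paternal.fq.gz
  run yak count -b37 -t16 -o mat.yak maternal.fq.gz
  # Trio-binning assembly of the hybrid's HiFi reads, as in the issue.
  run hifiasm -o hybrid.asm -t 28 -1 pat.yak -2 mat.yak hybrid_hifi.fasta
}

build_trio_inputs
```

Paired-end parental reads need both mates in the index; the hifiasm README describes the exact form for that case, so it is worth checking there before running this for real.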

Estimated genome size (GenomeScope, hybrid HiFi reads, k=31): 1090.932 MB

hybrid.asm.dip.hap1.p_ctg.fa
genome size: 1655.647 MB
BUSCO: C:97.3%[S:74.6%,D:22.7%],F:0.9%,M:1.8%,n:3640
trioeval:
W 132265 46811966 0.002825
H 105173 46813996 0.002247
N 38490093 8323903 0.177808

hybrid.asm.dip.hap2.p_ctg.fa
genome size: 1819.210 MB
BUSCO: C:96.2%[S:63.0%,D:33.2%],F:1.3%,M:2.5%,n:3640
trioeval:
W 167255 50679941 0.003300
H 128927 50682597 0.002544
N 10904569 39778028 0.215154
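
To put these numbers side by side, here is a small sketch that recomputes the assembly-size-to-estimate ratio and the third column of each trioeval N line from the values reported above. The reading of the N line is an assumption: its two counts are treated as parent-specific k-mer totals, which is at least consistent with the arithmetic (the third column equals the smaller count divided by the sum). The function names are illustrative.

```shell
#!/usr/bin/env bash
# Values copied from the report above; sizes are in MB.
est=1090.932

size_ratio() {  # usage: size_ratio <assembly_MB> <estimate_MB>
  awk -v a="$1" -v e="$2" 'BEGIN { printf "%.2f\n", a / e }'
}

minor_parent_fraction() {  # usage: minor_parent_fraction <count1> <count2>
  # Assumption: the two counts on the trioeval N line are parent-specific
  # k-mer totals; this returns the smaller count as a share of their sum.
  awk -v x="$1" -v y="$2" \
    'BEGIN { m = (x < y) ? x : y; printf "%.6f\n", m / (x + y) }'
}

echo "hap1 size / estimate: $(size_ratio 1655.647 "$est")"
echo "hap2 size / estimate: $(size_ratio 1819.210 "$est")"
echo "hap1 minor-parent share: $(minor_parent_fraction 38490093 8323903)"
echo "hap2 minor-parent share: $(minor_parent_fraction 10904569 39778028)"
```

Both haplotype assemblies come out at roughly 1.5 to 1.7 times the GenomeScope estimate, which matches the elevated BUSCO duplication.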

Thanks in advance

log file: hybrid.asm.trio.log

GenomeScope plot: linear_plot

carolhsb avatar Nov 30 '23 12:11 carolhsb

Have you tried to only use HiFi or short reads? Does non-hybrid parental yak index give assemblies without duplications?

chhylp123 avatar Nov 30 '23 19:11 chhylp123

Hi @chhylp123

I am a bit confused by your answer.

"Have you tried to only use HiFi or short reads?" Do you mean performing a HiFi-only assembly with hifiasm? So far I have only run the trio-mode assembly.

"Does non-hybrid parental yak index give assemblies without duplications?" How can I check that? I have only run yak count on the parental short reads.

carolhsb avatar Dec 01 '23 12:12 carolhsb

Hi @chhylp123

I ran the trio binning mode on another fish hybrid and got the same result: high duplication levels and a larger genome size than estimated.

Command I used: hifiasm -o hybrid.asm -t 28 -1 pat.yak -2 mat.yak hybrid_hifi.fasta 2> hybrid.asm.trio.log

Estimated genome size (GenomeScope, hybrid HiFi reads, k=31): 1.14 Gb

hybrid.asm.dip.hap1.p_ctg.fa
genome size: 1735.019 MB
GC content: 0.3951
Main genome scaffold total: 900
Main genome contig total: 900
Main genome scaffold sequence total: 1735.019 MB
Main genome contig sequence total: 1735.019 MB (0.000% gap)
Main genome scaffold N/L50: 51/8.544 MB
Main genome contig N/L50: 51/8.544 MB
Main genome scaffold N/L90: 346/894.454 KB
Main genome contig N/L90: 346/894.454 KB
Max scaffold length: 43.415 MB
Max contig length: 43.415 MB
Number of scaffolds > 50 KB: 856
% main genome in scaffolds > 50 KB: 99.91%
BUSCO: C:98.3%[S:64.9%,D:33.4%],F:0.9%,M:0.8%,n:3640

hybrid.asm.dip.hap2.p_ctg.fa
genome size: 1588.876 MB
Main genome contig sequence total: 1588.876 MB (0.000% gap)
Main genome scaffold N/L50: 24/20.709 MB
Main genome contig N/L50: 24/20.709 MB
Main genome scaffold N/L90: 159/1.047 MB
Main genome contig N/L90: 159/1.047 MB
Max scaffold length: 50.62 MB
Max contig length: 50.62 MB
Number of scaffolds > 50 KB: 572
% main genome in scaffolds > 50 KB: 99.85%
BUSCO: C:98.5%[S:80.2%,D:18.3%],F:0.7%,M:0.8%,n:3640
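
As a quick cross-check for this second hybrid, the sketch below pulls the duplicated-BUSCO percentage out of each summary string and relates each assembly size to the estimate. It assumes 1.14 Gb means 1140 MB; the function names are illustrative and the input values are the ones reported above.

```shell
#!/usr/bin/env bash
# Assumption: 1.14 Gb is taken as 1140 MB.
est_mb=1140

busco_dup() {  # prints the D: percentage from a BUSCO summary string
  printf '%s\n' "$1" | sed -n 's/.*D:\([0-9.]*\)%.*/\1/p'
}

size_ratio() {  # usage: size_ratio <assembly_MB> <estimate_MB>
  awk -v a="$1" -v e="$2" 'BEGIN { printf "%.2f\n", a / e }'
}

hap1_busco='C:98.3%[S:64.9%,D:33.4%],F:0.9%,M:0.8%,n:3640'
hap2_busco='C:98.5%[S:80.2%,D:18.3%],F:0.7%,M:0.8%,n:3640'

echo "hap1: duplicated BUSCO $(busco_dup "$hap1_busco")%, size ratio $(size_ratio 1735.019 "$est_mb")"
echo "hap2: duplicated BUSCO $(busco_dup "$hap2_busco")%, size ratio $(size_ratio 1588.876 "$est_mb")"
```

In this run the haplotype with the larger size ratio is also the one with the higher duplicated-BUSCO percentage.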

hifiasm log file: tambacu.asm.trio.log

GenomeScope files: plot1 log_plot

Right now I am running a HiFi-only assembly of the first hybrid.

@chhylp123 what do you think is going on?

Many thanks

Carolina

carolhsb avatar Dec 06 '23 12:12 carolhsb

In practice, one potential issue is that the parental data is not very clean, so the trio-binning phasing may contain mistakes. This could explain the unbalanced haplotype assemblies. To double-check:

1) Could you please run a HiFi-only assembly and see whether the two haplotypes are balanced?
2) hifiasm has an option called --trio-dual, which uses homology information to correct trio phasing errors.

Could you please try both 1) and 2)?
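
The two checks suggested above could be run roughly as follows. This is a sketch under assumptions: the output prefixes are hypothetical, the input file names are reused from the command earlier in the thread, and the `run`/`DRY_RUN` helper is only there so the commands can be previewed before anything is executed.

```shell
#!/usr/bin/env bash
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

check_phasing() {
  # 1) HiFi-only assembly: no parental yak indexes are supplied.
  run hifiasm -o hybrid.hifi_only.asm -t 28 hybrid_hifi.fasta
  # 2) Trio binning plus homology-based correction of phasing errors.
  run hifiasm -o hybrid.trio_dual.asm -t 28 -1 pat.yak -2 mat.yak --trio-dual hybrid_hifi.fasta
}

check_phasing
```

Using separate output prefixes keeps the results of the original trio run intact for comparison.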

chhylp123 avatar Dec 07 '23 18:12 chhylp123

Hi @chhylp123

I did what you suggested; however, the results didn't change much, and the assemblies still show high duplication levels.

I am uploading a sheet with all the results I have so far. The HiFi-only and --trio-dual assemblies are in red. assemblies_stats.xlsx

If you have any questions about the sheet, please let me know.

Best regards

Carol

carolhsb avatar Dec 11 '23 18:12 carolhsb

Sorry for the late reply; I was quite busy last month. Could you please try running purge_dups on the primary assembly? If the purged primary assembly is as large as expected, it might be a phasing issue in hifiasm itself.
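
A rough sketch of what that check could involve is below. The GFA-to-FASTA conversion follows the awk one-liner from the hifiasm README. The purge_dups steps are printed rather than executed and are written from memory of the purge_dups README, so the flags and the minimap2 preset for HiFi reads (`map-hifi` here) are assumptions to verify against the installed versions. File and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a hifiasm GFA to FASTA (S lines carry contig name and sequence).
gfa_to_fasta() {
  awk '/^S/ { print ">" $2; print $3 }' "$1"
}

# Total bases in a FASTA file, ignoring header lines.
fasta_total_bp() {
  awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Print (do not run) a purge_dups plan for one assembly and one read set.
# Assumption: commands and flags follow the purge_dups README; verify first.
purge_plan() {
  local asm="$1" reads="$2"
  cat <<EOF
minimap2 -x map-hifi $asm $reads | gzip -c - > reads.paf.gz
pbcstat reads.paf.gz
calcuts PB.stat > cutoffs 2> calcuts.log
split_fa $asm > $asm.split
minimap2 -x asm5 -DP $asm.split $asm.split | gzip -c - > $asm.split.self.paf.gz
purge_dups -2 -T cutoffs -c PB.base.cov $asm.split.self.paf.gz > dups.bed 2> purge_dups.log
get_seqs -e dups.bed $asm
EOF
}

# Hypothetical file names, for illustration only.
purge_plan hybrid.p_ctg.fa hybrid_hifi.fasta
```

After purging, `fasta_total_bp` on the purged FASTA gives a size that can be compared directly with the GenomeScope estimate.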

chhylp123 avatar Dec 22 '23 13:12 chhylp123