hifiasm icon indicating copy to clipboard operation
hifiasm copied to clipboard

Hifiasm 0.13 does not produce an assembly.

Open haneenih7 opened this issue 3 years ago • 21 comments

I am running different assemblies using different numbers of SMRT cells with hifiasm v0.13 on ccs hifi reads (corrected reads in fastq). The sequences generated from homozygote diploid wheat I have 4 SMRT cells (fastq files with 3 passes), named A, B, C, D I ran successfully assembly with different fastq: B and C A, B, and C A, B, C, and D

However, for some reason that I don't know what I couldn't get an assembly with: The individual SMRT cells : B, C, or D alone A, B, and D B and D C and D

I get a weird long file that the k-mer histogram plot is not correct and does not stop

This an example of the log file from assembling A, B, and D https://drive.google.com/file/d/1HsypTfGrUh0JrVOZh7Yiqm4ZeNVv8z_Q/view?usp=sharing

with this simple command for all cases: time hifiasm -o Monoc_cult_3SMRT -t 32 SMRT_A.CCS3.fastq SMRT_B.CCS3.fastq SMRT_C.CCS3.fastq

I have reviewed this issue https://github.com/chhylp123/hifiasm/issues/39 In my case, it's already hifi reads (corrected reads with 3 passes) so it's not the same problem

haneenih7 avatar Jan 14 '21 21:01 haneenih7

Your reads don't look like HiFi and I believe you have the same problem as #39.

lh3 avatar Jan 14 '21 21:01 lh3

For diploid samples, a typical k-mer histogram likes (https://github.com/chhylp123/hifiasm/issues/10). image I agree with Heng that there is something wrong for your data. You can also run JellyFish to double check the histogram.

chhylp123 avatar Jan 15 '21 04:01 chhylp123

The log file attached has a similar kmer distribution to the one I reported in #62. I don't know what is the common issue between this error and mine, as certainly using three cells should have high enough coverage.

Hopefully the data shared in #62 will be helpful in isolating what causes HiFi data to wrong.

ASLeonard avatar Jan 15 '21 07:01 ASLeonard

I am sure the reads I have are HiFi reads.

Assembly with 3 SMRT cell (A,B, and C) worked perfectly fine. logfile: https://drive.google.com/file/d/1HhqT25YLTUmtCHMIKVqfRDBJ1u_HTN6O/view?usp=sharing

The kmer histogram doesn't look weird http://genomescope.org/analysis.php?code=58sWt4JozZkmuWwOWLIy

haneenih7 avatar Jan 15 '21 07:01 haneenih7

I see. Maybe I was wrong... May I ask what are the coverage of A, B, C and D?

chhylp123 avatar Jan 15 '21 08:01 chhylp123

These are the reads stats:

SMRT cell name | SMRT_A | SMRT_B | SMRT_C | SMRT_D Mean read length | 14,728.9 | 14,773.4 | 14,748 | 14,604 Mean read quality | 30.3 | 30.0 | 30 | 30 Median read length | 14,301.0 | 14,334.0 | 14,313 | 14,080 Median read quality | 29.9 | 29.5 | 29 | 30 Number of reads | 865,846 | 1,593,519 | 1,593,702 | 1,401,297 Read length N50 | 14,912.0 | 14,928.0 | 14,911 | 14,753 Total base | 12,752,980,234.0 | 23,541,635,839.0 | 23,504,209,572 | 20,463,833,363

The estimated coverage for the four SMRT cells is 14.33x (estimated genome size is 5.6 Gbp)

 # of SMRT cell | Total data yield (Gbp) | Estimated coverage |  status 2 SMRT cells (A & B) | 36.34 | 6.5x | sucessful assembly 2 SMRT cells (B & C) | 47.04 | 8.4x | sucessful assembly 3 SMRT cells (A & B & C) | 59.84 | 10.7x | sucessful assembly 3 SMRT cells (A & B & D) | 56.75 | 10.13x | This assembly didn't work 4 SMRT cells (A & B & C & D) | 80.25 | 14.33x | sucessful assembly

haneenih7 avatar Jan 15 '21 08:01 haneenih7

Hi all, I think I am having a similar issue. CCS reads recently provided (output from standard PacBio SMRT Link package with default parameters) look like this:

provided CCS stats: bases 17904868613 reads 1328116 mean 13481.404 median 13395.000 stddev 1544.9318 mode 12041.000 min 45.000 max 46208.000 p90 15582.000 p95 15941.000 p99 16463.000 N50 13722

Yet I'm getting a kmer histogram that looks like both issue #62 and issue #39. I would surprised if my coverage low, given a ~100Mb estimated genome size.

See here to view kmer histogram: slurm-730449.txt

Any help/suggestions highly appreciated.

bishopia avatar Mar 12 '21 17:03 bishopia

Are you using subreads or ccs reads?

Hi all, I think I am having a similar issue. CCS reads recently provided (output from standard PacBio SMRT Link package with default parameters) look like this:

provided CCS stats: bases 17904868613 reads 1328116 mean 13481.404 median 13395.000 stddev 1544.9318 mode 12041.000 min 45.000 max 46208.000 p90 15582.000 p95 15941.000 p99 16463.000 N50 13722

Yet I'm getting a kmer histogram that looks like both issue #62 and issue #39. I would surprised if my coverage low, given a ~100Mb estimated genome size.

See here to view kmer histogram: slurm-730449.txt

Any help/suggestions highly appreciated.

chhylp123 avatar Mar 12 '21 17:03 chhylp123

the core facility gave me both subreads and ccs files (bam and fastq for each). The kmer distribution used ccs.fastq

bishopia avatar Mar 12 '21 18:03 bishopia

It seems you are using 180x data, which shouldn't have histogram issue. Could you please downsample the data to 40x and have a try? Just want to make sure if it is the data problem or histogram issue.

chhylp123 avatar Mar 12 '21 18:03 chhylp123

looks similar to me, see here: slurm-742026.txt

bishopia avatar Mar 12 '21 18:03 bishopia

Could you please have a try with jellyfish? I haven't seen such issue with 40x data. Thanks in advance

chhylp123 avatar Mar 12 '21 20:03 chhylp123

Same thing: aact_ccs_k21_3.jf.histo.txt

Some context: reads come from a single library, single SMRT cell. The library comes from a monoclonal protist culture, which was treated to make axenic, but there is potential for some bacterial presence here in addition to the focal eukaryote. Maybe removing bacterial reads ahead of assembly could help?

Do you have any recommendations for tools I might apply to better assess the initial quality of HiFi reads, either ccs or subreads?

bishopia avatar Mar 12 '21 20:03 bishopia

These don't look like HiFi data, or the coverage is too low to be assembled. K-mer spectrum is the most informative diagnostic tool at early stage. You have to see clear peak(s) for hifiasm to work.

lh3 avatar Mar 13 '21 16:03 lh3

I have a similar issue but I couldn't find any solutions.

wyim-pgl avatar Mar 17 '21 18:03 wyim-pgl

@wyim-pgl @bishopia Is it possible that you can share the data with us? I can have a look at the data to see what happened. But I prefer that there is something wrong in data itself.

chhylp123 avatar Mar 17 '21 19:03 chhylp123

@chhylp123 can you please shoot me an email? [email protected]

wyim-pgl avatar Mar 17 '21 19:03 wyim-pgl

@wyim-pgl My email is [email protected]

chhylp123 avatar Mar 17 '21 19:03 chhylp123

Was this resolved? I am seeing the same problem on a large genome (5 Gbp) but have about 40X reads). Exactly the same strange histogram and hifiasm either times out or runs out of memory.

genomescope histogram: image

Hifiasm histogram: Loading python3/3.9.2 Loading requirement: intel-mkl/2020.3.304 [M::ha_analyze_count] lowest: count[220] = 166912 [M::ha_analyze_count] highest: count[4095] = 895991 [M::ha_hist_line] 2: ****************************************************************************************************> 438692277 [M::ha_hist_line] 3: ****************************************************************************************************> 280097653 [M::ha_hist_line] 4: ****************************************************************************************************> 223910379 [M::ha_hist_line] 5: ****************************************************************************************************> 195730012 [M::ha_hist_line] 6: ****************************************************************************************************> 177481861 [M::ha_hist_line] 7: ****************************************************************************************************> 162255411 [M::ha_hist_line] 8: ****************************************************************************************************> 148846139 [M::ha_hist_line] 9: ****************************************************************************************************> 137754120 [M::ha_hist_line] 10: ****************************************************************************************************> 128498598 [M::ha_hist_line] 11: ****************************************************************************************************> 120872996 [M::ha_hist_line] 12: ****************************************************************************************************> 113952370 [M::ha_hist_line] 13: ****************************************************************************************************> 107186515 [M::ha_hist_line] 14: ****************************************************************************************************> 100647668 [M::ha_hist_line] 15: ****************************************************************************************************> 94440053 [M::ha_hist_line] 16: ****************************************************************************************************> 87992009 [M::ha_hist_line] 17: ****************************************************************************************************> 82246721 [M::ha_hist_line] 18: ****************************************************************************************************> 77358631 [M::ha_hist_line] 19: ****************************************************************************************************> 72702671 [M::ha_hist_line] 20: ****************************************************************************************************> 68371575 [M::ha_hist_line] 21: ****************************************************************************************************> 64433906 [M::ha_hist_line] 22: ****************************************************************************************************> 61175161 [M::ha_hist_line] 23: ****************************************************************************************************> 57948879 [M::ha_hist_line] 24: ****************************************************************************************************> 55089129 [M::ha_hist_line] 25: ****************************************************************************************************> 52627377 [M::ha_hist_line] 26: ****************************************************************************************************> 50349724 [M::ha_hist_line] 27: ****************************************************************************************************> 48297716 [M::ha_hist_line] 28: ****************************************************************************************************> 46285916 [M::ha_hist_line] 29: ****************************************************************************************************> 44506462 [M::ha_hist_line] 30: ****************************************************************************************************> 42859404 [M::ha_hist_line] 31: ****************************************************************************************************> 41518316 [M::ha_hist_line] 32: ****************************************************************************************************> 40221319 [M::ha_hist_line] 33: ****************************************************************************************************> 38966359 [M::ha_hist_line] 34: ****************************************************************************************************> 37722519 [M::ha_hist_line] 35: ****************************************************************************************************> 36445864 [M::ha_hist_line] 36: ****************************************************************************************************> 35285009 [M::ha_hist_line] 37: ****************************************************************************************************> 34157692 [M::ha_hist_line] 38: ****************************************************************************************************> 33030603 [M::ha_hist_line] 39: ****************************************************************************************************> 31923721 [M::ha_hist_line] 40: ****************************************************************************************************> 30705974 [M::ha_hist_line] 41: ****************************************************************************************************> 29505253 [M::ha_hist_line] 42: ****************************************************************************************************> 28186769 [M::ha_hist_line] 43: ****************************************************************************************************> 26856972 [M::ha_hist_line] 44: ****************************************************************************************************> 25548471 [M::ha_hist_line] 45: ****************************************************************************************************> 24212795 [M::ha_hist_line] 46: ****************************************************************************************************> 22721721 [M::ha_hist_line] 47: ****************************************************************************************************> 21212232 [M::ha_hist_line] 48: ****************************************************************************************************> 19766194 [M::ha_hist_line] 49: ****************************************************************************************************> 18250220 [M::ha_hist_line] 50: ****************************************************************************************************> 16793140 [M::ha_hist_line] 51: ****************************************************************************************************> 15389106 [M::ha_hist_line] 52: ****************************************************************************************************> 14050824 [M::ha_hist_line] 53: ****************************************************************************************************> 12767362 [M::ha_hist_line] 54: ****************************************************************************************************> 11553163 [M::ha_hist_line] 55: ****************************************************************************************************> 10446529 [M::ha_hist_line] 56: ****************************************************************************************************> 9420560 [M::ha_hist_line] 57: ****************************************************************************************************> 8518171 [M::ha_hist_line] 58: ****************************************************************************************************> 7696638 [M::ha_hist_line] 59: ****************************************************************************************************> 6917150 [M::ha_hist_line] 60: ****************************************************************************************************> 6242561 [M::ha_hist_line] 61: ****************************************************************************************************> 5679388 [M::ha_hist_line] 62: ****************************************************************************************************> 5139779 [M::ha_hist_line] 63: ****************************************************************************************************> 4704252 [M::ha_hist_line] 64: ****************************************************************************************************> 4326278 [M::ha_hist_line] 65: ****************************************************************************************************> 4003112 [M::ha_hist_line] 66: ****************************************************************************************************> 3737438 [M::ha_hist_line] 67: ****************************************************************************************************> 3496212 [M::ha_hist_line] 68: ****************************************************************************************************> 3298497 [M::ha_hist_line] 69: ****************************************************************************************************> 3126918 [M::ha_hist_line] 70: ****************************************************************************************************> 2990873 [M::ha_hist_line] 71: ****************************************************************************************************> 2849454 [M::ha_hist_line] 72: ****************************************************************************************************> 2737231 [M::ha_hist_line] 73: ****************************************************************************************************> 2626637 [M::ha_hist_line] 74: ****************************************************************************************************> 2535613 [M::ha_hist_line] 75: ****************************************************************************************************> 2441147 [M::ha_hist_line] 76: ****************************************************************************************************> 2348521 [M::ha_hist_line] 77: ****************************************************************************************************> 2279318 [M::ha_hist_line] 78: ****************************************************************************************************> 2210567 [M::ha_hist_line] 79: ****************************************************************************************************> 2151784 [M::ha_hist_line] 80: ****************************************************************************************************> 2082913 [M::ha_hist_line] 81: ****************************************************************************************************> 2032767 [M::ha_hist_line] 82: ****************************************************************************************************> 1987855 [M::ha_hist_line] 83: ****************************************************************************************************> 1928743 [M::ha_hist_line] 84: ****************************************************************************************************> 1885365 [M::ha_hist_line] 85: ****************************************************************************************************> 1836064 [M::ha_hist_line] 86: ****************************************************************************************************> 1792907 [M::ha_hist_line] 87: ****************************************************************************************************> 1744233 [M::ha_hist_line] 88: ****************************************************************************************************> 1699604 [M::ha_hist_line] 89: ****************************************************************************************************> 1660911 [M::ha_hist_line] 90: ****************************************************************************************************> 1633022 [M::ha_hist_line] 91: ****************************************************************************************************> 1572865 [M::ha_hist_line] 92: ****************************************************************************************************> 1539841 [M::ha_hist_line] 93: ****************************************************************************************************> 1507847 [M::ha_hist_line] 94: ****************************************************************************************************> 1465808 [M::ha_hist_line] 95: ****************************************************************************************************> 1429893 [M::ha_hist_line] 96: ****************************************************************************************************> 1389538 [M::ha_hist_line] 97: ****************************************************************************************************> 1354647 [M::ha_hist_line] 98: ****************************************************************************************************> 1317308 [M::ha_hist_line] 99: ****************************************************************************************************> 1291713 [M::ha_hist_line] 100: ****************************************************************************************************> 1248287 [M::ha_hist_line] 101: ****************************************************************************************************> 1214370 [M::ha_hist_line] 102: ****************************************************************************************************> 1189425 [M::ha_hist_line] 103: ****************************************************************************************************> 1154052 [M::ha_hist_line] 104: ****************************************************************************************************> 1131994 [M::ha_hist_line] 105: ****************************************************************************************************> 1103067 [M::ha_hist_line] 106: ****************************************************************************************************> 1074676 [M::ha_hist_line] 107: ****************************************************************************************************> 1047204 [M::ha_hist_line] 108: ****************************************************************************************************> 1023431 [M::ha_hist_line] 109: ****************************************************************************************************> 997629 [M::ha_hist_line] 110: ****************************************************************************************************> 970476 [M::ha_hist_line] 111: ****************************************************************************************************> 942592 [M::ha_hist_line] 112: ****************************************************************************************************> 915990 [M::ha_hist_line] 113: **************************************************************************************************** 897083 [M::ha_hist_line] 114: ************************************************************************************************** 882092 [M::ha_hist_line] 115: ************************************************************************************************ 859166 [M::ha_hist_line] 116: ********************************************************************************************** 839271 [M::ha_hist_line] 117: ******************************************************************************************** 822329 [M::ha_hist_line] 118: ***************************************************************************************** 799427 [M::ha_hist_line] 119: **************************************************************************************** 788709 [M::ha_hist_line] 120: ************************************************************************************** 769102 [M::ha_hist_line] 121: ************************************************************************************ 753295 [M::ha_hist_line] 122: ********************************************************************************** 734519 [M::ha_hist_line] 123: ******************************************************************************** 719822 [M::ha_hist_line] 124: ****************************************************************************** 700076 [M::ha_hist_line] 125: ***************************************************************************** 687499 [M::ha_hist_line] 126: *************************************************************************** 672285 [M::ha_hist_line] 127: ************************************************************************** 659352 [M::ha_hist_line] 128: *********************************************************************** 639368 [M::ha_hist_line] 129: ********************************************************************** 631226 [M::ha_hist_line] 130: ********************************************************************* 620097 etc. ..... [M::ha_hist_line] 1531: * 131 [M::ha_hist_line] 1532: * 159 [M::ha_hist_line] 1533: * 123 [M::ha_hist_line] 1534: * 141 [M::ha_hist_line] rest: **************************************************************************************************> 116071 [M::ha_analyze_count] left: none [M::ha_analyze_count] right: none [M::ha_pt_gen] peak_hom: 146; peak_het: -1 [M::ha_ct_shrink::2782.11910.48] ==> counted 186238394 distinct minimizer k-mers [M::ha_pt_gen::] counting in normal mode [M::yak_count] collected 4558663265 minimizers [M::ha_pt_gen::3496.31112.40] ==> indexed 4469048672 positions, counted 186238394 distinct minimizer k-mers =>> PBS: job killed: walltime 172892 exceeded limit 172800

tallnuttrbgv avatar Apr 15 '24 02:04 tallnuttrbgv

Hello @tallnuttrbgv At the end the issue I was having was because the reads obtained from the fourth SMRTcell were contaminated Basically the company mixed up two different samples (wild vs domesticated plants that are very closely related) and Hifiasm was struggling.

haneenih7 avatar Apr 15 '24 13:04 haneenih7

I have now discovered something similar - one library that has a very large excess of 5S reads. It seems this problem often arises from mixtures or overrepresentation.

tallnuttrbgv avatar Apr 16 '24 00:04 tallnuttrbgv