Stuck at subclustering hamming graph

Open amanpruthi15 opened this issue 3 years ago • 4 comments

I ran spades.py on my reads to produce a plant genome assembly with an estimated size of 600M. I am providing 998G of memory, but the run has been stuck at "Subclustering Hamming graph" for 2 days. I have Nanopore and NovaSeq Illumina reads. Is this normal, and if so, do you have any idea how long the assembly will take to finish?
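For a sense of scale, the short-read depth can be bounded from numbers reported in spades.log below. This is only a rough sketch: it assumes every read has the maximum reported length of 251 bp, which trimmed reads will not, so the result is an upper bound. The function name `estimated_coverage` is made up for illustration.

```python
def estimated_coverage(num_reads, read_length_bp, genome_size_bp):
    """Upper-bound sequencing depth: total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp

reads = 329_242_464         # "Total 329242464 reads processed" in spades.log
max_read_len = 251          # "Reads length: 251"; trimmed reads may be shorter
genome_size = 600_000_000   # estimated genome size from the post (600M)

depth = estimated_coverage(reads, max_read_len, genome_size)
print(f"Illumina depth is at most about {depth:.0f}x")
```

The Nanopore reads are not included in this figure.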

spades.log

Command line: /home/apruthi/SPAdes-3.15.2-Linux/bin/spades.py --pe1-1 /lustre/scratch/apruthi/fastp_DNA/Female_R1.fq --pe1-2 /lustre/scratch/apruthi/fastp_DNA/Female_R2.fq --nanopore /lustre/scratch/joh97948/bryum_filtered.fq -t 256 --memory 998 --careful -o /lustre/scratch/apruthi/spades-again/spades_filtered

System information:
  SPAdes version: 3.15.2
  Python version: 3.7.3
  OS: Linux-4.18.0-147.8.1.el8_1.x86_64-x86_64-with-centos-8.1.1911-Core

Output dir: /lustre/scratch/apruthi/spades-again/spades_filtered
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Standard mode
  For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/lustre/scratch/apruthi/fastp_DNA/Female_R1.fq']
      right reads: ['/lustre/scratch/apruthi/fastp_DNA/Female_R2.fq']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
    Library number: 2, library type: nanopore
      left reads: not specified
      right reads: not specified
      interlaced reads: not specified
      single reads: ['/lustre/scratch/joh97948/bryum_filtered.fq']
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed
Assembly parameters:
  k: automatic selection based on read length
  Repeat resolution is enabled
  Mismatch careful mode is turned ON
  MismatchCorrector will be used
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: /lustre/scratch/apruthi/spades-again/spades_filtered/tmp
  Threads: 256
  Memory limit (in Gb): 998

======= SPAdes pipeline started. Log can be found here: /lustre/scratch/apruthi/spades-again/spades_filtered/spades.log

/lustre/scratch/apruthi/fastp_DNA/Female_R1.fq: max reads length: 251
/lustre/scratch/apruthi/fastp_DNA/Female_R2.fq: max reads length: 251

Reads length: 251

Default k-mer sizes were set to [21, 33, 55, 77, 99, 127] because estimated read length (251) is equal to or greater than 250

===== Before start started.

===== Read error correction started.

===== Read error correction started.

== Running: /home/apruthi/SPAdes-3.15.2-Linux/bin/spades-hammer /lustre/scratch/apruthi/spades-again/spades_filtered/corrected/configs/config.info

0:00:00.002 1M / 16M INFO General (main.cpp : 75) Starting BayesHammer, built from refs/heads/spades_3.15.2, git revision aab988a9b4986906b38396da7233bb1ee02982f2
0:00:00.069 1M / 16M INFO General (main.cpp : 76) Loading config from /lustre/scratch/apruthi/spades-again/spades_filtered/corrected/configs/config.info
0:00:00.122 1M / 16M INFO General (main.cpp : 78) Maximum # of threads to use (adjusted due to OMP capabilities): 128
0:00:00.130 1M / 16M INFO General (memory_limit.cpp : 48) Memory limit set to 998 Gb
0:00:00.141 1M / 16M INFO General (main.cpp : 86) Trying to determine PHRED offset
0:00:00.159 1M / 16M INFO General (main.cpp : 92) Determined value is 33
0:00:00.167 1M / 16M INFO General (hammer_tools.cpp : 38) Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
0:00:00.178 1M / 16M INFO General (main.cpp : 113) Size of aux. kmer data 24 bytes
=== ITERATION 0 begins ===
0:00:00.202 1M / 16M INFO General (kmer_index_builder.hpp : 243) Splitting kmer instances into 16 files using 128 threads. This might take a while.
0:00:00.224 1M / 16M INFO General (file_limit.hpp : 32) Open file limit set to 1024
0:00:00.242 1M / 16M INFO General (kmer_splitter.hpp : 93) Memory available for splitting buffers: 2.59896 Gb
0:00:00.255 1M / 16M INFO General (kmer_splitter.hpp : 101) Using cell size of 4194304
0:00:00.614 73G / 73G INFO K-mer Splitting (kmer_data.cpp : 97) Processing /lustre/scratch/apruthi/fastp_DNA/Female_R1.fq
0:02:33.451 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 17031589 reads
0:04:49.032 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 32173249 reads
0:06:52.813 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 45967624 reads
0:09:01.673 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 60495455 reads
0:11:09.878 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 74815289 reads
0:13:43.696 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 89483665 reads
0:15:31.947 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 100228570 reads
0:17:38.474 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 113872050 reads
0:19:36.018 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 126687412 reads
0:21:42.958 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 140662552 reads
0:24:00.550 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 156044148 reads
0:25:27.623 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 164621232 reads
0:25:27.632 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 97) Processing /lustre/scratch/apruthi/fastp_DNA/Female_R2.fq
0:28:03.042 73G / 112G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 180927788 reads
0:44:18.717 73G / 114G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 279385655 reads
0:52:10.846 73G / 114G INFO K-mer Splitting (kmer_data.cpp : 112) Total 329242464 reads processed
0:52:10.856 1M / 114G INFO General (kmer_index_builder.hpp : 249) Starting k-mer counting.
1:24:22.755 1M / 218G INFO General (kmer_index_builder.hpp : 260) K-mer counting done. There are 12963932488 kmers in total.
1:24:22.782 1M / 218G INFO K-mer Index Building (kmer_index_builder.hpp : 395) Building perfect hash indices
1:32:25.435 9078M / 218G INFO K-mer Index Building (kmer_index_builder.hpp : 431) Index built. Total 12963932488 kmers, 9363397128 bytes occupied (5.77812 bits per kmer).
1:32:25.471 9078M / 218G INFO K-mer Counting (kmer_data.cpp : 354) Arranging kmers in hash map order
1:45:05.030 203G / 218G INFO General (main.cpp : 148) Clustering Hamming graph.
4:17:00.202 203G / 218G INFO General (main.cpp : 155) Extracting clusters:
4:17:00.224 203G / 218G INFO General (concurrent_dsu.cpp : 18) Connecting to root
4:17:35.928 203G / 218G INFO General (concurrent_dsu.cpp : 34) Calculating counts
4:55:32.819 339G / 339G INFO General (concurrent_dsu.cpp : 63) Writing down entries
6:02:26.301 203G / 433G INFO General (main.cpp : 167) Clustering done. Total clusters: 4422489011
6:02:32.068 106G / 433G INFO K-mer Counting (kmer_data.cpp : 371) Collecting K-mer information, this takes a while.
6:03:59.454 396G / 433G INFO K-mer Counting (kmer_data.cpp : 377) Processing /lustre/scratch/apruthi/fastp_DNA/Female_R1.fq
6:32:01.220 396G / 433G INFO K-mer Counting (kmer_data.cpp : 377) Processing /lustre/scratch/apruthi/fastp_DNA/Female_R2.fq
6:52:40.625 396G / 433G INFO K-mer Counting (kmer_data.cpp : 384) Collection done, postprocessing.
6:53:17.040 396G / 433G INFO K-mer Counting (kmer_data.cpp : 398) There are 12963932488 kmers in total. Among them 9826103842 (75.7957%) are singletons.
6:53:17.054 396G / 433G INFO General (main.cpp : 173) Subclustering Hamming graph
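To see where the first hours went, the timestamps in the log above can be turned into per-stage durations. The sketch below is illustrative only: the stage labels and the choice of marker lines are my own, and each duration is measured as the time until the next marker, not something SPAdes reports itself.

```python
import re
from datetime import timedelta

# spades-hammer entries start with an elapsed-time stamp of the form H:MM:SS.mmm
STAMP = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3})")

def parse_stamp(text):
    """Return the first H:MM:SS.mmm stamp found in text as a timedelta."""
    match = STAMP.search(text)
    if match is None:
        raise ValueError(f"no timestamp in {text!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)

def stage_durations(entries):
    """Given (label, stamp) pairs in log order, return (label, time until next marker)."""
    stamps = [(label, parse_stamp(stamp)) for label, stamp in entries]
    return [(label, nxt - cur)
            for (label, cur), (_, nxt) in zip(stamps, stamps[1:])]

# Marker stamps copied from the log above; labels are hypothetical shorthand.
entries = [
    ("K-mer splitting", "0:00:00.202"),
    ("K-mer counting", "0:52:10.856"),
    ("Index building and arranging", "1:24:22.782"),
    ("Clustering Hamming graph", "1:45:05.030"),
    ("Collecting K-mer information", "6:02:32.068"),
    ("Subclustering Hamming graph", "6:53:17.054"),
]

for label, duration in stage_durations(entries):
    print(f"{label}: {duration}")
```

The last marker has no successor in the log, so the subclustering stage gets no duration here; the log simply ends at that entry.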

amanpruthi15 • Apr 01 '21 19:04

Hello

This is quite a large dataset... :) I would not recommend using BayesHammer on a dataset of this size.

Also, have you performed any quality trimming of your data?

asl • Apr 04 '21 07:04
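One way to act on the advice to avoid BayesHammer is to rerun with SPAdes's `--only-assembler` option, which runs only the assembly module and skips read error correction. The reply above does not name a flag, so treat this as an assumption about how to follow it. The helper name `skip_read_correction` is hypothetical, and the sketch only builds and prints the command; it does not launch SPAdes.

```python
import shlex

def skip_read_correction(command):
    """Return the argv of a spades.py command with --only-assembler added."""
    argv = shlex.split(command)
    if "--only-assembler" not in argv:
        argv.append("--only-assembler")
    return argv

# Command line copied from spades.log in the original post.
original = (
    "/home/apruthi/SPAdes-3.15.2-Linux/bin/spades.py "
    "--pe1-1 /lustre/scratch/apruthi/fastp_DNA/Female_R1.fq "
    "--pe1-2 /lustre/scratch/apruthi/fastp_DNA/Female_R2.fq "
    "--nanopore /lustre/scratch/joh97948/bryum_filtered.fq "
    "-t 256 --memory 998 --careful "
    "-o /lustre/scratch/apruthi/spades-again/spades_filtered"
)

argv = skip_read_correction(original)
print(" ".join(shlex.quote(arg) for arg in argv))
```

The output directory is reused from the original command here; pointing `-o` at a fresh directory may be preferable when rerunning.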

Yes, I have trimmed the short reads using fastp.

amanpruthi15 • Apr 04 '21 16:04

I experienced the same issue with cds-subgraphs. Up to 758 GB of memory was not sufficient for extracting subgraphs with 75 bp Illumina reads.

marcomeola • Apr 16 '21 18:04

@marcomeola Please report this to the STRONG authors, as this is not part of the SPAdes release and is not supported by us.

asl • Apr 16 '21 18:04