spades
SPAdes scaffolds assembly missing (skipping)
Hi all,
I'm running a paired-end assembly on a GAGE-B MiSeq dataset (--pe1-1 r1.fastq --pe1-2 r2.fastq).
In the log file I found "Skipping processing of scaffolds (empty file)", and the output is therefore missing scaffolds.fasta, although contigs.fasta is present. Any clue whether this is a memory issue or something else?
I use version 3.11.1 with num_thread set to 50 and the memory limit set to 1024 GB, on a machine with 96 cores and 1.5 TB of RAM.
Any help is appreciated!
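For anyone trying to confirm the same symptom, here is a minimal sketch that checks a SPAdes output directory for the two FASTA files and looks for the skip message quoted above. The function name `diagnose_output` and the directory layout (contigs.fasta, scaffolds.fasta and spades.log directly under the output directory) are assumptions for illustration, not part of SPAdes itself.

```python
from pathlib import Path

# Message quoted from the log in this thread.
SKIP_MESSAGE = "Skipping processing of scaffolds (empty file)"


def diagnose_output(out_dir):
    """Report which assembly files are present and non-empty, and whether
    spades.log contains the scaffold skip message.

    Assumes the files sit directly under out_dir (hypothetical helper)."""
    out = Path(out_dir)
    report = {}
    for name in ("contigs.fasta", "scaffolds.fasta"):
        path = out / name
        # A file counts as present only if it exists and has content.
        report[name] = path.is_file() and path.stat().st_size > 0
    log = out / "spades.log"
    text = log.read_text(errors="replace") if log.is_file() else ""
    report["scaffolds_skipped"] = SKIP_MESSAGE in text
    return report
```

Calling `diagnose_output("assembly")` on an affected run would be expected to show contigs.fasta as present, scaffolds.fasta as absent, and `scaffolds_skipped` as true.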
Hello
Please provide us with your spades.log file.
This smells like an I/O issue. Does the problem reproduce if you simply restart everything from scratch?
Yes, it seems so with the same input and parameters. According to our discussion, could it mean it is almost impossible to produce scaffolds from the assembled contigs? This way, we could simply treat it as a scaffold that contains only a single contig?
According to our discussion, could it mean it is almost impossible to produce scaffolds from the assembled contigs?
Well, the situation with SPAdes is quite the opposite. Contigs are generated from scaffolds, so the files are always there, and contigs should be equal to scaffolds in this case. So what you're seeing is an indication of a real issue.
Would it be possible for you to share your data with us so we can reproduce and fix the issue?
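On a run where both files are produced, the claim that contigs equal scaffolds can be checked with a small sketch like the one below. It compares sequences only and ignores record names, which is an assumption about what "equal" means here; `read_fasta_sequences` and `same_sequences` are hypothetical helpers, not SPAdes tools.

```python
def read_fasta_sequences(path):
    """Return the list of sequences in a FASTA file, ignoring header names."""
    sequences = []
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A header starts a new record.
                current = []
                sequences.append(current)
            elif current is not None:
                current.append(line)
    return ["".join(parts) for parts in sequences]


def same_sequences(contigs_path, scaffolds_path):
    """True if both files hold the same multiset of sequences."""
    contigs = sorted(read_fasta_sequences(contigs_path))
    scaffolds = sorted(read_fasta_sequences(scaffolds_path))
    return contigs == scaffolds
```

If `same_sequences("contigs.fasta", "scaffolds.fasta")` returns true, no scaffolding joins were made and the two files carry the same assembly.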
Glad to share!
params.txt spades.log
Dataset downloaded from SRA: fastq-dump --split-files SRR522246
Reference genome: https://goo.gl/8X3qCj
I use QUAST in GAGE mode as the benchmark tool.
Your second spades.log clearly shows that scaffolds.fasta is there and was processed:
== Processing of scaffolds
== Running contig polishing tool: /home/luke831215/tools/SPAdes-3.11.1-Linux/bin/corrector /home/ph/mnt/biocollab/GAGE-B/b-sides/R_sphaeroides_MiSeq/assembly/mismatch_corrector/scaffolds/configs/corrector.info /home/ph/mnt/biocollab/GAGE-B/b-sides/R_sphaeroides_MiSeq/assembly/misc/assembled_scaffolds.fasta
== Dataset description file was created: /home/ph/mnt/biocollab/GAGE-B/b-sides/R_sphaeroides_MiSeq/assembly/mismatch_corrector/scaffolds/configs/corrector.info
Also, the second dataset is different from the first one. We need the one where the issue can be reproduced. We certainly have the R. sphaeroides dataset from GAGE-B, and everything is OK there.
Sorry, this should be right.
params.txt spades.log
Link for input reads: https://goo.gl/5wevh4
The reads here are a subset (93.4%) of the R. sphaeroides MiSeq dataset. Please note that I don't pass the single reads with --pe1-s here. Adding the single reads does produce scaffolds.fasta for this subset, but not in all other cases.
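To make the two configurations in this report easier to compare, here is a rough sketch that builds the spades.py argument list with and without the unpaired reads. The helper name `build_spades_command` and all file names are hypothetical; the thread and memory values are the ones reported earlier in the thread.

```python
def build_spades_command(r1, r2, out_dir, single=None, careful=False,
                         threads=50, memory_gb=1024):
    """Build a spades.py argument list for one paired-end library.

    Hypothetical helper; file names are supplied by the caller."""
    cmd = ["spades.py", "--pe1-1", r1, "--pe1-2", r2]
    if single:
        # Optional unpaired reads belonging to the same library.
        cmd += ["--pe1-s", single]
    if careful:
        cmd.append("--careful")
    cmd += ["-t", str(threads), "-m", str(memory_gb), "-o", out_dir]
    return cmd
```

The returned list can be passed to `subprocess.run`, so the run without `single` and the run with it differ in exactly one argument pair.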
Sorry, this should be right.
Doesn't seem so. The log has:
== Processing of scaffolds
== Running contig polishing tool: /home/luke831215/tools/SPAdes-3.11.1-Linux/bin/corrector /home/ph/mnt/biocollab/GAGE-B/b-sides/R_sphaeroides_MiSeq/assembly/mismatch_corrector/scaffolds/configs/corrector.info /home/ph/mnt/biocollab/GAGE-B/b-sides/R_sphaeroides_MiSeq/assembly/misc/assembled_scaffolds.fasta
== Dataset description file was created: /home/ph/mnt/biocollab/GAGE-B/b-sides/R_sphaeroides_MiSeq/assembly/mismatch_corrector/scaffolds/configs/corrector.info
This is what I saw. Please check the new log file uploaded.
Hi Luke, did you resolve the problem? I have exactly the same problem, but only if I use the --careful parameter.
Hi guys, I have the same problem as ChQuinteroC. I only use --careful.
I have installed SPAdes, but I am unable to produce scaffolds when running either "spades.py --test" or spades.py -1 sample_1P.fastq.gz -2 sample_2P.fastq.gz -o spades_output. I am attaching params.txt and spades.log. Kindly help with a solution. spades.log params.txt