Warnings using metaspades
I'm trying to assemble some publicly available metagenomic data using metaspades. The data is here: https://www.omicsdi.org/dataset/omics_ena_project/PRJNA379494. I'm testing the first set of paired-end reads: SRR5351712_1 and SRR5351712_2.
I was able to assemble them using other assemblers with no issues, but when I run metaspades.py on them (metaspades.py -1 SRR5351712_1.fastq.gz -2 SRR5351712_2.fastq.gz), I get a series of warnings that suggest the paired-end reads are corrupted:
======= SPAdes pipeline finished WITH WARNINGS!
=== Error correction and assembling warnings:
* 0:00:18.120 247M / 919M WARN General (pair_info_count.cpp : 341) Unable to estimate insert size for paired library #0
* 0:00:18.121 247M / 919M WARN General (pair_info_count.cpp : 347) None of paired reads aligned properly. Please, check orientation of your read pairs.
* 0:00:18.122 247M / 919M WARN General (repeat_resolving.cpp : 63) Insert size was not estimated for any of the paired libraries, repeat resolution module will not run.
* 0:00:27.275 235M / 919M WARN General (pair_info_count.cpp : 175) Single reads are not used in metagenomic mode
=======
I then tried with regular spades (spades.py -1 SRR5351712_1.fastq.gz -2 SRR5351712_2.fastq.gz), and got a different warning:
=== Error correction and assembling warnings:
* 0:00:03.999 297M / 505M WARN General (kmer_coverage_model.cpp : 218) Too many erroneous kmers, the estimates might be unreliable
=======
Is it that the files are corrupted in a way that megahit, for example, is unable to pick up on, or is there a compatibility issue between these files and spades? I've tried both the raw files and trimmed files (using trimmomatic), same warnings in both cases.
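One way to test the "corrupted pairs" hypothesis independently of any assembler is to confirm that the two FASTQ files contain the same reads in the same order. The following is a minimal sketch, not part of SPAdes; the function names `read_ids` and `check_pairing` are hypothetical, and it assumes standard 4-line FASTQ records whose read name is the first whitespace-delimited token of the header, optionally ending in /1 or /2.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read name from each 4-line FASTQ record (gzipped or plain)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            fields = line[1:].split()  # drop the leading '@' and any comment
            if not fields:
                continue  # tolerate a trailing blank line
            name = fields[0]
            # Assumption: mates may be distinguished by a /1 or /2 suffix
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def check_pairing(r1_path, r2_path):
    """Return (pairs_compared, mismatches); a missing mate counts as a mismatch."""
    pairs = 0
    mismatches = 0
    for id1, id2 in zip_longest(read_ids(r1_path), read_ids(r2_path)):
        pairs += 1
        if id1 != id2:
            mismatches += 1
    return pairs, mismatches
```

Calling `check_pairing("SRR5351712_1.fastq.gz", "SRR5351712_2.fastq.gz")` and getting zero mismatches would suggest the files are at least in sync. It says nothing about read orientation or insert size, which is what the metaspades warnings actually concern.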
Here are the log and param files for the metaspades assembly, let me know if you'd like me to send through the spades ones as well. params.txt spades.log
Thanks!
Hello
It does not look like a metagenomic dataset: it is very small (both in number of reads and in genome size), yet the average coverage is very high. So I would suspect there is something wrong with this dataset.
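The size and coverage observation can be roughly reproduced from the reads themselves. This is a minimal sketch under stated assumptions: `fastq_stats` and `rough_coverage` are hypothetical helper names, the input is standard 4-line FASTQ, and `assumed_genome_size` is a value you must supply yourself (for example, the total length of an assembly), since it is not given in this thread.

```python
import gzip


def fastq_stats(path):
    """Return (number_of_reads, total_bases) for a FASTQ file, gzipped or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                reads += 1
                bases += len(line.strip())
    return reads, bases


def rough_coverage(total_bases, assumed_genome_size):
    """Average depth if all bases were spread evenly over the assumed genome size."""
    return total_bases / assumed_genome_size
```

A small read count combined with a very high `rough_coverage` value would be consistent with the comment above, although for a true metagenome a single average depth is only a crude indicator.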
Thanks, @asl! Do you happen to know why metaspades picks up something weird in terms of the paired-end reads, whereas spades doesn't agree?
Weirdly enough, this dataset certainly claims to be metagenome data (https://www.ebi.ac.uk/ena/browser/view/PRJNA379494) and it forms the basis for a publication in Scientific Reports: https://www.nature.com/articles/s41598-017-06404-8.
Hi @mariadelmarq!
Did you manage to solve your issue?
I am facing the same situation. Any ideas?
@kmkappa Please do not hijack unrelated issues; open a new one.
@asl As you prefer. I have reported the same problem, as it occurred on my machine, under issue #1110.