
Warnings using metaspades

Open · mariadelmarq opened this issue 4 years ago · 5 comments

I'm trying to assemble some publicly available metagenomic data using metaspades. The data is here: https://www.omicsdi.org/dataset/omics_ena_project/PRJNA379494. I'm testing the first set of paired-end reads: SRR5351712_1 and SRR5351712_2.

I was able to assemble them using other assemblers with no issues, but when I run metaspades.py on them (metaspades.py -1 SRR5351712_1.fastq.gz -2 SRR5351712_2.fastq.gz), I get a series of warnings that suggest the paired-end reads are corrupted:

======= SPAdes pipeline finished WITH WARNINGS!

=== Error correction and assembling warnings:
 * 0:00:18.120   247M / 919M  WARN    General                 (pair_info_count.cpp       : 341)   Unable to estimate insert size for paired library #0
 * 0:00:18.121   247M / 919M  WARN    General                 (pair_info_count.cpp       : 347)   None of paired reads aligned properly. Please, check orientation of your read pairs.
 * 0:00:18.122   247M / 919M  WARN    General                 (repeat_resolving.cpp      :  63)   Insert size was not estimated for any of the paired libraries, repeat resolution module will not run.
 * 0:00:27.275   235M / 919M  WARN    General                 (pair_info_count.cpp       : 175)   Single reads are not used in metagenomic mode
=======
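One quick way to test the "corrupted pairs" hypothesis is to confirm that the two files are still in sync, i.e. that record N in SRR5351712_1 and record N in SRR5351712_2 carry the same read ID. The sketch below is a hypothetical helper (the names parse_ids, read_ids and check_pairs are made up for illustration and are not part of SPAdes). It assumes standard 4-line FASTQ records and treats a trailing /1 or /2 on the ID as a mate marker. It only checks ordering and record counts, not read orientation, so a clean result does not rule out the orientation problem that metaSPAdes reports.

```python
import gzip
from itertools import zip_longest


def parse_ids(lines):
    """Yield read IDs from an iterable of FASTQ lines (assumes 4-line records)."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # skip sequence, '+' and quality lines
        parts = line[1:].split()  # drop the leading '@', keep the first token
        rid = parts[0] if parts else ""
        if rid.endswith(("/1", "/2")):
            rid = rid[:-2]  # strip an old-style mate suffix
        yield rid


def read_ids(path):
    """Yield read IDs from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_ids(handle)


def check_pairs(ids1, ids2):
    """Compare two ID streams in lockstep.

    Returns (number of matching pairs, None) if they agree everywhere,
    or (pairs matched so far, (id1, id2)) at the first mismatch;
    a missing ID (one file shorter than the other) shows up as None.
    """
    n = 0
    for id1, id2 in zip_longest(ids1, ids2):
        if id1 != id2:
            return n, (id1, id2)
        n += 1
    return n, None
```

For example, check_pairs(read_ids("SRR5351712_1.fastq.gz"), read_ids("SRR5351712_2.fastq.gz")) returns the number of pairs and None if the files are in sync, or the first mismatching pair of IDs otherwise.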

I then tried regular SPAdes (spades.py -1 SRR5351712_1.fastq.gz -2 SRR5351712_2.fastq.gz) and got a different warning:

=== Error correction and assembling warnings:
 * 0:00:03.999   297M / 505M  WARN    General                 (kmer_coverage_model.cpp   : 218)   Too many erroneous kmers, the estimates might be unreliable
=======
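The k-mer warning is easier to interpret with a rough picture of a k-mer multiplicity histogram: sequencing errors tend to create k-mers that are seen only once or a few times, so a histogram dominated by low-multiplicity k-mers points to noisy reads. The sketch below only illustrates that idea and is not SPAdes' actual coverage model (kmer_coverage_model.cpp). The function names are hypothetical, and for simplicity it counts forward-strand k-mers only, whereas assemblers usually work with canonical (strand-independent) k-mers.

```python
from collections import Counter


def kmer_histogram(seqs, k):
    """Map multiplicity -> number of distinct k-mers seen that many times.

    Simplification: forward-strand k-mers only (no canonical/reverse-complement
    merging), and k-mers containing N are skipped.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return Counter(counts.values())


def singleton_fraction(hist):
    """Fraction of distinct k-mers that occur exactly once (often error-driven)."""
    distinct = sum(hist.values())
    return hist.get(1, 0) / distinct if distinct else 0.0
```

Comparing singleton_fraction across read sets gives a rough feel for how error-prone they are; it is not a substitute for SPAdes' own estimate.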

Are the files corrupted in a way that, for example, MEGAHIT does not detect, or is there a compatibility issue between these files and SPAdes? I've tried both the raw files and files trimmed with Trimmomatic, and I get the same warnings in both cases.

Here are the log and parameter files for the metaSPAdes assembly (params.txt and spades.log, attached); let me know if you'd like me to send the SPAdes ones as well.

Thanks!

mariadelmarq · Aug 17 '20 00:08

Hello

It does not look like a metagenomic dataset: it is very small (both in number of reads and in assembled genome size), yet the average coverage is very large. So I suspect there is something wrong with this dataset.
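The reasoning about coverage is simple arithmetic: average depth is roughly the total number of sequenced bases divided by the genome (or total assembly) size. A minimal sketch, with made-up numbers rather than values from this dataset:

```python
def average_coverage(num_reads, mean_read_length, genome_size):
    """Expected mean depth = total sequenced bases / genome (or assembly) size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


# Hypothetical numbers, not taken from SRR5351712:
# 2,000,000 reads (both mates) of 150 bp over a 3 Mbp assembly.
print(average_coverage(2_000_000, 150, 3_000_000))  # 100.0
```

For paired-end data, num_reads should count the reads in both files. In a diverse community the reads are spread across many genomes, so a small total assembly combined with a very high average depth is unusual for a metagenome.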

asl · Aug 17 '20 07:08

Thanks, @asl! Do you happen to know why metaSPAdes flags a problem with the paired-end reads, whereas SPAdes does not?

Weirdly enough, this dataset is explicitly labeled as metagenomic data (https://www.ebi.ac.uk/ena/browser/view/PRJNA379494), and it forms the basis of a publication in Scientific Reports: https://www.nature.com/articles/s41598-017-06404-8.

mariadelmarq · Aug 18 '20 00:08

Hi @mariadelmarq! Did you manage to solve your issue? I am facing the same situation. Any ideas?

kmkappa · Mar 21 '23 22:03

@kmkappa Please do not hijack unrelated issues; open a new one.

asl · Mar 21 '23 22:03

@asl As you prefer. I have reported the same problem, as it occurred on my machine, in issue #1110.

kmkappa · Mar 21 '23 23:03