
spades correction results in ill-formatted reads

Open rec3141 opened this issue 4 years ago • 5 comments

Hello, I'm trying to assemble some metagenomes downloaded from EBI and am running into an issue: SPAdes read correction outputs FASTQ reads in which the quality line is not the same length as the sequence line. This leads to SPAdes failing with the following error:

  0:20:05.202   792M / 792M  ERROR   General                 (paired_readers.hpp        :  56)   The number of right read-pairs is larger than the number of left read-pairs
  0:20:05.202   792M / 792M  ERROR   General                 (paired_readers.hpp        :  60)   Unequal number of read-pairs detected in the following files: /import/c1/NANOBASE/recollins/metta/assembly/spades-scratch/spades_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_2020-03-04-20-07-40/corrected/trimmed_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_S000_L001_R1_001.fastq.00.0_0.cor.fastq.gz  /import/c1/NANOBASE/recollins/metta/assembly/spades-scratch/spades_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_2020-03-04-20-07-40/corrected/trimmed_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_S000_L001_R2_001.fastq.00.0_0.cor.fastq.gz


== Error ==  system call for: "['/home/recollins/apps/SPAdes-3.13.0-Linux/bin/spades-core', '/import/c1/NANOBASE/recollins/metta/assembly/spades-scratch/spades_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_2020-03-04-20-07-40/K21/configs/config.info']" finished abnormally, err code: 255

reads: ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3589564/BMI_AADIOSF_3_1_C7C8WACXX.IND34_clean.fastq.gz ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3589564/BMI_AADIOSF_3_2_C7C8WACXX.IND34_clean.fastq.gz

raw FASTQ read:

@H4:C7C8WACXX:3:2207:3174:68099/1
AAAAAAAAATCTAAACGCTAATGCTGAAAAAGNATCACTATTATCTATTATTGGTTTTGTGGTAACAAACGCCGATGACCACAAGATAATAAAAATAAATG
+
@@@DF@FFFHAHHJIHIHAFGIIJICAGCHGG#-7BFGB@GGIIIEIBEEEHHH?;@;@.>A;@CDCD@A?/=9@>B@CCCA1<BCCCDCACC(:<CCDEC

bbduk filtered read:

@H4:C7C8WACXX:3:2207:3174:68099/1
AAAAAAAAATCTAAACGCTAATGCTGAAAAAGNATCACTATTATCTATTATTGGTTTTGTGGTAACAAACGCCGATGACCACAAGATAATAAAAATAAATG
+
@@@DF@FFFHAHHJIHIHAFGIIJICAGCHGG!-7BFGB@GGIIIEIBEEEHHH?;@;@.>A;@CDCD@A?/=9@>B@CCCA1<BCCCDCACC(:<CCDEC

SPAdes-3.13.0-Linux corrected read:

@H4:C7C8WACXX:3:2207:3174:68099/1 BH:changed:3
AAAAAAAAATCTAAACGCTAATGCTGAAAAAGGATCACTATTATCTATTATTGGTTTTGTTGTAACAAAAGCCGATGACCACAAGATAATAAAAATAAATG
+H4:C7C8WACXX:3:2207:3174:68099/1 BH:changed:3
@@@DDDDDDDDDDDDCDD

I should mention this is not an end-of-file issue: wc -l reports the same total line count for both files.
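Records like the corrected read above can be located with a short script. The following is only a sketch: it assumes strict four-line FASTQ records (no wrapped sequence lines), and the helper names `open_fastq` and `find_length_mismatches` are made up for illustration and are not part of SPAdes.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_length_mismatches(lines):
    """Yield (record_number, header, seq_len, qual_len) for malformed records.

    Assumes every record spans exactly four lines. A trailing record with
    fewer than four lines is reported with None for both lengths.
    """
    it = iter(lines)
    record_number = 0
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            break
        record_number += 1
        if len(record) < 4:
            yield (record_number, record[0], None, None)
            break
        header, seq, _plus, qual = record
        if len(seq) != len(qual):
            yield (record_number, header, len(seq), len(qual))
```

Usage would look like `with open_fastq(path) as fh: bad = list(find_length_mismatches(fh))`, where `path` points at one of the `*.cor.fastq.gz` files. Once a single record is short, anything that parses the file in fixed four-line steps may lose sync from that point on, so the first reported record is the one worth inspecting.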

rec3141 avatar Mar 10 '20 18:03 rec3141

Hello

Would it be possible to upload your spades.log file?

asl avatar Mar 10 '20 20:03 asl

spades.log

The actual spades.log was overwritten, but this is the stdout log.

rec3141 avatar Mar 10 '20 20:03 rec3141

I'm running it now with 14.0 to see if it changes

rec3141 avatar Mar 10 '20 20:03 rec3141

Looks like one of the files got truncated, probably during the gzip compression – the number of reads written by BayesHammer and the number of reads received by SPAdes differ.
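Whether a `.gz` file was cut short can be tested by decompressing it to the end, which is what `gzip -t` does on the command line. A minimal Python sketch of the same idea follows; the function name `gzip_is_complete` is hypothetical, and `gzip.BadGzipFile` requires Python 3.8 or later.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses to its end.

    A truncated stream makes the gzip module raise EOFError; a corrupted
    one raises gzip.BadGzipFile or zlib.error. A missing file still
    raises FileNotFoundError, which is deliberately not caught here.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks to keep memory use flat.
            while handle.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        return False
    return True
```

A file that passes this check is a complete gzip stream, but the FASTQ text inside it can still be malformed, so this only rules out truncation at the compression layer.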

asl avatar Mar 10 '20 21:03 asl

I ran wc -l on the SPAdes-corrected files and got the same number of lines for both.
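Equal line counts do not show that the two files still pair up record by record. A stricter check is to walk both files in lockstep and compare read names. The sketch below assumes four-line records and names that end in `/1` and `/2`, as in the reads quoted in this thread; `read_names` and `first_pair_mismatch` are illustrative names only.

```python
from itertools import islice, zip_longest


def read_names(lines):
    """Yield the read name of each four-line FASTQ record.

    The leading '@', any comment after whitespace, and a trailing
    /1 or /2 mate suffix are removed.
    """
    for header in islice(lines, 0, None, 4):
        fields = header[1:].split()
        name = fields[0] if fields else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_pair_mismatch(r1_lines, r2_lines):
    """Return (record_number, r1_name, r2_name) at the first disagreement.

    Returns None when both inputs have the same names in the same order.
    A name of None means that input ran out of records first.
    """
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for index, (left, right) in enumerate(pairs, start=1):
        if left != right:
            return index, left, right
    return None
```

Both arguments can be open text-mode file handles, for example from `gzip.open(path, "rt")`. Because the check steps through the input four lines at a time, a record with a missing or extra line would show up as a name mismatch at or shortly after that record.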

rec3141 avatar Mar 10 '20 21:03 rec3141