Error with running SPAdes: you should at least specify one file with reads

Open mxHuber opened this issue 1 year ago • 2 comments

Hello,

I'm currently having problems running GetOrganelle. It works for some species but not for others. The error I get is: "Error with running SPAdes: == Error == you should specify at least one file with reads!" I've included the full log file at the bottom. The read files I used should be fine, as I have used them for other things before without issues.

The test with the simulated Arabidopsis thaliana worked just fine. It also worked for Fregetta grallaria (white-bellied storm-petrel). It did not work for Corvus cornix (hooded crow). I used the same command for both bird species, with only the read files and output directories changed.

The command I used:

/vol/storage/GetOrganelle-1.7.6.1/get_organelle_from_reads.py -1 SRR1271631_1.fastq.gz -2 SRR1271631_2.fastq.gz -t 1 -o hooded_crow_GetOrganelle -F animal_mt -R 10 --memory-save -w 195 --out-per-round --which-spades /vol/storage/SPAdes-3.15.4-Linux/bin/

These are the read files I used for the two species mentioned above. I used fastq-dump to convert them to FASTQ format and then gzipped them.

Fregetta_grallaria: https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR9946910/SRR9946910

Corvus_cornix: https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR1271631/SRR1271631

get_org.log.txt

GetOrganelle v1.7.6.1

get_organelle_from_reads.py assembles organelle genomes from genome skimming data. Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]
PLATFORM: Linux maxswarm-f40e3 4.15.0-177-generic #186-Ubuntu SMP Thu Apr 14 20:23:07 UTC 2022 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.6.1; numpy 1.21.5; sympy 1.10.1; scipy 1.7.3; psutil 5.8.0
DEPENDENCIES: Bowtie2 2.3.4.1; SPAdes 3.15.4; Blast 2.6.0
GETORG_PATH=/home/ubuntu/.GetOrganelle
SEED DB: animal_mt 0.0.1
LABEL DB: animal_mt 0.0.1
WORKING DIR: /vol/storage/prmd/secondTry
/vol/storage/GetOrganelle-1.7.6.1/get_organelle_from_reads.py -1 SRR1271631_1.fastq.gz -2 SRR1271631_2.fastq.gz -t 1 -o hooded_crow_GetOrganelle -F animal_mt -R 10 --memory-save -w 195 --out-per-round --which-spades /vol/storage/SPAdes-3.15.4-Linux/bin/

2022-07-17 21:44:27,019 - INFO: Pre-reading fastq ...
2022-07-17 21:44:27,019 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2022-07-17 21:44:27,088 - INFO: Tasting 100000+100000 reads ...
2022-07-17 21:44:40,312 - INFO: Tasting 500000+500000 reads ...
2022-07-17 21:45:12,552 - INFO: Tasting 2500000+2500000 reads ...
2022-07-17 21:47:40,831 - INFO: Tasting 12500000+12500000 reads ...
2022-07-17 22:00:06,284 - INFO: Estimating reads to use finished.
2022-07-17 22:00:06,284 - INFO: Unzipping reads file: SRR1271631_1.fastq.gz (11209964888 bytes)
2022-07-17 22:07:31,089 - INFO: Unzipping reads file: SRR1271631_2.fastq.gz (11541606652 bytes)
2022-07-17 22:15:13,670 - INFO: Counting read qualities ...
2022-07-17 22:15:14,031 - INFO: Identified quality encoding format = Illumina 1.8+
2022-07-17 22:15:14,031 - INFO: Phred offset = 33
2022-07-17 22:15:14,032 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2022-07-17 22:15:14,100 - INFO: Mean error rate = 0.0149
2022-07-17 22:15:14,100 - INFO: Counting read lengths ...
2022-07-17 22:58:09,559 - INFO: Mean = 100.0 bp, maximum = 100 bp.
2022-07-17 22:58:09,569 - INFO: Reads used = 117859715+117859715
2022-07-17 22:58:09,569 - INFO: Pre-reading fastq finished.

2022-07-17 22:58:09,569 - INFO: Making seed reads ...
2022-07-17 22:58:09,569 - INFO: Seed bowtie2 index existed!
2022-07-17 22:58:09,569 - INFO: Mapping reads to seed bowtie2 index ...
2022-07-18 00:24:42,661 - INFO: Mapping finished.
2022-07-18 00:24:42,672 - INFO: Seed reads made: hooded_crow_GetOrganelle/seed/animal_mt.initial.fq (1831918 bytes)
2022-07-18 00:24:42,672 - INFO: Making seed reads finished.

2022-07-18 00:24:42,672 - INFO: Checking seed reads and parameters ...
2022-07-18 00:24:46,617 - INFO: Estimated animal_mt-hitting base-coverage = 55.37
2022-07-18 00:24:49,346 - INFO: Setting '--max-extending-len inf'
2022-07-18 00:24:49,424 - INFO: Checking seed reads and parameters finished.

2022-07-18 00:24:49,425 - INFO: Making read index ...
2022-07-18 01:04:23,864 - INFO: Mem 0.36 G, 0 reads
2022-07-18 01:04:23,868 - INFO: Making read index finished.

2022-07-18 01:04:23,868 - INFO: Extending ...
2022-07-18 01:04:23,868 - INFO: Adding initial words ...
2022-07-18 01:04:24,441 - INFO: AW 0
2022-07-18 01:44:57,752 - INFO: Round 1: 1/0 AI 0 AW 0 Mem 0.36
2022-07-18 01:44:57,791 - INFO: No more reads found and terminated ...
2022-07-18 02:27:11,197 - INFO: Extending finished.

2022-07-18 02:27:17,663 - INFO: Separating extended fastq file ...
2022-07-18 02:27:17,663 - WARNING: No paired reads found?!
2022-07-18 02:27:17,663 - INFO: Setting '-k 21,55,85'
2022-07-18 02:27:17,663 - INFO: Assembling using SPAdes ...
2022-07-18 02:27:17,686 - WARNING: Compression after read correction will be skipped for lack of 'pigz'
2022-07-18 02:27:17,686 - INFO: /vol/storage/SPAdes-3.15.4-Linux/bin/spades.py -t 1 --disable-gzip-output --phred-offset 33 -k 21,55,85 -o hooded_crow_GetOrganelle/extended_spades
2022-07-18 02:27:19,636 - ERROR: Error with running SPAdes: == Error == you should specify at least one file with reads!
2022-07-18 02:27:19,661 - ERROR: Assembling failed.

Total cost 16974.29 s
Thank you!

I haven't found information about anything like this on the wiki or in a discussion thread, so I'm unsure what I am doing wrong. It also confuses me that it sometimes works and sometimes doesn't. Is there something I missed?

Kind regards,
Max

mxHuber, Jul 18 '22 20:07

Hi Max,

This happens because you set a fixed word size (195) that is larger than the read length (100). I should have made the program reject this after checking the read lengths. I will leave this issue open until I add that check in the next release.
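Until such a check exists in the tool itself, the same comparison can be made by hand before starting a long run. The following is only a minimal sketch, not GetOrganelle code: the helper names `max_read_length` and `word_size_is_usable` are hypothetical, and it assumes standard four-line FASTQ records in a gzipped file. In the log above, the symptom of the oversized word is visible early ("Mem 0.36 G, 0 reads" and "AW 0"), hours before SPAdes is called with no input.

```python
import gzip
from itertools import islice


def max_read_length(fastq_gz_path, sample_reads=10000):
    """Longest read among the first `sample_reads` records of a gzipped FASTQ.

    Assumes plain four-line FASTQ records (header, sequence, '+', qualities).
    """
    longest = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for i, line in enumerate(islice(handle, sample_reads * 4)):
            if i % 4 == 1:  # second line of each record is the sequence
                longest = max(longest, len(line.strip()))
    return longest


def word_size_is_usable(word_size, read_length):
    """A word longer than the read cannot be cut out of that read."""
    return word_size <= read_length


# Values taken from this issue: -w 195 with 100 bp reads.
print(word_size_is_usable(195, 100))
```

For the crow data, `max_read_length("SRR1271631_1.fastq.gz")` should agree with the "maximum = 100 bp" line in the log, so any `-w` value above 100 would be flagged.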

We always recommend using the default, automatically estimated word size first, and only then adjusting the word size and other parameters (see here) if the first run fails. You just need to remove -w 195 from your command to make all samples run.
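As an illustration of that fix, the sketch below takes the command from the original post and drops the `-w` option together with its value, leaving every other argument unchanged. The helper `drop_option` is hypothetical and only handles options that take exactly one value.

```python
import shlex

# Command exactly as posted in this issue.
original = (
    "/vol/storage/GetOrganelle-1.7.6.1/get_organelle_from_reads.py "
    "-1 SRR1271631_1.fastq.gz -2 SRR1271631_2.fastq.gz -t 1 "
    "-o hooded_crow_GetOrganelle -F animal_mt -R 10 --memory-save -w 195 "
    "--out-per-round --which-spades /vol/storage/SPAdes-3.15.4-Linux/bin/"
)


def drop_option(args, flag):
    """Return args without `flag` and the single value that follows it."""
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this is the value of the dropped flag
            continue
        if arg == flag:
            skip_next = True
            continue
        kept.append(arg)
    return kept


args = drop_option(shlex.split(original), "-w")
corrected = shlex.join(args)  # requires Python 3.8+
print(corrected)
```

The printed command is the one to rerun, so that the word size is estimated automatically.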

JianjunJin, Jul 18 '22 20:07

Ah, that makes a lot of sense. Thank you!

mxHuber, Jul 18 '22 22:07