GetOrganelle
(ERR): "Arabidopsis_simulated.plastome/seed/embplant_pt.index" does not exist or is not a Bowtie 2 index
Hi,
I installed GetOrganelle manually (git clone etc.), but it fails on the test data while building the Bowtie 2 index.
Thanks,
GetOrganelle v1.7.5.3
get_organelle_from_reads.py assembles organelle genomes from genome skimming data. Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.
Python 3.10.0 (default, Nov 16 2021, 09:41:50) [GCC 8.4.1 20200928 (Red Hat 8.4.1-1)]
PLATFORM: Linux gadi-login-03.gadi.nci.org.au 4.18.0-348.2.1.el8.nci.x86_64 #1 SMP Fri Nov 26 03:20:41 UTC 2021 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.5.3; numpy 1.21.4; sympy 1.9; scipy 1.7.2; psutil 5.9.0
DEPENDENCIES: Bowtie2 2.3.5.1; /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py:13: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils import dir_util
SPAdes 3.13.0; Blast 2.9.0
GETORG_PATH=/home/554/ta0341/.GetOrganelle
SEED DB: embplant_pt 0.0.1; embplant_mt 0.0.1
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /g/data/nm31/d/r3.22_paftol_validation/getorgtest
/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite

2022-02-22 15:37:02,907 - INFO: Pre-reading fastq ...
2022-02-22 15:37:02,907 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2022-02-22 15:37:03,028 - INFO: Estimating reads to use finished.
2022-02-22 15:37:03,029 - INFO: Unzipping reads file: Arabidopsis_simulated.1.fq.gz (8796915 bytes)
2022-02-22 15:37:07,535 - INFO: Unzipping reads file: Arabidopsis_simulated.2.fq.gz (9067061 bytes)
2022-02-22 15:37:12,807 - INFO: Counting read qualities ...
2022-02-22 15:37:12,959 - INFO: Identified quality encoding format = Illumina 1.8+
2022-02-22 15:37:12,959 - INFO: Phred offset = 33
2022-02-22 15:37:12,960 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2022-02-22 15:37:13,012 - INFO: Mean error rate = 0.0019
2022-02-22 15:37:13,013 - INFO: Counting read lengths ...
2022-02-22 15:37:13,181 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2022-02-22 15:37:13,182 - INFO: Reads used = 91563+91563
2022-02-22 15:37:13,182 - INFO: Pre-reading fastq finished.

2022-02-22 15:37:13,182 - INFO: Making seed reads ...
2022-02-22 15:37:18,147 - INFO: Making seed - bowtie2 index ...
2022-02-22 15:37:18,212 - INFO: Making seed - bowtie2 index finished.
2022-02-22 15:37:18,213 - INFO: Mapping reads to seed bowtie2 index ...
2022-02-22 15:37:18,316 - ERROR: (ERR): "Arabidopsis_simulated.plastome/seed/embplant_pt.index" does not exist or is not a Bowtie 2 index
Exiting now ...

2022-02-22 15:37:18,316 - ERROR: Traceback (most recent call last):
  File "/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py", line 3941, in main
    seed_fq, seed_sam, new_seed_f = making_seed_reads_using_mapping(
  File "/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py", line 3040, in making_seed_reads_using_mapping
    map_with_bowtie2(seed_file=seed_file, original_fq_files=original_fq_files,
  File "/g/data/nm31/bin/GetOrganelle/GetOrganelleLib/pipe_control_func.py", line 399, in map_with_bowtie2
    raise Exception("")
Exception

Total cost 22.27 s
For trouble-shooting, please
Firstly, check https://github.com/Kinggerm/GetOrganelle/wiki/FAQ
Secondly, check if there are open/closed issues related at https://github.com/Kinggerm/GetOrganelle/issues
If your problem was still not solved, please open an issue at https://github.com/Kinggerm/GetOrganelle/issues
please provide the get_org.log.txt and the assembly graph (can be *.png to protect your data privacy) if possible!
Hi,
Sorry to jump in and add another problem to this thread. You might want to download the database first and put it in your main directory; see https://github.com/Kinggerm/GetOrganelle/wiki/Initialization
I had the same trouble but succeeded after downloading it.
However, I notice in your log the same message I got: in the dependencies section, a package is reported as deprecated. I hope the authors can help us fix this problem.
See my log file. Something is wrong with SPAdes.
Thank you.
jaktykusuma
Your error is different from the one in this thread. The deprecated-dependency message is currently a harmless warning, not an error.
The SPAdes failure in your case was caused by the space in your working directory, specifically "IRD Works". Besides, please upgrade to 1.7.5+, which not only gives better immediate feedback in the space-in-working-directory case but also fixes essential bugs.
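A space in the working directory can be detected before starting a run. The following is a minimal sketch for a POSIX shell; `path_has_space` is a hypothetical helper name used for illustration and is not part of GetOrganelle or SPAdes.

```shell
#!/bin/sh
# Hypothetical helper (not part of GetOrganelle): exits with status 0 when
# the given path contains a space character, and status 1 otherwise.
path_has_space() {
    case "$1" in
        *" "*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example: warn when the current working directory contains a space.
if path_has_space "$PWD"; then
    echo "Working directory contains a space: $PWD" >&2
fi
```

If the check fires, moving or renaming the directory so that no component contains a space avoids the situation described above.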
Could you please
- run
  ls -lah Arabidopsis_simulated.plastome/seed/embplant_pt.index
  to list the files
- run
  bowtie2-build -h
  to see the reaction
- rerun the command with "--verbose" added and attach the new log file here
Thanks!
ls -lah Arabidopsis_simulated.plastome/seed/embplant_pt.index
ls: cannot access 'Arabidopsis_simulated.plastome/seed/embplant_pt.index': No such file or directory

ls -lah Arabidopsis_simulated.plastome/seed/
total 15M
drwxr-sr-x 2 ta0341 nm31 33K Feb 22 15:44 .
drwxr-sr-x 3 ta0341 nm31 33K Feb 22 15:44 ..
-rw-r--r-- 1 ta0341 nm31 15M Feb 22 15:44 embplant_pt.fasta
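The listing shows only the FASTA file, so no index was written even though the log reported the indexing step as finished. Because a Bowtie 2 index is a file prefix rather than a single file, a check on the individual index files is more telling than `ls` on the prefix. A minimal sketch follows; `has_bt2_index` is a hypothetical helper, and it assumes a small index (six `.bt2` files), not a large one with the `.bt2l` suffix.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if all six files of a small Bowtie 2
# index exist for the given prefix. Large indexes use the .bt2l suffix
# instead; that case is not handled here.
has_bt2_index() {
    prefix=$1
    for part in 1 2 3 4 rev.1 rev.2; do
        [ -f "$prefix.$part.bt2" ] || return 1
    done
    return 0
}

# Example with the index prefix named in the error message.
if has_bt2_index "Arabidopsis_simulated.plastome/seed/embplant_pt.index"; then
    echo "index present"
else
    echo "index missing"
fi
```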
bowtie2-build -h
=== ERROR ===
The use of the #!/usr/bin/env python interpreter line in python scripts has been deprecated.
Please modify this script: /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/bowtie2/bowtie2-build
To use either #!/usr/bin/env python3 or #!/usr/bin/env python2 depending on which version of python you require
Alternatively, if you are unable to modify this script
You can load the python2-as-python or python3-as-python modules depending on which version of python you require
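The message asks for a versioned interpreter line. One way to make that change without opening an editor is sketched below. `pin_python3_shebang` is a hypothetical helper name, and choosing `python3` over `python2` is an assumption that has to match what the system requires.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an unversioned "#!/usr/bin/env python" first
# line to "#!/usr/bin/env python3". Any other first line is left untouched.
# A backup suffix is given to sed -i so the call works with GNU and BSD sed;
# the backup is removed once sed succeeds.
pin_python3_shebang() {
    sed -i.bak '1s|^#!/usr/bin/env python$|#!/usr/bin/env python3|' "$1" &&
        rm -f "$1.bak"
}

# Usage (path taken from the error message):
# pin_python3_shebang /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/bowtie2/bowtie2-build
```

Only line 1 is matched, so the rest of the script is unchanged, and running the helper twice has no further effect.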
I fixed the interpreter line in bowtie2-build, then got the error below. Verbose log also attached.
get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite
GetOrganelle v1.7.5.3
get_organelle_from_reads.py assembles organelle genomes from genome skimming data. Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.
Python 3.10.0 (default, Nov 16 2021, 09:41:50) [GCC 8.4.1 20200928 (Red Hat 8.4.1-1)]
PLATFORM: Linux gadi-login-06.gadi.nci.org.au 4.18.0-348.2.1.el8.nci.x86_64 #1 SMP Fri Nov 26 03:20:41 UTC 2021 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.5.3; numpy 1.21.4; sympy 1.9; scipy 1.7.2; psutil 5.9.0
DEPENDENCIES: Bowtie2 2.3.5.1; /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py:13: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils import dir_util
SPAdes 3.13.0; Blast 2.9.0
GETORG_PATH=/home/554/ta0341/.GetOrganelle
SEED DB: embplant_pt 0.0.1; embplant_mt 0.0.1
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /g/data/nm31/d/r3.22_paftol_validation/getorgtest
/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite

2022-02-23 10:50:16,977 - INFO: Pre-reading fastq ...
2022-02-23 10:50:16,977 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2022-02-23 10:50:17,177 - INFO: Estimating reads to use finished.
2022-02-23 10:50:17,177 - INFO: Unzipping reads file: Arabidopsis_simulated.1.fq.gz (8796915 bytes)
2022-02-23 10:50:17,513 - INFO: Unzipping reads file: Arabidopsis_simulated.2.fq.gz (9067061 bytes)
2022-02-23 10:50:20,697 - INFO: Counting read qualities ...
2022-02-23 10:50:20,851 - INFO: Identified quality encoding format = Illumina 1.8+
2022-02-23 10:50:20,851 - INFO: Phred offset = 33
2022-02-23 10:50:20,852 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2022-02-23 10:50:20,901 - INFO: Mean error rate = 0.0019
2022-02-23 10:50:20,902 - INFO: Counting read lengths ...
2022-02-23 10:50:21,068 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2022-02-23 10:50:21,068 - INFO: Reads used = 91563+91563
2022-02-23 10:50:21,068 - INFO: Pre-reading fastq finished.

2022-02-23 10:50:21,068 - INFO: Making seed reads ...
2022-02-23 10:50:24,278 - INFO: Making seed - bowtie2 index ...
2022-02-23 10:50:33,840 - INFO: Making seed - bowtie2 index finished.
2022-02-23 10:50:33,840 - INFO: Mapping reads to seed bowtie2 index ...
2022-02-23 10:50:42,532 - INFO: Mapping finished.
2022-02-23 10:50:42,534 - INFO: Seed reads made: Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq (14144302 bytes)
2022-02-23 10:50:42,535 - INFO: Making seed reads finished.

2022-02-23 10:50:42,535 - INFO: Checking seed reads and parameters ...
2022-02-23 10:50:42,535 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2022-02-23 10:50:42,535 - INFO: If the result graph is not a circular organelle genome,
2022-02-23 10:50:42,535 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2022-02-23 10:50:45,524 - INFO: Pre-assembling mapped reads ...
2022-02-23 10:50:47,545 - INFO: Retrying with more reads ..
2022-02-23 10:51:06,399 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.
2022-02-23 10:51:07,664 - INFO: Estimated embplant_pt-hitting base-coverage = 52.85
2022-02-23 10:51:07,876 - INFO: Estimated word size(s): 98
2022-02-23 10:51:07,877 - INFO: Setting '-w 98'
2022-02-23 10:51:07,877 - INFO: Setting '--max-extending-len inf'
2022-02-23 10:51:07,958 - INFO: Checking seed reads and parameters finished.

2022-02-23 10:51:07,958 - INFO: Making read index ...
2022-02-23 10:51:09,003 - INFO: Mem 0.324 G, 178623 candidates in all 183126 reads
2022-02-23 10:51:09,003 - INFO: Pre-grouping reads ...
2022-02-23 10:51:09,004 - INFO: Setting '--pre-w 98'
2022-02-23 10:51:09,030 - INFO: Mem 0.324 G, 4074/4074 used/duplicated
2022-02-23 10:51:09,287 - INFO: Mem 0.324 G, 517 groups made.
2022-02-23 10:51:09,298 - INFO: Making read index finished.

2022-02-23 10:51:09,298 - INFO: Extending ...
2022-02-23 10:51:09,298 - INFO: Adding initial words ...
2022-02-23 10:51:10,821 - INFO: AW 1113742
2022-02-23 10:51:12,411 - INFO: Round 1: 178623/178623 AI 40378 AW 1126044 Mem 0.437
2022-02-23 10:51:13,216 - INFO: Round 2: 178623/178623 AI 40411 AW 1126346 Mem 0.437
2022-02-23 10:51:14,071 - INFO: Round 3: 178623/178623 AI 40411 AW 1126346 Mem 0.437
2022-02-23 10:51:14,072 - INFO: No more reads found and terminated ...
2022-02-23 10:51:14,782 - INFO: Extending finished.

2022-02-23 10:51:14,795 - INFO: Separating extended fastq file ...
2022-02-23 10:51:15,137 - INFO: Setting '-k 21,55,85,115'
2022-02-23 10:51:15,137 - INFO: Assembling using SPAdes ...
2022-02-23 10:51:15,152 - INFO: /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py -t 1 --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades
2022-02-23 10:51:15,714 - WARNING: Assembling exited halfway.

2022-02-23 10:51:17,441 - ERROR: No valid assembly graph found!
I also checked that all the other python scripts were #!/usr/bin/env python3, as is required for my system.
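A check like this one can be automated for a whole directory of scripts. The sketch below lists files whose first line is still the unversioned `#!/usr/bin/env python`; `list_unversioned_python` is a hypothetical helper, and it only looks at regular files directly inside the given directory, not in subdirectories.

```shell
#!/bin/sh
# Hypothetical helper: print every regular file directly inside the given
# directory whose first line is exactly "#!/usr/bin/env python".
list_unversioned_python() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # -F: fixed string, -x: whole-line match, -q: status only (no output)
        if head -n 1 "$f" | grep -qxF '#!/usr/bin/env python'; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (directory taken from the paths in this thread):
# list_unversioned_python /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/bowtie2
```

An empty result means no file in that directory still uses the unversioned interpreter line.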
I would try removing SPAdes under GetOrganelleDep
rm -r /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/
Then install the latest SPAdes using apt install, or conda, or from the source.
Let me know your updates.
BTW, whether or not the latest SPAdes fixes your issue in the Gadi environment, GetOrganelleDep needs an update. I will leave this issue open until then.
I deleted the bundled version of SPAdes and used my (working) system version. I get the same error; see the attached log (get_org.log.txt). Thanks.
What is the result of spades.py --test?
Ah yes, a SPAdes problem:
spades.py --test
== Warning == No assembly mode was specified! If you intend to assemble high-coverage multi-cell/isolate data, use '--isolate' option.
Command line: /g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py --test
System information:
  SPAdes version: 3.15.2
  Python version: 3.10.0
  OS: Linux-4.18.0-348.2.1.el8.nci.x86_64-x86_64-with-glibc2.28

Output dir: /g/data/nm31/d/r3.21_aatol_extra_samples_2022/spades_test
Mode: read error correction and assembling
Debug mode is turned OFF
Dataset parameters:
Standard mode
For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
Reads:
Traceback (most recent call last):
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 651, in
Updated to SPAdes 3.15.4, which works with Python 3.10, and the issue is now solved. Thanks.
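For anyone scripting a check against the version reported to work here, version strings need a version-aware comparison, since a plain string comparison would rank 3.9 above 3.15. A minimal sketch follows; `version_ge` is a hypothetical helper, and it assumes `sort -V` (version sort, as provided by GNU coreutils) is available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to
# version $2. Relies on "sort -V" (version sort) to order the two strings;
# the smaller one comes first, so $1 >= $2 exactly when that one equals $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: 3.15.4 is the release reported to work with Python 3.10 in this
# thread; 3.15.2 is the one that failed.
if ! version_ge "3.15.2" "3.15.4"; then
    echo "SPAdes 3.15.2 is older than 3.15.4"
fi
```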