(ERR): "Arabidopsis_simulated.plastome/seed/embplant_pt.index" does not exist or is not a Bowtie 2 index

Open tallnuttrbgv opened this issue 2 years ago • 11 comments

Hi,

I have installed using a manual method - git clone etc. But it seems to fail on test data building the bowtie index.

Thanks,

GetOrganelle v1.7.5.3

get_organelle_from_reads.py assembles organelle genomes from genome skimming data. Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.10.0 (default, Nov 16 2021, 09:41:50) [GCC 8.4.1 20200928 (Red Hat 8.4.1-1)]
PLATFORM: Linux gadi-login-03.gadi.nci.org.au 4.18.0-348.2.1.el8.nci.x86_64 #1 SMP Fri Nov 26 03:20:41 UTC 2021 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.5.3; numpy 1.21.4; sympy 1.9; scipy 1.7.2; psutil 5.9.0
DEPENDENCIES: Bowtie2 2.3.5.1; /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py:13: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils import dir_util
SPAdes 3.13.0; Blast 2.9.0
GETORG_PATH=/home/554/ta0341/.GetOrganelle
SEED DB: embplant_pt 0.0.1; embplant_mt 0.0.1
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /g/data/nm31/d/r3.22_paftol_validation/getorgtest
/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite

2022-02-22 15:37:02,907 - INFO: Pre-reading fastq ...
2022-02-22 15:37:02,907 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2022-02-22 15:37:03,028 - INFO: Estimating reads to use finished.
2022-02-22 15:37:03,029 - INFO: Unzipping reads file: Arabidopsis_simulated.1.fq.gz (8796915 bytes)
2022-02-22 15:37:07,535 - INFO: Unzipping reads file: Arabidopsis_simulated.2.fq.gz (9067061 bytes)
2022-02-22 15:37:12,807 - INFO: Counting read qualities ...
2022-02-22 15:37:12,959 - INFO: Identified quality encoding format = Illumina 1.8+
2022-02-22 15:37:12,959 - INFO: Phred offset = 33
2022-02-22 15:37:12,960 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2022-02-22 15:37:13,012 - INFO: Mean error rate = 0.0019
2022-02-22 15:37:13,013 - INFO: Counting read lengths ...
2022-02-22 15:37:13,181 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2022-02-22 15:37:13,182 - INFO: Reads used = 91563+91563
2022-02-22 15:37:13,182 - INFO: Pre-reading fastq finished.

2022-02-22 15:37:13,182 - INFO: Making seed reads ...
2022-02-22 15:37:18,147 - INFO: Making seed - bowtie2 index ...
2022-02-22 15:37:18,212 - INFO: Making seed - bowtie2 index finished.
2022-02-22 15:37:18,213 - INFO: Mapping reads to seed bowtie2 index ...
2022-02-22 15:37:18,316 - ERROR: (ERR): "Arabidopsis_simulated.plastome/seed/embplant_pt.index" does not exist or is not a Bowtie 2 index
Exiting now ...

2022-02-22 15:37:18,316 - ERROR: Traceback (most recent call last):
  File "/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py", line 3941, in main
    seed_fq, seed_sam, new_seed_f = making_seed_reads_using_mapping(
  File "/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py", line 3040, in making_seed_reads_using_mapping
    map_with_bowtie2(seed_file=seed_file, original_fq_files=original_fq_files,
  File "/g/data/nm31/bin/GetOrganelle/GetOrganelleLib/pipe_control_func.py", line 399, in map_with_bowtie2
    raise Exception("")
Exception

Total cost 22.27 s
For trouble-shooting, please
Firstly, check https://github.com/Kinggerm/GetOrganelle/wiki/FAQ
Secondly, check if there are open/closed issues related at https://github.com/Kinggerm/GetOrganelle/issues
If your problem was still not solved, please open an issue at https://github.com/Kinggerm/GetOrganelle/issues
please provide the get_org.log.txt and the assembly graph (can be *.png to protect your data privacy) if possible!

tallnuttrbgv, Feb 22 '22 04:02

Hi,

Sorry to interrupt and add a few more problems to this thread. You might want to download the database first and put it in your main directory: https://github.com/Kinggerm/GetOrganelle/wiki/Initialization

I had the same trouble but succeeded after downloading it.

However, I notice that your log shows the same message I got in mine: under the dependencies, something is reported as deprecated. I hope the authors can help us fix this problem.

See my log file. Something wrong with SPAdes.

get_org.log.txt

Thank you.

jaktykusuma, Feb 22 '22 15:02

jaktykusuma

Your error is different from the one in this thread. The deprecated dependency message is currently a harmless warning, not an error.

The SPAdes failure in your case was caused by the space in your working directory, specifically "IRD Works". Besides, please upgrade to 1.7.5+, which not only gives better immediate feedback in the space-in-working-directory case but also fixes essential bugs.
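For readers who want to rule out the same cause before a run, here is a minimal sketch that checks a working directory for whitespace. It is not part of GetOrganelle; `path_has_whitespace` is a hypothetical helper name, and the example paths are made up for illustration.

```python
import os


def path_has_whitespace(path=None):
    """Return True if the absolute form of `path` contains any whitespace.

    With no argument, the current working directory is checked.
    """
    full = os.path.abspath(path or os.getcwd())
    return any(ch.isspace() for ch in full)


# Hypothetical example paths:
#   path_has_whitespace("/data/IRD Works/run1")  -> contains a space
#   path_has_whitespace("/data/IRD_Works/run1")  -> no whitespace
```

Renaming the directory (or working from a path without spaces) is the simplest way to avoid the problem described above.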

Kinggerm, Feb 22 '22 16:02

Could you please

  1. run ls -lah Arabidopsis_simulated.plastome/seed/embplant_pt.index to list the files
  2. run bowtie2-build -h to see the reaction.
  3. rerun the command with "--verbose" added and attach the new log file here
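
The first check in the list above can also be scripted. The sketch below is not part of GetOrganelle, and `missing_index_files` is a hypothetical helper name. It relies on the fact that bowtie2-build writes six files next to the index prefix (`.1.bt2` to `.4.bt2`, plus `.rev.1.bt2` and `.rev.2.bt2`, or the `.bt2l` equivalents for large indexes), so a prefix with none of them beside it was never built.

```python
from pathlib import Path

# Suffixes bowtie2-build writes for a small index.
# Large indexes use the same names with a trailing "l" (".1.bt2l", ...).
SMALL_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_index_files(prefix):
    """Return the small-index files expected for `prefix` that are absent.

    An empty list means a complete small or large index is present.
    """
    small = [Path(prefix + suffix) for suffix in SMALL_SUFFIXES]
    large = [Path(prefix + suffix + "l") for suffix in SMALL_SUFFIXES]
    missing_small = [p for p in small if not p.is_file()]
    missing_large = [p for p in large if not p.is_file()]
    # A complete set of either flavour counts as a usable index.
    if not missing_small or not missing_large:
        return []
    return missing_small


# Usage with the prefix from the error message in this thread:
#   for path in missing_index_files("Arabidopsis_simulated.plastome/seed/embplant_pt.index"):
#       print("missing:", path)
```

If every file is reported missing, the index-building step itself failed, which points at `bowtie2-build` rather than at the mapping step.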

Thanks!

Kinggerm, Feb 22 '22 16:02

ls -lah Arabidopsis_simulated.plastome/seed/embplant_pt.index
ls: cannot access 'Arabidopsis_simulated.plastome/seed/embplant_pt.index': No such file or directory

ls -lah Arabidopsis_simulated.plastome/seed/
total 15M
drwxr-sr-x 2 ta0341 nm31 33K Feb 22 15:44 .
drwxr-sr-x 3 ta0341 nm31 33K Feb 22 15:44 ..
-rw-r--r-- 1 ta0341 nm31 15M Feb 22 15:44 embplant_pt.fasta

bowtie2-build -h
=== ERROR ===
The use of the #!/usr/bin/env python interpreter line in python scripts has been deprecated.

Please modify this script: /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/bowtie2/bowtie2-build

To use either #!/usr/bin/env python3 or #!/usr/bin/env python2 depending on which version of python you require
Alternatively, if you are unable to modify this script
You can load the python2-as-python or python3-as-python modules depending on which version of python you require

I fixed the interpreter line in bowtie2-build and then got the error below. The verbose log is also attached.

get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite
GetOrganelle v1.7.5.3

get_organelle_from_reads.py assembles organelle genomes from genome skimming data. Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.10.0 (default, Nov 16 2021, 09:41:50) [GCC 8.4.1 20200928 (Red Hat 8.4.1-1)]
PLATFORM: Linux gadi-login-06.gadi.nci.org.au 4.18.0-348.2.1.el8.nci.x86_64 #1 SMP Fri Nov 26 03:20:41 UTC 2021 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.5.3; numpy 1.21.4; sympy 1.9; scipy 1.7.2; psutil 5.9.0
DEPENDENCIES: Bowtie2 2.3.5.1; /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py:13: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils import dir_util
SPAdes 3.13.0; Blast 2.9.0
GETORG_PATH=/home/554/ta0341/.GetOrganelle
SEED DB: embplant_pt 0.0.1; embplant_mt 0.0.1
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /g/data/nm31/d/r3.22_paftol_validation/getorgtest
/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite

2022-02-23 10:50:16,977 - INFO: Pre-reading fastq ...
2022-02-23 10:50:16,977 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2022-02-23 10:50:17,177 - INFO: Estimating reads to use finished.
2022-02-23 10:50:17,177 - INFO: Unzipping reads file: Arabidopsis_simulated.1.fq.gz (8796915 bytes)
2022-02-23 10:50:17,513 - INFO: Unzipping reads file: Arabidopsis_simulated.2.fq.gz (9067061 bytes)
2022-02-23 10:50:20,697 - INFO: Counting read qualities ...
2022-02-23 10:50:20,851 - INFO: Identified quality encoding format = Illumina 1.8+
2022-02-23 10:50:20,851 - INFO: Phred offset = 33
2022-02-23 10:50:20,852 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2022-02-23 10:50:20,901 - INFO: Mean error rate = 0.0019
2022-02-23 10:50:20,902 - INFO: Counting read lengths ...
2022-02-23 10:50:21,068 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2022-02-23 10:50:21,068 - INFO: Reads used = 91563+91563
2022-02-23 10:50:21,068 - INFO: Pre-reading fastq finished.

2022-02-23 10:50:21,068 - INFO: Making seed reads ...
2022-02-23 10:50:24,278 - INFO: Making seed - bowtie2 index ...
2022-02-23 10:50:33,840 - INFO: Making seed - bowtie2 index finished.
2022-02-23 10:50:33,840 - INFO: Mapping reads to seed bowtie2 index ...
2022-02-23 10:50:42,532 - INFO: Mapping finished.
2022-02-23 10:50:42,534 - INFO: Seed reads made: Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq (14144302 bytes)
2022-02-23 10:50:42,535 - INFO: Making seed reads finished.

2022-02-23 10:50:42,535 - INFO: Checking seed reads and parameters ...
2022-02-23 10:50:42,535 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2022-02-23 10:50:42,535 - INFO: If the result graph is not a circular organelle genome,
2022-02-23 10:50:42,535 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2022-02-23 10:50:45,524 - INFO: Pre-assembling mapped reads ...
2022-02-23 10:50:47,545 - INFO: Retrying with more reads ..
2022-02-23 10:51:06,399 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.
2022-02-23 10:51:07,664 - INFO: Estimated embplant_pt-hitting base-coverage = 52.85
2022-02-23 10:51:07,876 - INFO: Estimated word size(s): 98
2022-02-23 10:51:07,877 - INFO: Setting '-w 98'
2022-02-23 10:51:07,877 - INFO: Setting '--max-extending-len inf'
2022-02-23 10:51:07,958 - INFO: Checking seed reads and parameters finished.

2022-02-23 10:51:07,958 - INFO: Making read index ...
2022-02-23 10:51:09,003 - INFO: Mem 0.324 G, 178623 candidates in all 183126 reads
2022-02-23 10:51:09,003 - INFO: Pre-grouping reads ...
2022-02-23 10:51:09,004 - INFO: Setting '--pre-w 98'
2022-02-23 10:51:09,030 - INFO: Mem 0.324 G, 4074/4074 used/duplicated
2022-02-23 10:51:09,287 - INFO: Mem 0.324 G, 517 groups made.
2022-02-23 10:51:09,298 - INFO: Making read index finished.

2022-02-23 10:51:09,298 - INFO: Extending ...
2022-02-23 10:51:09,298 - INFO: Adding initial words ...
2022-02-23 10:51:10,821 - INFO: AW 1113742
2022-02-23 10:51:12,411 - INFO: Round 1: 178623/178623 AI 40378 AW 1126044 Mem 0.437
2022-02-23 10:51:13,216 - INFO: Round 2: 178623/178623 AI 40411 AW 1126346 Mem 0.437
2022-02-23 10:51:14,071 - INFO: Round 3: 178623/178623 AI 40411 AW 1126346 Mem 0.437
2022-02-23 10:51:14,072 - INFO: No more reads found and terminated ...
2022-02-23 10:51:14,782 - INFO: Extending finished.

2022-02-23 10:51:14,795 - INFO: Separating extended fastq file ...
2022-02-23 10:51:15,137 - INFO: Setting '-k 21,55,85,115'
2022-02-23 10:51:15,137 - INFO: Assembling using SPAdes ...
2022-02-23 10:51:15,152 - INFO: /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py -t 1 --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades
2022-02-23 10:51:15,714 - WARNING: Assembling exited halfway.

2022-02-23 10:51:17,441 - ERROR: No valid assembly graph found!

get_org.log.txt

tallnuttrbgv, Feb 23 '22 00:02

I also checked that all the other python scripts were #!/usr/bin/env python3, as is required for my system.
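
On systems that reject the bare `python` interpreter line, that check can be automated. This is a rough sketch only: `ambiguous_python_shebangs` is a hypothetical helper, not something shipped with GetOrganelle, and it only flags files whose first line is exactly `#!/usr/bin/env python`.

```python
from pathlib import Path


def ambiguous_python_shebangs(root):
    """List files under `root` whose first line is '#!/usr/bin/env python'.

    Lines that already name python2 or python3 are not reported.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            with open(path, "rb") as handle:
                first_line = handle.readline().strip()
        except OSError:
            # Unreadable files are skipped rather than treated as errors.
            continue
        if first_line == b"#!/usr/bin/env python":
            hits.append(path)
    return hits


# Usage (the directory is the install location quoted in this thread):
#   for script in ambiguous_python_shebangs("/g/data/nm31/bin/GetOrganelle"):
#       print(script)
```

Each reported script would then need its first line changed to `python3` or `python2`, as the error message from `bowtie2-build -h` asks.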

tallnuttrbgv, Feb 23 '22 00:02

I would try removing SPAdes under GetOrganelleDep

rm -r /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/

Then install the latest SPAdes using apt install, or conda, or from the source.

Let me know your updates.

Kinggerm, Feb 23 '22 03:02

BTW, whether or not the latest SPAdes fixes your issue in the Gadi environment, GetOrganelleDep needs an update. I will leave this issue open until that update is out.

Kinggerm, Feb 23 '22 03:02

I deleted the bundled dependency version of SPAdes and used my (working) system version. I get the same error; see the attached log. Thanks.

get_org.log.txt

tallnuttrbgv, Feb 23 '22 03:02

What is the result of spades.py --test?
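
A self-test like this can also be run from Python so that the exit status is captured next to the last lines of output. The sketch below is only an illustration: `command_succeeds` is a hypothetical helper, and the 600-second timeout is an arbitrary assumption.

```python
import subprocess


def command_succeeds(argv, timeout=600):
    """Run `argv` and return (ok, tail), where tail is the last 5 output lines.

    ok is False if the command cannot be started, times out,
    or exits with a non-zero status.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so tracebacks are captured
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    lines = proc.stdout.splitlines()
    return proc.returncode == 0, "\n".join(lines[-5:])


# Usage with the command asked about above:
#   ok, tail = command_succeeds(["spades.py", "--test"])
#   print("SPAdes self-test passed" if ok else "SPAdes self-test failed:\n" + tail)
```

A failing self-test means the assembler itself is broken in the current environment, independent of GetOrganelle.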

Kinggerm, Feb 23 '22 04:02

ah yes, spades problem..

spades.py --test

== Warning == No assembly mode was specified! If you intend to assemble high-coverage multi-cell/isolate data, use '--isolate' option.

Command line: /g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py --test

System information:
  SPAdes version: 3.15.2
  Python version: 3.10.0
  OS: Linux-4.18.0-348.2.1.el8.nci.x86_64-x86_64-with-glibc2.28

Output dir: /g/data/nm31/d/r3.21_aatol_extra_samples_2022/spades_test
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Standard mode
  For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
  Reads:
Traceback (most recent call last):
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 651, in <module>
    main(sys.argv)
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 591, in main
    print_params(log, log_filename, command_line, args, cfg)
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 327, in print_params
    print_used_values(cfg, log)
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 117, in print_used_values
    dataset_data = pyyaml.load(open(cfg["dataset"].yaml_filename))
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/__init__.py", line 72, in load
    return loader.get_single_data()
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 37, in get_single_data
    return self.construct_document(node)
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 46, in construct_document
    for dummy in generator:
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 204, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 126, in construct_mapping
    if not isinstance(key, collections.Hashable):
AttributeError: module 'collections' has no attribute 'Hashable'

tallnuttrbgv, Feb 23 '22 04:02

Updated to SPAdes 3.15.4, which works with Python 3.10, and the issue is now solved. Thanks.
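
For context on the `AttributeError` in the traceback: Python 3.10 removed the old aliases such as `collections.Hashable`, and the abstract base classes are available only under `collections.abc`. The snippet below is a minimal illustration of the working spelling, not the actual SPAdes patch; `is_hashable_key` is a hypothetical name.

```python
import collections
import collections.abc


def is_hashable_key(key):
    """Check hashability via collections.abc, the location that works on 3.10+."""
    return isinstance(key, collections.abc.Hashable)


# On Python 3.10 and later this attribute no longer exists, which is
# exactly what the bundled pyyaml3 copy in the traceback tripped over.
OLD_ALIAS_PRESENT = hasattr(collections, "Hashable")
```

Code written against `collections.abc` runs on both older Python 3 releases and 3.10+, which is why upgrading to a release that uses it resolves the error.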

tallnuttrbgv, Feb 23 '22 04:02