GTDBTk
Uncontrolled exit resulting from an unexpected error.
Hi everyone,
Does anyone know how to fix the following error when running `gtdbtk de_novo_wf`? I think I am getting the same error as in #433, even though I tried setting the outgroup at the genus and class levels. Could you please help me figure out what the problem is in this case?
Thank you.
Sam

ERROR: Uncontrolled exit resulting from an unexpected error.
EXCEPTION: IndexError
MESSAGE: list index out of range
[2023-11-01 05:18:58] INFO: GTDB-Tk v2.3.2
[2023-11-01 05:18:58] INFO: gtdbtk de_novo_wf --genome_dir Hyphomicrobiales_genome/ncbi_dataset_original/data/fna --out_dir de_novo_Hyp --outgroup_taxon g__Escherichia --extension fna --cpus 34 --skip_gtdb_refs --bacteria --custom_taxonomy custom_taxonomy
[2023-11-01 05:18:58] INFO: Using GTDB-Tk reference data version r214: /home/xx/miniconda3/envs/gtdbtk-2.3.2/share/gtdbtk-2.3.2/db/release214
[2023-11-01 05:18:58] INFO: Identifying markers in 896 genomes with 34 threads.
[2023-11-01 05:18:58] TASK: Running Prodigal V2.6.3 to identify genes.
[2023-11-01 05:37:54] INFO: Completed 896 genomes in 18.94 minutes (47.31 genomes/minute).
[2023-11-01 05:37:54] TASK: Identifying TIGRFAM protein families.
[2023-11-01 05:42:37] INFO: Completed 896 genomes in 4.71 minutes (190.38 genomes/minute).
[2023-11-01 05:42:37] TASK: Identifying Pfam protein families.
[2023-11-01 05:42:52] INFO: Completed 896 genomes in 15.19 seconds (59.00 genomes/second).
[2023-11-01 05:42:52] INFO: Annotations done using HMMER 3.3.2 (Nov 2020).
[2023-11-01 05:42:52] TASK: Summarising identified marker genes.
[2023-11-01 05:43:27] INFO: Completed 896 genomes in 35.21 seconds (25.45 genomes/second).
[2023-11-01 05:43:27] INFO: Done.
[2023-11-01 05:43:27] INFO: Aligning markers in 896 genomes with 34 CPUs.
[2023-11-01 05:43:27] INFO: Processing 896 genomes identified as bacterial.
[2023-11-01 05:43:27] TASK: Generating concatenated alignment for each marker.
[2023-11-01 05:43:29] INFO: Completed 896 genomes in 1.38 seconds (647.77 genomes/second).
[2023-11-01 05:43:29] TASK: Aligning 120 identified markers using hmmalign 3.3.2 (Nov 2020).
[2023-11-01 05:44:30] INFO: Completed 120 markers in 1.02 minutes (118.00 markers/minute).
[2023-11-01 05:44:30] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2023-11-01 05:44:32] INFO: Completed 896 sequences in 1.65 seconds (543.55 sequences/second).
[2023-11-01 05:44:32] INFO: Masked bacterial alignment from 41,084 to 5,035 AAs.
[2023-11-01 05:44:32] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2023-11-01 05:44:32] INFO: Creating concatenated alignment for 896 bacterial user genomes.
[2023-11-01 05:44:32] INFO: Done.
[2023-11-01 05:44:32] INFO: Inferring FastTree (WAG, SH support values) using a maximum of 34 CPUs.
[2023-11-01 05:52:02] INFO: FastTree version: 2.1.11
[2023-11-01 05:52:02] INFO: Done.
[2023-11-01 05:52:02] INFO: Reading GTDB taxonomy for representative genomes.
[2023-11-01 05:52:02] INFO: Reading custom taxonomy file.
[2023-11-01 05:52:02] INFO: Read custom taxonomy for 1 genomes.
[2023-11-01 05:52:02] INFO: Reassigned taxonomy for 0 GTDB representative genomes.
[2023-11-01 05:52:02] INFO: Read taxonomy for 85,206 genomes.
[2023-11-01 05:52:02] INFO: Identifying genomes from the specified outgroup: g__Escherichia
[2023-11-01 05:52:02] INFO: Identified 1 outgroup taxa in the tree.
[2023-11-01 05:52:02] INFO: Identified 895 ingroup taxa in the tree.
[2023-11-01 05:52:02] INFO: Outgroup is monophyletic.
[2023-11-01 05:52:02] INFO: Rerooting tree.
[2023-11-01 05:52:02] INFO: Rerooted tree written to: de_novo_Hyphomicro_fail_removed/infer/intermediate_results/gtdbtk.bac120.rooted.tree
[2023-11-01 05:52:02] INFO: Done.
[2023-11-01 05:52:02] INFO: Reading GTDB taxonomy for representative genomes.
[2023-11-01 05:52:03] INFO: Reading custom taxonomy file.
[2023-11-01 05:52:03] INFO: Read custom taxonomy for 1 genomes.
[2023-11-01 05:52:03] INFO: Reassigned taxonomy for 0 GTDB representative genomes.
[2023-11-01 05:52:03] INFO: Read taxonomy for 85,206 genomes.
[2023-11-01 05:52:03] INFO: Reading tree.
[2023-11-01 05:52:03] INFO: Removing any previous internal node labels.
[2023-11-01 05:52:03] INFO: Calculating F-measure statistic for each taxa.
[2023-11-01 05:52:03] INFO: Calculating taxa within each lineage.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Domain rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Phylum rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Class rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Order rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Family rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Genus rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Species rank.
[2023-11-01 05:52:03] WARNING: There are 7 taxa with multiple placements of equal quality.
[2023-11-01 05:52:03] WARNING: These were resolved by placing the label at the most terminal position.
[2023-11-01 05:52:03] WARNING: Ideally, taxonomic assignment of all genomes should be established before tree decoration.
[2023-11-01 05:52:03] INFO: Placing labels on tree.
[2023-11-01 05:52:03] INFO: Writing out statistics for taxa.
[2023-11-01 05:52:03] INFO: Writing out inferred taxonomy for each genome.
[2023-11-01 05:52:03] ERROR: Uncontrolled exit resulting from an unexpected error.

================================================================================
EXCEPTION: IndexError
MESSAGE: list index out of range

Traceback (most recent call last):
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/main.py", line 102, in main
    gt_parser.parse_options(args)
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/main.py", line 1078, in parse_options
    self.decorate(options)
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/main.py", line 815, in decorate
    reports = d.run(options.input_tree,
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/decorate.py", line 379, in run
    self._write_taxonomy(tree, out_taxonomy)
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/decorate.py", line 314, in _write_taxonomy
    taxa = self._leaf_taxa(leaf)
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/decorate.py", line 295, in _leaf_taxa
    last_rank = ordered_taxa[-1][0:3]
IndexError: list index out of range
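The log reports `Read custom taxonomy for 1 genomes` for a run over 896 genomes. A quick sanity check is to compare the genome IDs in the `--genome_dir` folder with the IDs listed in the custom taxonomy file. The sketch below assumes (this is not confirmed in the thread) that GTDB-Tk derives each genome ID from the file name minus the `.<extension>` suffix, and that the taxonomy file is tab-separated with the genome ID in the first column; `genome_ids`, `taxonomy_ids` and `report` are hypothetical helper names.

```python
"""Report genomes in a --genome_dir folder that lack a custom-taxonomy entry.

Assumptions: genome IDs are file names minus the ".<extension>" suffix, and
the taxonomy file has one "<genome_id>\t<taxonomy string>" entry per line.
"""
import sys
from pathlib import Path


def genome_ids(genome_dir, extension="fna"):
    """Collect genome IDs from files that end with the given extension."""
    suffix = "." + extension.lstrip(".")
    return {
        path.name[: -len(suffix)]
        for path in Path(genome_dir).iterdir()
        if path.is_file() and path.name.endswith(suffix)
    }


def taxonomy_ids(taxonomy_file):
    """Collect the genome IDs (first tab-separated column) of a taxonomy file."""
    ids = set()
    with open(taxonomy_file) as handle:
        for line in handle:
            if line.strip():
                ids.add(line.split("\t")[0].strip())
    return ids


def report(genome_dir, taxonomy_file, extension="fna"):
    """Print a summary and return the sorted IDs that have no taxonomy entry."""
    genomes = genome_ids(genome_dir, extension)
    covered = genomes & taxonomy_ids(taxonomy_file)
    missing = sorted(genomes - covered)
    print(f"{len(covered)} of {len(genomes)} genomes have a taxonomy entry")
    for genome_id in missing:
        print(f"no taxonomy entry: {genome_id}")
    return missing


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python check_coverage.py <genome_dir> <taxonomy_file> [extension]
    report(*sys.argv[1:4])
```

For the command above, this would be run as `python check_coverage.py Hyphomicrobiales_genome/ncbi_dataset_original/data/fna custom_taxonomy fna` (the script name is arbitrary).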
Environment
- [ ] Installed via pip (include the output of `pip list`)
- [x] Using a conda environment (include the output of `conda list && conda list --revisions`)
- [ ] Using a Docker container (include the `IMAGE ID` of the container)
Server information
- CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz (×48)
- RAM: 196652704 kB
- OS: Ubuntu 20.04.6 LTS
Debugging information
- [x] `gtdbtk.log` has been included (drag and drop the file to upload).
- [ ] Genomes have been included (if possible, and there are few).
Hello, do you still have this issue? If so: I can see you are using a custom taxonomy. Does it have all 7 ranks (domain to species) for every genome, or are some taxonomy strings missing ranks? Having fewer than 7 ranks can cause the decoration step to fail.
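To find entries with fewer than seven ranks before rerunning, a small check like the one below can help. It is a minimal sketch that assumes the custom taxonomy uses the same layout as the GTDB taxonomy files (`<genome_id><TAB>d__...;p__...;c__...;o__...;f__...;g__...;s__...`); adjust it if your file differs. `check_taxonomy_line` and `check_file` are hypothetical helper names.

```python
"""Flag custom-taxonomy entries that are not complete seven-rank lineages.

Assumed line format (GTDB-style): "<genome_id>\t<d__;p__;c__;o__;f__;g__;s__>".
"""
import sys

# Expected rank prefixes, in order from domain to species.
RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def check_taxonomy_line(line):
    """Return a list of problems found in one taxonomy line (empty if OK)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 2:
        return ["expected 2 tab-separated columns"]
    ranks = [rank.strip() for rank in fields[1].split(";")]
    problems = []
    if len(ranks) != len(RANK_PREFIXES):
        problems.append(f"has {len(ranks)} ranks, expected 7")
    # Compare each rank present with the prefix expected at that position.
    for expected, rank in zip(RANK_PREFIXES, ranks):
        if not rank.startswith(expected):
            problems.append(f"expected prefix {expected!r}, got {rank!r}")
    return problems


def check_file(path):
    """Map genome ID -> list of problems, for every line that fails the check."""
    bad = {}
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            problems = check_taxonomy_line(line)
            if problems:
                genome_id = line.split("\t")[0].strip() or f"line {lineno}"
                bad[genome_id] = problems
    return bad


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_taxonomy.py <custom_taxonomy_file>
    for genome_id, problems in check_file(sys.argv[1]).items():
        print(genome_id, "; ".join(problems), sep="\t")
```

Each printed line names a genome whose taxonomy string is not a complete domain-to-species lineage, which is the situation described above as a possible cause of the decoration failure.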
Thank you so much for your follow-up! As you pointed out, some genomes were missing lower ranks. Is there an easy way to extract the 7 ranks for custom genomes obtained from the NCBI database? Since I have more than 800 genomes to run through the GTDB-Tk de novo workflow, I was wondering whether you know of any tools to create a custom taxonomy efficiently. Thanks!
Sorry, I am not aware of any tool to convert the NCBI taxonomy to a standard taxonomy with 7 ranks.
TaxonKit can accomplish this with the subcommand `taxonkit reformat`.
https://github.com/shenwei356/taxonkit
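`taxonkit reformat` can produce a fixed-rank lineage per taxid; the available format placeholders and options depend on the TaxonKit version, so check its documentation. Mapping each genome to its NCBI taxid is a separate step and is not shown. Assuming you end up with a tab-separated file of genome ID and a semicolon-separated seven-field lineage (domain to species, empty fields for missing ranks), the sketch below adds GTDB-style prefixes and fills gaps. The `unclassified_<nearest named ancestor>` placeholder is a hypothetical convention chosen here, not something GTDB-Tk prescribes; `to_gtdb_style` and `convert` are hypothetical names.

```python
"""Turn seven-field lineages into a GTDB-style custom taxonomy (sketch).

Assumed input: "<genome_id>\t<rank1;...;rank7>" with empty fields for missing
ranks, e.g. "GCF_1\tBacteria;Pseudomonadota;;Hyphomicrobiales;;;".
The "unclassified_<parent>" placeholder is a hypothetical convention.
"""
import sys

PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def to_gtdb_style(lineage):
    """Prefix each rank and fill empty ranks from the nearest named ancestor."""
    names = [name.strip() for name in lineage.split(";")]
    if len(names) != len(PREFIXES):
        raise ValueError(f"expected 7 ranks, got {len(names)}: {lineage!r}")
    filled = []
    parent = ""
    for prefix, name in zip(PREFIXES, names):
        if not name:
            # Name the gap after the nearest named ancestor so that gaps in
            # unrelated lineages do not merge into one shared taxon.
            name = f"unclassified_{parent}" if parent else "unclassified"
        else:
            parent = name
        filled.append(prefix + name)
    return ";".join(filled)


def convert(in_path, out_path):
    """Rewrite every non-blank input line as '<genome_id>\t<GTDB-style taxonomy>'."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue
            genome_id, lineage = line.rstrip("\r\n").split("\t", 1)
            dst.write(f"{genome_id}\t{to_gtdb_style(lineage)}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python to_custom_taxonomy.py <lineages.tsv> <custom_taxonomy>
    convert(sys.argv[1], sys.argv[2])
```

Filling gaps this way guarantees seven ranks per genome, but genomes sharing a placeholder (for example all unnamed species within one genus) become a single taxon, which may not reflect the tree; review the placeholders before decorating.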