GTDBTk

Uncontrolled exit resulting from an unexpected error.

Open · smta11 opened this issue 1 year ago · 4 comments

Hi everyone,

Does anyone know how to solve the following error when running gtdbtk de_novo_wf? I think I am getting the same error as in #433, even though I tried setting the outgroup at the genus or class level. Could you please help me figure out what the problem is in this case?

Thank you.
Sam

ERROR: Uncontrolled exit resulting from an unexpected error.
EXCEPTION: IndexError
MESSAGE: list index out of range

[2023-11-01 05:18:58] INFO: GTDB-Tk v2.3.2
[2023-11-01 05:18:58] INFO: gtdbtk de_novo_wf --genome_dir Hyphomicrobiales_genome/ncbi_dataset_original/data/fna --out_dir de_novo_Hyp --outgroup_taxon g__Escherichia --extension fna --cpus 34 --skip_gtdb_refs --bacteria --custom_taxonomy custom_taxonomy
[2023-11-01 05:18:58] INFO: Using GTDB-Tk reference data version r214: /home/xx/miniconda3/envs/gtdbtk-2.3.2/share/gtdbtk-2.3.2/db/release214
[2023-11-01 05:18:58] INFO: Identifying markers in 896 genomes with 34 threads.
[2023-11-01 05:18:58] TASK: Running Prodigal V2.6.3 to identify genes.
[2023-11-01 05:37:54] INFO: Completed 896 genomes in 18.94 minutes (47.31 genomes/minute).
[2023-11-01 05:37:54] TASK: Identifying TIGRFAM protein families.
[2023-11-01 05:42:37] INFO: Completed 896 genomes in 4.71 minutes (190.38 genomes/minute).
[2023-11-01 05:42:37] TASK: Identifying Pfam protein families.
[2023-11-01 05:42:52] INFO: Completed 896 genomes in 15.19 seconds (59.00 genomes/second).
[2023-11-01 05:42:52] INFO: Annotations done using HMMER 3.3.2 (Nov 2020).
[2023-11-01 05:42:52] TASK: Summarising identified marker genes.
[2023-11-01 05:43:27] INFO: Completed 896 genomes in 35.21 seconds (25.45 genomes/second).
[2023-11-01 05:43:27] INFO: Done.
[2023-11-01 05:43:27] INFO: Aligning markers in 896 genomes with 34 CPUs.
[2023-11-01 05:43:27] INFO: Processing 896 genomes identified as bacterial.
[2023-11-01 05:43:27] TASK: Generating concatenated alignment for each marker.
[2023-11-01 05:43:29] INFO: Completed 896 genomes in 1.38 seconds (647.77 genomes/second).
[2023-11-01 05:43:29] TASK: Aligning 120 identified markers using hmmalign 3.3.2 (Nov 2020).
[2023-11-01 05:44:30] INFO: Completed 120 markers in 1.02 minutes (118.00 markers/minute).
[2023-11-01 05:44:30] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2023-11-01 05:44:32] INFO: Completed 896 sequences in 1.65 seconds (543.55 sequences/second).
[2023-11-01 05:44:32] INFO: Masked bacterial alignment from 41,084 to 5,035 AAs.
[2023-11-01 05:44:32] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2023-11-01 05:44:32] INFO: Creating concatenated alignment for 896 bacterial user genomes.
[2023-11-01 05:44:32] INFO: Done.
[2023-11-01 05:44:32] INFO: Inferring FastTree (WAG, SH support values) using a maximum of 34 CPUs.
[2023-11-01 05:52:02] INFO: FastTree version: 2.1.11
[2023-11-01 05:52:02] INFO: Done.
[2023-11-01 05:52:02] INFO: Reading GTDB taxonomy for representative genomes.
[2023-11-01 05:52:02] INFO: Reading custom taxonomy file.
[2023-11-01 05:52:02] INFO: Read custom taxonomy for 1 genomes.
[2023-11-01 05:52:02] INFO: Reassigned taxonomy for 0 GTDB representative genomes.
[2023-11-01 05:52:02] INFO: Read taxonomy for 85,206 genomes.
[2023-11-01 05:52:02] INFO: Identifying genomes from the specified outgroup: g__Escherichia
[2023-11-01 05:52:02] INFO: Identified 1 outgroup taxa in the tree.
[2023-11-01 05:52:02] INFO: Identified 895 ingroup taxa in the tree.
[2023-11-01 05:52:02] INFO: Outgroup is monophyletic.
[2023-11-01 05:52:02] INFO: Rerooting tree.
[2023-11-01 05:52:02] INFO: Rerooted tree written to: de_novo_Hyphomicro_fail_removed/infer/intermediate_results/gtdbtk.bac120.rooted.tree
[2023-11-01 05:52:02] INFO: Done.
[2023-11-01 05:52:02] INFO: Reading GTDB taxonomy for representative genomes.
[2023-11-01 05:52:03] INFO: Reading custom taxonomy file.
[2023-11-01 05:52:03] INFO: Read custom taxonomy for 1 genomes.
[2023-11-01 05:52:03] INFO: Reassigned taxonomy for 0 GTDB representative genomes.
[2023-11-01 05:52:03] INFO: Read taxonomy for 85,206 genomes.
[2023-11-01 05:52:03] INFO: Reading tree.
[2023-11-01 05:52:03] INFO: Removing any previous internal node labels.
[2023-11-01 05:52:03] INFO: Calculating F-measure statistic for each taxa.
[2023-11-01 05:52:03] INFO: Calculating taxa within each lineage.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Domain rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Phylum rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Class rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Order rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Family rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Genus rank.
[2023-11-01 05:52:03] INFO: Processing 1 taxa at Species rank.
[2023-11-01 05:52:03] WARNING: There are 7 taxa with multiple placements of equal quality.
[2023-11-01 05:52:03] WARNING: These were resolved by placing the label at the most terminal position.
[2023-11-01 05:52:03] WARNING: Ideally, taxonomic assignment of all genomes should be established before tree decoration.
[2023-11-01 05:52:03] INFO: Placing labels on tree.
[2023-11-01 05:52:03] INFO: Writing out statistics for taxa.
[2023-11-01 05:52:03] INFO: Writing out inferred taxonomy for each genome.
[2023-11-01 05:52:03] ERROR: Uncontrolled exit resulting from an unexpected error.

================================================================================
EXCEPTION: IndexError
MESSAGE: list index out of range


Traceback (most recent call last):
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/main.py", line 102, in main
    gt_parser.parse_options(args)
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/main.py", line 1078, in parse_options
    self.decorate(options)
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/main.py", line 815, in decorate
    reports = d.run(options.input_tree,
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/decorate.py", line 379, in run
    self._write_taxonomy(tree, out_taxonomy)
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/decorate.py", line 314, in _write_taxonomy
    taxa = self._leaf_taxa(leaf)
  File "/home/xx/miniconda3/envs/gtdbtk-2.3.2/lib/python3.8/site-packages/gtdbtk/decorate.py", line 295, in _leaf_taxa
    last_rank = ordered_taxa[-1][0:3]
IndexError: list index out of range

Environment

  • [ ] Installed via pip (include the output of pip list)
  • [x] Using a conda environment (include the output of conda list && conda list --revisions)
  • [ ] Using a Docker container (include the IMAGE ID of the container)

Server information

  • CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz, 48 logical CPUs
  • RAM: 196652704 kB (roughly 188 GiB)
  • OS: Ubuntu 20.04.6 LTS

Debugging information

  • [x] gtdbtk.log has been included (drag and drop the file to upload).
  • [ ] Genomes have been included (if possible, and there are few).

smta11, Oct 31 '23 21:10
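The traceback above ends on the expression ordered_taxa[-1][0:3] in _leaf_taxa. The snippet below isolates that one expression in a hypothetical helper, last_rank_prefix (this is not GTDB-Tk's code), to show that it raises exactly this IndexError when ordered_taxa is empty, i.e. when no taxon labels were collected for a genome. Why the list ends up empty for some genomes is not established by the snippet itself; the maintainer's reply below points to custom taxonomy strings with fewer than 7 ranks.

```python
def last_rank_prefix(ordered_taxa):
    """Hypothetical helper reproducing the failing expression from the traceback:
    the 3-character rank prefix (e.g. 'g__') of the last taxon in the list."""
    return ordered_taxa[-1][0:3]


# With at least one taxon label, the expression works.
print(last_rank_prefix(["d__Bacteria", "p__Pseudomonadota", "g__Escherichia"]))  # g__

# With an empty list, it raises the same error reported in the log.
try:
    last_rank_prefix([])
except IndexError as err:
    print(f"IndexError: {err}")  # IndexError: list index out of range
```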

Hello, do you still have this issue? If so: I can see you are using a custom taxonomy. Does every genome in it have a taxonomy string with all 7 ranks (domain to species), or do some taxonomy strings have fewer ranks? Having fewer than 7 ranks can cause the decoration step to fail.

pchaumeil, Nov 20 '23 15:11
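One way to act on this suggestion is to scan the custom taxonomy file for genomes whose strings lack any of the 7 ranks. The sketch below is a hypothetical helper (not part of GTDB-Tk) that assumes one genome per line: the genome ID, a tab, then a semicolon-separated string with d__, p__, c__, o__, f__, g__ and s__ prefixes in that order. As a conservative assumption it also flags prefix-only ranks such as "s__"; GTDB-Tk may well tolerate those.

```python
import sys

# GTDB-style rank prefixes, domain to species, in the expected order.
RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def find_incomplete_taxonomies(lines):
    """Return (genome_id, problem) pairs for taxonomy strings that lack the 7 ranks.

    Each line is assumed to be: <genome_id><TAB>d__...;p__...;...;s__...
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append((f"line {line_no}", "expected 2 tab-separated columns"))
            continue
        genome_id, taxonomy = fields
        ranks = [rank.strip() for rank in taxonomy.split(";")]
        if len(ranks) != len(RANK_PREFIXES):
            problems.append((genome_id, f"{len(ranks)} ranks instead of 7"))
            continue
        prefixes = tuple(rank[:3] for rank in ranks)
        if prefixes != RANK_PREFIXES:
            problems.append((genome_id, "unexpected rank prefixes: " + ";".join(prefixes)))
            continue
        # Conservative extra check: a prefix with no name (e.g. "s__") counts as missing.
        empty = [rank for rank in ranks if len(rank) == 3]
        if empty:
            problems.append((genome_id, "empty ranks: " + ";".join(empty)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_taxonomy.py <custom_taxonomy_file>
    with open(sys.argv[1]) as handle:
        for genome_id, problem in find_incomplete_taxonomies(handle):
            print(f"{genome_id}\t{problem}")
```

Run it against the file passed to GTDB-Tk (the script name check_taxonomy.py is hypothetical); every genome it prints needs a complete lineage before rerunning de_novo_wf.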

Thank you so much for your follow-up! As you pointed out, some genomes were missing lower ranks. Is there an easy way to extract the 7 ranks for custom genomes obtained from the NCBI database? Since I have more than 800 genomes for the de novo GTDB-Tk run, I was wondering whether you know of any tools to create a custom taxonomy file efficiently. Thanks!

smta11, Nov 20 '23 22:11

Sorry, I am not aware of any tool to convert the NCBI taxonomy to a standard taxonomy with 7 ranks.

pchaumeil, Nov 21 '23 16:11

TaxonKit can do this with its subcommand taxonkit reformat: https://github.com/shenwei356/taxonkit

hunglin59638, Nov 23 '23 16:11
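Following up on the TaxonKit suggestion: once each genome has a seven-rank lineage (for example from taxonkit reformat; check its documentation for the format placeholders and output columns in your version), a small script can turn it into the GTDB-Tk custom taxonomy layout. The sketch below is hypothetical and assumes a two-column tab-separated input: the genome ID, then seven semicolon-separated names from domain to species, with empty fields for missing ranks. Filling a missing rank with "unclassified <nearest named higher taxon>" is our own placeholder convention, not something GTDB-Tk or TaxonKit prescribes.

```python
import sys

# GTDB-style rank prefixes, domain to species.
RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def to_gtdb_taxonomy(lineage):
    """Convert a 7-field ';'-separated lineage into a prefixed GTDB-style string.

    Empty fields (missing ranks) get a hypothetical placeholder built from the
    nearest named higher rank, e.g. 'g__unclassified Hyphomicrobiaceae'.
    """
    names = [name.strip() for name in lineage.split(";")]
    if len(names) != len(RANK_PREFIXES):
        raise ValueError(f"expected 7 fields, got {len(names)}: {lineage!r}")
    if not names[0]:
        raise ValueError("domain is missing; cannot build placeholder ranks")
    ranks = []
    last_named = names[0]
    for prefix, name in zip(RANK_PREFIXES, names):
        if name:
            last_named = name
            ranks.append(prefix + name)
        else:
            ranks.append(f"{prefix}unclassified {last_named}")
    return ";".join(ranks)


def convert_rows(rows):
    """Yield '<genome_id>\t<taxonomy>' lines from '<genome_id>\t<lineage>' rows."""
    for row in rows:
        row = row.rstrip("\n")
        if not row.strip():
            continue  # ignore blank lines
        parts = row.split("\t")
        if len(parts) != 2:
            raise ValueError(f"expected 2 tab-separated columns: {row!r}")
        genome_id, lineage = parts
        yield f"{genome_id}\t{to_gtdb_taxonomy(lineage)}"


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python ncbi_to_custom_taxonomy.py <lineages.tsv> <custom_taxonomy>
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out_line in convert_rows(src):
            dst.write(out_line + "\n")
```

Because the placeholders become taxon labels on the decorated tree, all genomes from one family that lack a genus will share a single placeholder genus; rename them if that grouping would be misleading.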