
[BUG] anvi-run-kegg-kofams terminated prematurely

Open lulux719 opened this issue 1 year ago • 1 comments

Short description of the problem

anvi-run-kegg-kofams terminated prematurely after producing the hmm.table file

anvi'o version

Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.13
Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Installed via conda

Detailed description of the issue

I have tried several times, and anvi-run-kegg-kofams always terminates prematurely after producing the hmm.table file, with the message below.

Done with KOfam 🎊

Number of raw hits in table file .............: 397,424,337
Terminated

Here's the hmm.table generated.

head hmm.table
11548766 - K24524 -    0.0034   19.5   0.0    0.0055   18.8   0.0   1.3   1   0   0   1   1   1   1 -
11690881 - K15921 -  4.7e-199  667.4  13.8  5.2e-199  667.3  13.8   1.0   1   0   0   1   1   1   1 -
11605040 - K15921 -  1.8e-190  639.0  21.8  2.4e-190  638.6  21.8   1.1   1   0   0   1   1   1   1 -
11656118 - K15921 -  1.8e-190  639.0  21.8  2.4e-190  638.6  21.8   1.1   1   0   0   1   1   1   1 -

Here's the info of contigs.db. anvi-db-info 03_CONTIGS/contigs.db

DB Info (no touch)

Database Path ................................: 03_CONTIGS/contigs.db
description ..................................: [Not found, but it's OK]
db_type ......................................: contigs (variant: unknown)
version ......................................: 21

DB Info (no touch also)

project_name .................................: ob
contigs_db_hash ..............................: hashc9e5c18c
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 1193879
total_length .................................: 16702353431
num_splits ...................................: 1450761
genes_are_called .............................: 1
external_gene_calls ..........................: 0
external_gene_amino_acid_seqs ................: 0
skip_predict_frame ...........................: 0
splits_consider_gene_calls ...................: 1
scg_taxonomy_was_run .........................: 0
scg_taxonomy_database_version ................: None
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1706651817.91107
gene_function_sources ........................: Pfam
gene_level_taxonomy_source ...................: kaiju

  • Please remember that it is never a good idea to change these values. But in some cases it may be absolutely necessary to update something here, and a programmer may ask you to run this program and do it. But even then, you should be extremely careful.

AVAILABLE GENE CALLERS

  • 'prodigal' (16,415,905 gene calls)
  • 'Ribosomal_RNA_28S' (11 gene calls)
  • 'Ribosomal_RNA_23S' (3,430 gene calls)
  • 'Ribosomal_RNA_18S' (13 gene calls)
  • 'Ribosomal_RNA_16S' (1,878 gene calls)

AVAILABLE FUNCTIONAL ANNOTATION SOURCES

  • Pfam (25,159,321 annotations)

AVAILABLE HMM SOURCES

  • 'Archaea_76' (76 models with 171,317 hits)
  • 'Bacteria_71' (71 models with 331,894 hits)
  • 'Protista_83' (83 models with 19,854 hits)
  • 'Ribosomal_RNA_12S' (1 model with 0 hits)
  • 'Ribosomal_RNA_16S' (3 models with 1,878 hits)
  • 'Ribosomal_RNA_18S' (1 model with 13 hits)
  • 'Ribosomal_RNA_23S' (2 models with 3,430 hits)
  • 'Ribosomal_RNA_28S' (1 model with 11 hits)
  • 'Ribosomal_RNA_5S' (5 models with 0 hits)

When I try it on a smaller contigs.db (1/4 of the samples), it completes without any problem, so I'm guessing it has something to do with server capacity. My question is: is there any way to bypass this issue? I assume the program finished the "Run an HMM search against KOfam" step. Is it possible to resume the program from there?

Thank you very much.

lulux719 avatar Apr 17 '24 12:04 lulux719

This looks like a memory issue, so there is not much we can do. BUT, there is always a way. In this case, one could split the contigs-db file into 10 different ones using a collection-txt and anvi-split, run anvi-run-kegg-kofams on each one of them separately, then export the contents of the gene_functions table from each one, and finally import the combined hits into the original contigs-db manually.

But this is a hacker's workaround, and a machine with more memory would of course be the optimal solution :)
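The merge step of the workaround above can be sketched in a few lines of Python. This is only an illustration: the file names are hypothetical, and it assumes each exported function table (e.g. from anvi-export-functions) is a tab-delimited file with a single identical header line, so the merged file can be fed back via anvi-import-functions:

```python
import csv

def merge_function_tables(table_paths, output_path):
    """Concatenate exported gene-function TSV files, keeping one header.

    Assumes each input is tab-delimited with an identical header line
    (e.g. gene_callers_id, source, accession, function, e_value).
    """
    header_written = False
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        for path in table_paths:
            with open(path, newline="") as f:
                reader = csv.reader(f, delimiter="\t")
                header = next(reader)  # each file's own header line
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)

# Example with hypothetical file names from 10 split runs:
# merge_function_tables(
#     [f"split_{i:02d}_functions.txt" for i in range(10)],
#     "merged_functions.txt",
# )
```

The merged file could then be imported into the original contigs-db (e.g. with anvi-import-functions), which is the "manual import" step described above.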

meren avatar Apr 17 '24 12:04 meren