No descriptions were found for your IDs error

Open MaxRubinBlum opened this issue 2 years ago • 25 comments

Hi, I am running a recent version of DRAM, and get the following error (after some MAGs were annotated without a problem). I tried updating the description db, but the error persists. Please help!

8:40:42.241646: Annotating MAG058
8:41:07.796558: Turning genes from prodigal to mmseqs2 db
8:41:09.903825: Getting hits from kofam
8:59:29.902988: Getting forward best hits from peptidase
8:59:33.438805: Getting reverse best hits from peptidase
8:59:34.544599: Getting descriptions of hits from peptidase
8:59:34.879067: Getting hits from pfam
8:59:45.552869: Getting hits from dbCAN
Traceback (most recent call last):
  File "/home/bioinf/miniconda3/envs/DRAM/bin/DRAM.py", line 189, in <module>
    args.func(**args_dict)
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1039, in annotate_bins_cmd
    annotate_bins(list(set(fasta_locs)), output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1078, in annotate_bins
    all_annotations = annotate_fastas(fasta_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 835, in annotate_orfs
    annotation_list.append(run_hmmscan(genes_faa=gene_faa,
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 372, in run_hmmscan
    return formater(hits)
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 279, in dbcan_hmmscan_formater
    hits_df[f"{db_name}_hits"] = hits_df[f"{db_name}_id"].apply(
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/series.py", line 4433, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/apply.py", line 1082, in apply
    return self.apply_standard()
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/apply.py", line 1137, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2870, in pandas._libs.lib.map_infer
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 281, in <lambda>
    db_handler.get_descriptions(
  File "/home/bioinf/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py", line 81, in get_descriptions
    warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
IndexError: list index out of range
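
Note on the last frame above: the warning text is built with `list(ids)[0]`, so when `ids` is empty the attempt to warn raises `IndexError` itself. The sketch below reproduces that pattern in isolation and shows one guarded alternative. `format_missing_warning`, its parameters, and the table name passed to it are hypothetical names for illustration, not DRAM's actual code.

```python
import warnings


def format_missing_warning(ids, table_name):
    """Build the warning text without indexing into a possibly empty collection."""
    # next(iter(...), default) returns the default instead of raising when ids is empty.
    example_id = next(iter(ids), "<no ids given>")
    return (
        "No descriptions were found for your id's. "
        "Does this %s look like an id from %s" % (example_id, table_name)
    )


# The pattern seen in the traceback: indexing an empty list raises IndexError.
try:
    list(set())[0]
except IndexError as err:
    print("unguarded:", err)

# The guarded version still warns, but cannot fail on empty input.
# "some_description_table" is a placeholder, not a real DRAM table name.
warnings.warn(format_missing_warning([], "some_description_table"))
```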

MaxRubinBlum avatar Mar 17 '22 06:03 MaxRubinBlum

Hmm that is interesting, it seems like you may have a corruption in your database, but I am not sure. Can you send me MAG058 and I will see if I can reproduce the problem. Make sure you can share the data first, I don't want you to get in trouble.

rmFlynn avatar Mar 17 '22 15:03 rmFlynn

MAG058.zip

Many thanks for the quick response! Please see MAG058 attached. This is data from my own project, so there is no problem with sharing.

MaxRubinBlum avatar Mar 17 '22 16:03 MaxRubinBlum

Ok, I still get the same warning, but it does not cause dram to fail, which it should not. dbCan just has some IDs for which there are no descriptions. I tried with an older version of dram, and a new one. I think that something may be up with your databases, so I need to suggest that you update your database by running the database portion of the setup instructions again with a new db location. I know that is a pain. However, first run DRAM-setup.py version and DRAM-setup.py print_config and post the output here, just in case there is something more obviously wrong.

rmFlynn avatar Mar 17 '22 23:03 rmFlynn

I am also getting this error on a fresh build of DRAM:

(vs2) artic@lab-on-an-ssd:~/bens_toys$ DRAM-v.py annotate -i vs2-pass2/for-dramv/final-viral-combined-for-dramv.fa -v vs2-pass2/for-dramv/viral-affi-contigs-for-dramv.tab -o dramv-annotate --skip_trnascan --threads 28 --min_contig_size 10000
2022-03-24 20:43:39.258458: Viral annotation started
0:00:00.007782: Retrieved database locations and descriptions
0:00:00.007804: Annotating final-viral-combined-for-dramv
0:00:05.832831: Turning genes from prodigal to mmseqs2 db
0:00:06.769442: Getting hits from kofam
0:07:45.954784: Getting forward best hits from viral
0:07:47.530468: Getting reverse best hits from viral
0:07:47.951419: Getting descriptions of hits from viral
/home/artic/miniconda3/envs/vs2/lib/python3.8/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this YP_004010384.1 look like an id from viral_description
  warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
Traceback (most recent call last):
  File "/home/artic/miniconda3/envs/vs2/bin/DRAM-v.py", line 153, in <module>
    args.func(**args_dict)
  File "/home/artic/miniconda3/envs/vs2/lib/python3.8/site-packages/mag_annotator/annotate_vgfs.py", line 473, in annotate_vgfs
    annotations = annotate_fastas(contig_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/artic/miniconda3/envs/vs2/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/artic/miniconda3/envs/vs2/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/home/artic/miniconda3/envs/vs2/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 813, in annotate_orfs
    annotation_list.append(do_blast_style_search(query_db, db_handler.db_locs['viral'], tmp_dir,
  File "/home/artic/miniconda3/envs/vs2/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 683, in do_blast_style_search
    hits = formater(hits, header_dict)
  File "/home/artic/miniconda3/envs/vs2/lib/python3.8/site-packages/mag_annotator/annotate_bins.py", line 173, in get_basic_description
    header = header_dict[hit]
KeyError: 'YP_004010384.1'
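
Note on this traceback: the final frame is a plain dictionary lookup, `header_dict[hit]`, which raises `KeyError` whenever a hit ID has no entry in the headers that were fetched. A minimal sketch of a tolerant lookup follows. `describe_hits` and the sample header entry are hypothetical, and falling back to a placeholder description is an assumption about acceptable behaviour, not what DRAM does.

```python
def describe_hits(hit_ids, header_dict, missing="no description found"):
    """Map hit IDs to descriptions, tolerating IDs absent from header_dict."""
    described = {}
    missing_ids = []
    for hit in hit_ids:
        if hit in header_dict:
            described[hit] = header_dict[hit]
        else:
            # Record the gap instead of raising KeyError mid-annotation.
            described[hit] = missing
            missing_ids.append(hit)
    return described, missing_ids


# Made-up header entry for illustration; the second ID is the one from the KeyError.
headers = {"YP_000000001.1": "hypothetical protein"}
described, missing_ids = describe_hits(["YP_000000001.1", "YP_004010384.1"], headers)
print(missing_ids)
```

Collecting the missing IDs separately makes it possible to report them all at once, which is more useful for diagnosing an out-of-date description database than failing on the first one.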

DRAM-setup.py version output:

1.3.4

DRAM-setup.py print_config output:

Processed search databases
KEGG db: None
KOfam db: /home/artic/bens_toys/dbs/kofam_profiles.hmm
KOfam KO list: /home/artic/bens_toys/dbs/kofam_ko_list.tsv
UniRef db: None
Pfam db: /home/artic/bens_toys/dbs/pfam.mmspro
dbCAN db: /home/artic/bens_toys/dbs/dbCAN-HMMdb-V10.txt
RefSeq Viral db: /home/artic/bens_toys/dbs/refseq_viral.20220324.mmsdb
MEROPS peptidase db: /home/artic/bens_toys/dbs/peptidases.20220324.mmsdb
VOGDB db: /home/artic/bens_toys/dbs/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /home/artic/bens_toys/dbs/Pfam-A.hmm.dat.gz
dbCAN family activities: /home/artic/bens_toys/dbs/CAZyDB.07292021.fam-activities.txt
VOG annotations: /home/artic/bens_toys/dbs/vog_annotations_latest.tsv.gz

Description db: /home/artic/bens_toys/dbs/description_db.sqlite

DRAM distillation sheets
Genome summary form: /home/artic/bens_toys/dbs/genome_summary_form.20220324.tsv
Module step form: /home/artic/bens_toys/dbs/module_step_form.20220324.tsv
ETC module database: /home/artic/bens_toys/dbs/etc_mdoule_database.20220324.tsv
Function heatmap form: /home/artic/bens_toys/dbs/function_heatmap_form.20220324.tsv
AMG database: /home/artic/bens_toys/dbs/amg_database.20220324.tsv

Input files are attached viral-affi-contigs-for-dramv.zip

btemperton avatar Mar 24 '22 21:03 btemperton

Hey there :)

Thanks for developing and maintaining DRAM :)

Edit: It seems @btemperton and I were writing at the same time, and I didn't see his post until after I posted. It looks like the same issue. Didn't mean to spam the thread :)

I'm having a similar issue in terms of the error and it causing DRAM to fail when running DRAM.py annotate, but instead it's happening with the peptidase db for me.

My install steps and their outputs are on this page, if you want to see them.

Info

DRAM-setup.py version
1.3.4
DRAM-setup.py print_config
Processed search databases
KEGG db: None
KOfam db: /media/executor/mlee/dram/DRAM_data/kofam_profiles.hmm
KOfam KO list: /media/executor/mlee/dram/DRAM_data/kofam_ko_list.tsv
UniRef db: None
Pfam db: /media/executor/mlee/dram/DRAM_data/pfam.mmspro
dbCAN db: /media/executor/mlee/dram/DRAM_data/dbCAN-HMMdb-V10.txt
RefSeq Viral db: /media/executor/mlee/dram/DRAM_data/refseq_viral.20220323.mmsdb
MEROPS peptidase db: /media/executor/mlee/dram/DRAM_data/peptidases.20220323.mmsdb
VOGDB db: /media/executor/mlee/dram/DRAM_data/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /media/executor/mlee/dram/DRAM_data/Pfam-A.hmm.dat.gz
dbCAN family activities: /media/executor/mlee/dram/DRAM_data/CAZyDB.07292021.fam-activities.txt
VOG annotations: /media/executor/mlee/dram/DRAM_data/vog_annotations_latest.tsv.gz

Description db: /media/executor/mlee/dram/DRAM_data/description_db.sqlite

DRAM distillation sheets
Genome summary form: /media/executor/mlee/dram/DRAM_data/genome_summary_form.20220323.tsv
Module step form: /media/executor/mlee/dram/DRAM_data/module_step_form.20220323.tsv
ETC module database: /media/executor/mlee/dram/DRAM_data/etc_mdoule_database.20220323.tsv
Function heatmap form: /media/executor/mlee/dram/DRAM_data/function_heatmap_form.20220323.tsv
AMG database: /media/executor/mlee/dram/DRAM_data/amg_database.20220323.tsv

Getting test genome

curl -L -o GCF_000005845.2.fasta.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz

gunzip GCF_000005845.2.fasta.gz

Error/Failure on DRAM.py annotate

DRAM.py annotate -i GCF_000005845.2.fasta -o test-DRAM-output
1 fastas found
2022-03-23 22:33:00.825213: Annotation started
0:00:00.009530: Retrieved database locations and descriptions
0:00:00.009586: Annotating GCF_000005845.2
0:01:11.657135: Turning genes from prodigal to mmseqs2 db
0:01:14.437307: Getting hits from kofam

0:23:17.054402: Getting forward best hits from peptidase
0:23:30.591247: Getting reverse best hits from peptidase
0:23:31.874700: Getting descriptions of hits from peptidase
/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this MER0295850 look like an id from peptidase_description
  warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
Traceback (most recent call last):
  File "/media/executor/mlee/miniconda3/envs/DRAM/bin/DRAM.py", line 189, in <module>
    args.func(**args_dict)
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1039, in annotate_bins_cmd
    annotate_bins(list(set(fasta_locs)), output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1078, in annotate_bins
    all_annotations = annotate_fastas(fasta_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 820, in annotate_orfs
    annotation_list.append(do_blast_style_search(query_db, db_handler.db_locs['peptidase'], tmp_dir,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 683, in do_blast_style_search
    hits = formater(hits, header_dict)
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 187, in get_peptidase_description
    header = header_dict[peptidase_hit]
KeyError: 'MER0295850'

Thanks for any help!

AstrobioMike avatar Mar 24 '22 21:03 AstrobioMike

The issue persists for me after reinstalling DRAM and rebuilding databases. DRAM worked great in older builds until I decided to fix conda and reinstall all the environments - poor choices :) Many thanks for the help!

MaxRubinBlum avatar Mar 26 '22 06:03 MaxRubinBlum

@rmFlynn

Hi, unfortunately, I am having the same problem as everyone else. Any suggestions?

DRAM-v.py annotate -i /home/alis/NASA_DRAMV_Files/VAL_A09_dramv.fa -v /home/alis/NASA_DRAMV_Files/VAL_A09_viral-affi_dramv.tab -o VAL_A09_dramv-annotation-result --skip_trnascan --threads 28 --min_contig_size 1000
2022-03-29 16:17:08.610389: Viral annotation started
0:00:00.021332: Retrieved database locations and descriptions
0:00:00.021366: Annotating VAL_A09_dramv
0:00:28.726808: Turning genes from prodigal to mmseqs2 db
0:00:30.535642: Getting hits from kofam
0:11:03.798142: Getting forward best hits from viral
0:11:10.948082: Getting reverse best hits from viral
0:11:11.841648: Getting descriptions of hits from viral
/home/alis/anaconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this YP_009622437.1 look like an id from viral_description
  warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
Traceback (most recent call last):
  File "/home/alis/anaconda3/envs/DRAM/bin/DRAM-v.py", line 153, in <module>
    args.func(**args_dict)
  File "/home/alis/anaconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_vgfs.py", line 473, in annotate_vgfs
    annotations = annotate_fastas(contig_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/alis/anaconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/alis/anaconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/home/alis/anaconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 813, in annotate_orfs
    annotation_list.append(do_blast_style_search(query_db, db_handler.db_locs['viral'], tmp_dir,
  File "/home/alis/anaconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 683, in do_blast_style_search
    hits = formater(hits, header_dict)
  File "/home/alis/anaconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 173, in get_basic_description
    header = header_dict[hit]
KeyError: 'YP_009622437.1'

DRAM version: DRAM-setup.py version 1.3.4

DRAM-setup.py print_config

Processed search databases
KEGG db: None
KOfam db: /data/alis/DRAM_database/kofam_profiles.hmm
KOfam KO list: /data/alis/DRAM_database/kofam_ko_list.tsv
UniRef db: /data/alis/DRAM_database/uniref90.20220328.mmsdb
Pfam db: /data/alis/DRAM_database/pfam.mmspro
dbCAN db: /data/alis/DRAM_database/dbCAN-HMMdb-V10.txt
RefSeq Viral db: /data/alis/DRAM_database/refseq_viral.20220328.mmsdb
MEROPS peptidase db: /data/alis/DRAM_database/peptidases.20220328.mmsdb
VOGDB db: /data/alis/DRAM_database/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /data/alis/DRAM_database/Pfam-A.hmm.dat.gz
dbCAN family activities: /data/alis/DRAM_database/CAZyDB.07292021.fam-activities.txt
VOG annotations: /data/alis/DRAM_database/vog.annotations.tsv.gz

Description db: /data/alis/DRAM_database/description_db.sqlite

DRAM distillation sheets
Genome summary form: /data/alis/DRAM_database/genome_summary_form.tsv
Module step form: /data/alis/DRAM_database/module_step_form.tsv
ETC module database: /data/alis/DRAM_database/etc_module_database.tsv
Function heatmap form: /data/alis/DRAM_database/function_heatmap_form.tsv
AMG database: /data/alis/DRAM_database/amg_database.tsv

alisDRI avatar Mar 30 '22 00:03 alisDRI

Ok, I have a new theory: the problem may lie partly in using different versions of Python. If you have not yet run DRAM-setup.py update_description_db, please do so and see if that fixes the problem. Otherwise, I will try to nail down which versions of DRAM fail on which versions of Python, and find out what is happening.

rmFlynn avatar Mar 30 '22 18:03 rmFlynn

DRAM-setup.py update_description_db seems to have fixed the issue for me. Thanks!

btemperton avatar Apr 02 '22 13:04 btemperton

Thanks! A fix for the rest of you should be on GitHub later today.

rmFlynn avatar Apr 04 '22 17:04 rmFlynn

Hi @rmFlynn,

Thank you for fixing the problem with the DRAM annotation step. Should I update my database again, or should I update the DRAM version to fix the annotation problem? Could you please provide me with the new GitHub link? Thank you

alisDRI avatar Apr 11 '22 17:04 alisDRI

The main branch has the fix, and it is in bioconda. You can follow the instructions here to upgrade. The only thing I would add is that you may want to make a new environment, as that will guarantee a good Python version. In any case, you should be able to use the same databases; just export and import the config as in the instructions. You may also want to run DRAM-setup.py update_description_db afterwards for good measure.

rmFlynn avatar Apr 12 '22 23:04 rmFlynn

@rmFlynn I am so sorry to bother you again. I am still having the same issue with the latest version of the DRAM mentioned on Mar 29.

DRAM-v.py annotate -i /home/alis/NASA_DRAMV_Files/VAL_A09_dramv.fa -v /home/alis/NASA_DRAMV_Files/VAL_A09_viral-affi_dramv.tab -o VAL_A09_dramv-annotation-result --skip_trnascan --threads 10 --min_contig_size 1000
2022-04-28 23:44:52.879071: Viral annotation started
0:00:00.024046: Retrieved database locations and descriptions
0:00:00.024092: Annotating VAL_A09_dramv
0:00:33.209904: Turning genes from prodigal to mmseqs2 db
0:00:35.186101: Getting hits from kofam
0:10:06.446820: Getting forward best hits from viral
0:10:16.228298: Getting reverse best hits from viral
0:10:17.188693: Getting descriptions of hits from viral
/home/alis/anaconda3/lib/python3.9/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this YP_007675021.1 look like an id from viral_description
  warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
Traceback (most recent call last):
  File "/home/alis/anaconda3/bin/DRAM-v.py", line 153, in <module>
    args.func(**args_dict)
  File "/home/alis/anaconda3/lib/python3.9/site-packages/mag_annotator/annotate_vgfs.py", line 475, in annotate_vgfs
    annotations = annotate_fastas(contig_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/alis/anaconda3/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1013, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/alis/anaconda3/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 921, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/home/alis/anaconda3/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 814, in annotate_orfs
    annotation_list.append(do_blast_style_search(query_db, db_handler.db_locs['viral'], tmp_dir,
  File "/home/alis/anaconda3/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 684, in do_blast_style_search
    hits = formater(hits, header_dict)
  File "/home/alis/anaconda3/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 173, in get_basic_description
    header = header_dict[hit]
KeyError: 'YP_007675021.1'

DRAM-setup.py version
1.3.5

python --version
Python 3.9.7

conda --version
conda 4.12.0

DRAM-setup.py print_config
Processed search databases
KEGG db: None
KOfam db: /data/alis/DRAM_database/kofam_profiles.hmm
KOfam KO list: /data/alis/DRAM_database/kofam_ko_list.tsv
UniRef db: /data/alis/DRAM_database/uniref90.20220428.mmsdb
Pfam db: /data/alis/DRAM_database/pfam.mmspro
dbCAN db: /data/alis/DRAM_database/dbCAN-HMMdb-V10.txt
RefSeq Viral db: /data/alis/DRAM_database/refseq_viral.20220428.mmsdb
MEROPS peptidase db: /data/alis/DRAM_database/peptidases.20220428.mmsdb
VOGDB db: /data/alis/DRAM_database/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /data/alis/DRAM_database/Pfam-A.hmm.dat.gz
dbCAN family activities: /data/alis/DRAM_database/CAZyDB.07292021.fam-activities.txt
VOG annotations: /data/alis/DRAM_database/vog.annotations.tsv.gz

Description db: /data/alis/DRAM_database/description_db.sqlite

DRAM distillation sheets
Genome summary form: /data/alis/DRAM_database/genome_summary_form.tsv
Module step form: /data/alis/DRAM_database/module_step_form.tsv
ETC module database: /data/alis/DRAM_database/etc_module_database.tsv
Function heatmap form: /data/alis/DRAM_database/function_heatmap_form.tsv
AMG database: /data/alis/DRAM_database/amg_database.tsv

The annotation script:

DRAM-v.py annotate -i /home/alis/NASA_DRAMV_Files/VAL_A09_dramv.fa -v /home/alis/NASA_DRAMV_Files/VAL_A09_viral-affi_dramv.tab -o VAL_A09_dramv-annotation-result --skip_trnascan --threads 10 --min_contig_size 1000

I should mention that I am using the old DRAM data files, and I had to manually copy and paste the kofam_ko_list.tsv and vog.annotations.tsv to DRAM_database folder. I was getting an error before that.

alisDRI avatar Apr 29 '22 20:04 alisDRI

And you ran DRAM-setup.py update_description_db? If you did, I will need to look into this deeply because the descriptions and keys should have come from the hmm itself and this should not have happened. I mean, you can't have a key in the DB that is not in the db.
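
One way to check directly whether a given ID is present in the description database is to query the SQLite file. The sketch below is a hedged example: the table name `viral_description` is taken from the warning text earlier in the thread, while the `id` column name and the helper `ids_missing_from_table` are assumptions for illustration, so inspect the schema first if the query fails.

```python
import sqlite3


def ids_missing_from_table(db_path, table, ids, id_column="id"):
    """Return the ids that have no row in `table` of the SQLite file at db_path."""
    # Table and column names cannot be bound as parameters, so validate them first.
    for name in (table, id_column):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {name!r}")
    query = f"SELECT 1 FROM {table} WHERE {id_column} = ? LIMIT 1"
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        return [i for i in ids if conn.execute(query, (i,)).fetchone() is None]
    finally:
        conn.close()
```

For example, `ids_missing_from_table("/data/alis/DRAM_database/description_db.sqlite", "viral_description", ["YP_007675021.1"])` would return the ID back if it has no row, which would suggest that the description database is out of step with the search database.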

rmFlynn avatar Apr 29 '22 22:04 rmFlynn

Hi @rmFlynn, Thank you so much for your prompt reply and I apologize for my late reply. I tried to run DRAM-setup.py update_description_db on a computer with 120 GB RAM but after 30 min I got the message "killed". Is it a memory issue? I could run the annotation once, but I was unable to find the annotation.tsv file. DRAM did not generate it for some reason.

alisDRI avatar May 23 '22 19:05 alisDRI

It could be a memory issue, for sure. Sadly, the size of the description DB has ballooned with subsequent releases of the databases we use, and now it is a really unreasonable size. My first bit of advice is to try using only one thread. I will circle back to this, but for now, good luck.

rmFlynn avatar May 24 '22 18:05 rmFlynn

Hi,

I am currently having the same issue. And I am running into memory issues when I run DRAM-setup.py update_description_db. I just want to follow up to see if there were any other recommendations.

Thanks for your help.

gogogogogul avatar Aug 26 '22 03:08 gogogogogul

I am also running into the same issues and looking for recommendations...

DRAM.py annotate -i "/home/kvilleneuve/Shotgun_Project/Melanie_Shotgun/bins/*.fa" -o annotation --threads 40
6 fastas found
2022-09-01 14:36:37.237398: Annotation started
0:00:00.005592: Retrieved database locations and descriptions
0:00:00.005676: Annotating bins.9
0:00:38.679510: Turning genes from prodigal to mmseqs2 db
0:00:40.161186: Getting hits from kofam
0:20:44.542424: Getting forward best hits from peptidase
0:20:54.754270: Getting reverse best hits from peptidase
0:20:55.871128: Getting descriptions of hits from peptidase
/home/kvilleneuve/anaconda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this MER0501642 look like an id from peptidase_description
  warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
Traceback (most recent call last):
  File "/home/kvilleneuve/anaconda/envs/DRAM/bin/DRAM.py", line 189, in <module>
    args.func(**args_dict)
  File "/home/kvilleneuve/anaconda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1039, in annotate_bins_cmd
    annotate_bins(list(set(fasta_locs)), output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
  File "/home/kvilleneuve/anaconda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1078, in annotate_bins
    all_annotations = annotate_fastas(fasta_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/kvilleneuve/anaconda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/kvilleneuve/anaconda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/home/kvilleneuve/anaconda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 820, in annotate_orfs
    annotation_list.append(do_blast_style_search(query_db, db_handler.db_locs['peptidase'], tmp_dir,
  File "/home/kvilleneuve/anaconda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 683, in do_blast_style_search
    hits = formater(hits, header_dict)
  File "/home/kvilleneuve/anaconda/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 187, in get_peptidase_description
    header = header_dict[peptidase_hit]
KeyError: 'MER0501642'

karinevilleneuve avatar Sep 01 '22 15:09 karinevilleneuve

And you ran DRAM-setup.py update_description_db?

rmFlynn avatar Sep 01 '22 18:09 rmFlynn

Same as what other people mentioned: the process is killed after about 30 minutes, and it doesn't accept arguments to change the number of threads.

Karine

— Reply to this email directly, view it on GitHub https://github.com/WrightonLabCSU/DRAM/issues/158#issuecomment-1234633964, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOPIZQJDJQRHEYK6LSZ5PQ3V4DYKXANCNFSM5Q56QLWA . You are receiving this because you commented.Message ID: @.***>

karinevilleneuve avatar Sep 01 '22 23:09 karinevilleneuve

How much RAM is available in your system?

rmFlynn avatar Sep 02 '22 00:09 rmFlynn

According to the command cat /proc/meminfo I have 527986696 kB MemTotal.
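
For reference, `/proc/meminfo` reports `MemTotal` in kB (units of 1024 bytes), so the figure above corresponds to roughly 503.5 GiB. A small sketch of the conversion, where `mem_total_gib` is a hypothetical helper name:

```python
def mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, converted from kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the value is reported in units of 1024 bytes
            return kib / 1024 ** 2
    raise ValueError("MemTotal line not found")


print(round(mem_total_gib("MemTotal:       527986696 kB"), 1))  # prints 503.5
```

On a Linux machine the same function can be fed the live file with `mem_total_gib(open("/proc/meminfo").read())`.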


karinevilleneuve avatar Sep 07 '22 12:09 karinevilleneuve

That could be just enough if nothing else is running on your system. Unfortunately, I do not know exactly how much is needed, because the databases keep growing. Do you think you could try DRAM-setup.py prepare_databases --output_dir your_new_uniref_free_output --skip_uniref to reduce the memory footprint?

rmFlynn avatar Sep 07 '22 19:09 rmFlynn

So I should delete all the databases already downloaded and start the process over without uniref ?


karinevilleneuve avatar Sep 08 '22 13:09 karinevilleneuve

The problem is that I have not tested removing the UniRef database after setup; it just has not come up as a major problem, so I can't guarantee that it will work. If you want, you can run DRAM-setup.py export_config --output_file DRAM_config, edit that DRAM_config file to set all uniref paths to null, run DRAM-setup.py import_config --config_loc DRAM_config, and then try updating the description database. Your best chance of success is to redo the setup from the start, sorry to say, but you can still export your old config file to test things out. Let me know how it works, whatever you do.
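
The edit step in the middle (setting all uniref paths to null) could be scripted roughly as below. This is a sketch under two assumptions that should be checked against the exported file: that DRAM_config is JSON, and that the relevant keys contain the string "uniref". The function names are hypothetical.

```python
import json


def null_uniref_entries(config):
    """Return a copy of config with every value under a 'uniref' key set to None."""
    if isinstance(config, dict):
        return {
            key: None if "uniref" in key.lower() else null_uniref_entries(value)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [null_uniref_entries(item) for item in config]
    return config


def strip_uniref(config_path):
    """Rewrite the exported config file in place with uniref paths nulled."""
    with open(config_path) as handle:
        config = json.load(handle)
    with open(config_path, "w") as handle:
        # None is serialised as JSON null.
        json.dump(null_uniref_entries(config), handle, indent=2)
```

Calling `strip_uniref("DRAM_config")` between the export and import commands would apply the change; keeping a copy of the original export is advisable, since the file is overwritten.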

rmFlynn avatar Sep 08 '22 16:09 rmFlynn

Hi Rory !

I am able to run the program now. Thank you for your time!

Karine


karinevilleneuve avatar Oct 17 '22 15:10 karinevilleneuve

Thanks for letting me know, sorry for the bug

rmFlynn avatar Oct 17 '22 15:10 rmFlynn