DRAM
Issue with DRAM-v distill - error with "auxiliary score"
Hi,
I ran the latest DRAM-v on putative vMAGs obtained from both VirSorter2 and VIBRANT, after dereplicating them at 0.99 with CD-HIT. I was hoping to carry the vMAGs identified by VIBRANT through this pipeline as well.
The latest DRAM (dramv_1.2.3) ran smoothly on the putative vMAGs, but I am having trouble running distill. I tried it on a small subset of samples with a single annotation file:
DRAM-v.py distill -i annotations.tsv -o Dramv_test
and obtained the following error message:
0:00:00.031097: Retrieved database locations and descriptions
Traceback (most recent call last):
File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'auxiliary_score'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/srv/sw/miniconda3/envs/dram_1.2.3/bin/DRAM-v.py", line 140, in <module>
args.func(**args_dict)
File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/mag_annotator/summarize_vgfs.py", line 235, in summarize_vgfs
potential_amgs = filter_to_amgs(annotations, max_aux=max_auxiliary_score,
File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/mag_annotator/summarize_vgfs.py", line 49, in filter_to_amgs
vmap_aux_check = ('V' not in amg_flags) and ('M' in amg_flags) and (row['auxiliary_score'] <= max_aux) and \
File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/series.py", line 853, in __getitem__
return self._get_value(key)
File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/series.py", line 961, in _get_value
loc = self.index.get_loc(label)
File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'auxiliary_score'
Could this be a database issue? I can't find anything resembling an "auxiliary_score" column in the annotate output. I have pasted the annotation output below in case it helps.
fasta scaffold gene_position start_position end_position strandedness rank kegg_id kegg_hit viral_id viral_hit viral_RBH viral_identity viral_bitScore viral_eVal pfam_hits cazy_hits vogdb_description vogdb_categories heme_regulatory_motif_count vogdb_hits peptidase_id peptidase_family peptidase_hit peptidase_RBH peptidase_identity peptidase_bitScore peptidase_eVal is_transposon amg_flags
NODE_1184_length_7202_cov_5.457825_1 NODE_1184_length_7202_cov_5.457825 NODE_1184_length_7202_cov_5.457825 1 2 343 1 D YP_008320277.1 YP_008320277.1 endonuclease [Puniceispirillum phage HMO-2011] False 0.8 188.0 4.164e-55 Phage endonuclease I [PF05367.11] sp|P00641|ENDO_BPT7 Endonuclease I; Xp Xp 0 False F
NODE_1184_length_7202_cov_5.457825_2 NODE_1184_length_7202_cov_5.457825 NODE_1184_length_7202_cov_5.457825 2 347 727 1 E YP_008320278.1 YP_008320278.1 hypothetical protein phage1322_16 [Puniceispirillum phage HMO-2011] False 0.697 164.0 9.599e-47 0 False F
NODE_1184_length_7202_cov_5.457825_3 NODE_1184_length_7202_cov_5.457825 NODE_1184_length_7202_cov_5.457825 3 724 1035 1 D YP_008320279.1 YP_008320279.1 hypothetical protein phage1322_17 [Puniceispirillum phage HMO-2011] False 0.921 143.0 8.5e-40 Protein of unknwon function (DUF3310) [PF11753.8] sp|P07719|V17_BPT3 Gene 1.7 protein; Xh Xh 0 False F
NODE_1184_length_7202_cov_5.457825_4 NODE_1184_length_7202_cov_5.457825 NODE_1184_length_7202_cov_5.457825 4 1025 1183 1 E 0 False F
Could you advise on an alternative for vMAGs obtained this way? Thanks, Apoorva
Hi Apoorva,
To get auxiliary scores, you need to include the VIRSorter affi_contigs file. Without it we can't calculate the auxiliary score, which we use to measure how confident we are that a gene on a viral contig is actually viral and not a host gene erroneously included on the contig.
Mike
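Since distill fails only once it reaches the first row, it may help to check the annotations file for the expected columns before running it. The following is a minimal sketch, not part of DRAM: the function name missing_distill_columns and the REQUIRED_COLUMNS tuple are hypothetical, and the two column names are simply the ones that appear in the traceback above. The real tool may depend on other columns as well.

```python
import csv
import io

# Assumption: these are the columns distill needs, taken from the traceback
# above (row['auxiliary_score'] and amg_flags). The full list may be longer.
REQUIRED_COLUMNS = ("auxiliary_score", "amg_flags")


def missing_distill_columns(tsv_handle, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the TSV header."""
    # Read only the first line; an empty file yields an empty header.
    header = next(csv.reader(tsv_handle, delimiter="\t"), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Toy header modelled on the annotation output pasted above (abbreviated).
example = io.StringIO("\tfasta\tscaffold\tgene_position\trank\tamg_flags\n")
print(missing_distill_columns(example))
```

For a real file, the same function can be called on open("annotations.tsv", newline=""). If auxiliary_score is reported as missing, the annotations were most likely produced without the affi_contigs file described in the reply above.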