Issue with DRAM-v distill - error with "auxiliary score"

Open aprabhu90 opened this issue 4 years ago • 1 comments

Hi,

I ran the latest DRAM-v on putative vMAGs that were identified with both VirSorter2 and VIBRANT and then dereplicated at 0.99 with CD-HIT. I was hoping to include the vMAGs identified by VIBRANT in this pipeline as well.

The latest DRAM (dramv_1.2.3) ran smoothly on the putative vMAGs, but I am having trouble running distill. I tried it on a small subset of samples with a single annotation file, using DRAM-v.py distill -i annotations.tsv -o Dramv_test, and got the following error message:

0:00:00.031097: Retrieved database locations and descriptions
Traceback (most recent call last):
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'auxiliary_score'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/sw/miniconda3/envs/dram_1.2.3/bin/DRAM-v.py", line 140, in <module>
    args.func(**args_dict)
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/mag_annotator/summarize_vgfs.py", line 235, in summarize_vgfs
    potential_amgs = filter_to_amgs(annotations, max_aux=max_auxiliary_score,
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/mag_annotator/summarize_vgfs.py", line 49, in filter_to_amgs
    vmap_aux_check = ('V' not in amg_flags) and ('M' in amg_flags) and (row['auxiliary_score'] <= max_aux) and \
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/series.py", line 853, in __getitem__
    return self._get_value(key)
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/series.py", line 961, in _get_value
    loc = self.index.get_loc(label)
  File "/srv/sw/miniconda3/envs/dram_1.2.3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'auxiliary_score'

Could this be a database issue? I can't find anything resembling an "auxiliary score" column in the annotate output. I have pasted the annotation output below in case it helps.

fasta	scaffold	gene_position	start_position	end_position	strandedness	rank	kegg_id	kegg_hit	viral_id	viral_hit	viral_RBH	viral_identity	viral_bitScore	viral_eVal	pfam_hits	cazy_hits	vogdb_description	vogdb_categories	heme_regulatory_motif_count	vogdb_hits	peptidase_id	peptidase_family	peptidase_hit	peptidase_RBH	peptidase_identity	peptidase_bitScore	peptidase_eVal	is_transposon	amg_flags
NODE_1184_length_7202_cov_5.457825_1	NODE_1184_length_7202_cov_5.457825	NODE_1184_length_7202_cov_5.457825	1	2	343	1	D			YP_008320277.1	YP_008320277.1 endonuclease [Puniceispirillum phage HMO-2011]	False	0.8	188.0	4.164e-55	Phage endonuclease I [PF05367.11]		sp|P00641|ENDO_BPT7 Endonuclease I; Xp	Xp	0									False	F
NODE_1184_length_7202_cov_5.457825_2	NODE_1184_length_7202_cov_5.457825	NODE_1184_length_7202_cov_5.457825	2	347	727	1	E			YP_008320278.1	YP_008320278.1 hypothetical protein phage1322_16 [Puniceispirillum phage HMO-2011]	False	0.697	164.0	9.599e-47					0									False	F
NODE_1184_length_7202_cov_5.457825_3	NODE_1184_length_7202_cov_5.457825	NODE_1184_length_7202_cov_5.457825	3	724	1035	1	D			YP_008320279.1	YP_008320279.1 hypothetical protein phage1322_17 [Puniceispirillum phage HMO-2011]	False	0.921	143.0	8.5e-40	Protein of unknwon function (DUF3310) [PF11753.8]		sp|P07719|V17_BPT3 Gene 1.7 protein; Xh	Xh	0									False	F
NODE_1184_length_7202_cov_5.457825_4	NODE_1184_length_7202_cov_5.457825	NODE_1184_length_7202_cov_5.457825	4	1025	1183	1	E													0									False	F

Could you advise on an alternative option for vMAGs obtained this way? Thanks, Apoorva

aprabhu90, Jun 07 '21 05:06

Hi Apoorva,

To get auxiliary scores you need to include the VIRSorter affi_contigs file. Without it we can't calculate the auxiliary score, which we use to measure how confident we are that a gene on a viral contig is actually viral and not a host gene erroneously included on the contig.
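As a quick sanity check before running distill, something like the sketch below could confirm whether an annotations file has the column the traceback complains about. It is not part of DRAM: the helper name `missing_distill_columns` and the `REQUIRED_FOR_DISTILL` tuple are hypothetical, and only `auxiliary_score` is confirmed by the KeyError above (`amg_flags` is included as an assumption because the failing line also uses it).

```python
import csv
import io

# Hypothetical list of columns to look for. 'auxiliary_score' is the key in
# the KeyError above; 'amg_flags' is an assumption based on the failing line.
REQUIRED_FOR_DISTILL = ("auxiliary_score", "amg_flags")


def missing_distill_columns(tsv_handle, required=REQUIRED_FOR_DISTILL):
    """Return the required column names that are absent from the TSV header."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    present = set(header)
    return [name for name in required if name not in present]


# Header abbreviated from the annotations pasted earlier in this thread.
example = io.StringIO("fasta\tscaffold\tgene_position\trank\tamg_flags\n")
print(missing_distill_columns(example))  # → ['auxiliary_score']
```

To check a real file, pass an open handle instead, for example `open("annotations.tsv", newline="")`. An empty result means the listed columns are present; it does not guarantee that distill will succeed.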

Mike

shafferm, Jun 16 '21 17:06