DRAM
DRAM-v annotations.tsv columns change
Hello, I have used DRAM-v annotate several times (same settings), but I noticed that the columns in the annotations.tsv file sometimes change. The position of the columns 'pfam_hits', 'cazy_hits', 'vogdb_description', 'vogdb_categories' and 'heme_regulatory_motif_count' can change: they are reported either after peptidase_eVal or after viral_eVal. Moreover, in one of the annotations.tsv files an extra (empty) column was added with the header 'vogdb_hits'. This prevented the quick merging of several annotations.tsv files.
Sounds like you found a fun bug! DRAM only generates a column if there is at least one hit to the database in the FASTAs being annotated. I always merge based on column names using pandas. You can do this with a quick script:
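import pandas as pd
from glob import glob

# pd.concat aligns on column names, so column order may differ between files;
# a column missing from one file is filled with NaN for that file's rows.
merged_annotations = pd.concat([pd.read_csv(annotation_path, sep='\t', index_col=0)
                                for annotation_path in glob('/path/to/annotations/*.tsv')])
merged_annotations.to_csv('merged_annotations.tsv', sep='\t')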
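To see why merging on column names sidesteps the reordering, here is a minimal sketch using two toy tables in place of real annotations.tsv files. The gene names and hit values are made up for illustration; only the column names come from this thread.

```python
import pandas as pd

# Toy stand-ins for two annotations.tsv files (all values are made up).
# The second table has its columns in a different order plus an extra column.
first = pd.DataFrame(
    {"peptidase_eVal": [1e-5], "pfam_hits": ["PF00001"]},
    index=["gene_1"],
)
second = pd.DataFrame(
    {"pfam_hits": ["PF00002"], "vogdb_hits": ["VOG00001"], "peptidase_eVal": [1e-9]},
    index=["gene_2"],
)

# Columns are matched by name, not by position.
merged = pd.concat([first, second])

# Values land under the right header despite the different column order,
# and the column missing from the first table is filled with NaN.
print(merged)
```

The result has one row per gene and the union of all columns, so files with and without 'vogdb_hits' can be combined without manual realignment.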
I will add an update in a future version to guarantee all columns are generated even if there are no hits.
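For files produced by a version without that guarantee, a similar effect can be approximated on the user side. The sketch below is not DRAM code: `normalize_columns` and `EXPECTED_COLUMNS` are hypothetical names, and the list only contains the columns mentioned in this thread, not DRAM's full set.

```python
import pandas as pd

# Hypothetical list: only the columns named in this thread, not DRAM's full schema.
EXPECTED_COLUMNS = [
    "pfam_hits",
    "cazy_hits",
    "vogdb_hits",
    "vogdb_description",
    "vogdb_categories",
    "heme_regulatory_motif_count",
]


def normalize_columns(annotations, expected=EXPECTED_COLUMNS):
    """Return a copy with every expected column present, in a fixed order.

    Columns not in `expected` keep their original relative order and come
    first; expected columns that are absent are added and filled with NaN.
    """
    other = [col for col in annotations.columns if col not in expected]
    return annotations.reindex(columns=other + list(expected))


# Usage on a toy table (made-up values) that lacks most of the expected columns.
example = pd.DataFrame(
    {"cazy_hits": ["GH13"], "rank": ["C"]},
    index=["gene_1"],
)
normalized = normalize_columns(example)
```

Applying this to each table before writing it out gives every file the same header, at the cost of maintaining the column list by hand.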
This should be fixed!