
DRAM-v annotations.tsv columns change

Open · cdebuck opened this issue 4 years ago · 1 comment

Hello, I have used DRAM-v annotate several times (with the same settings), but I noticed that the columns in the annotations.tsv file sometimes change. The positions of the columns 'pfam_hits', 'cazy_hits', 'vogdb_description', 'vogdb_categories' and 'heme_regulatory_motif_count' vary: they are reported either after 'peptidase_eVal' or after 'viral_eVal'. Moreover, one of the annotations.tsv files contained an extra (empty) column with the header 'vogdb_hits'. This prevented me from quickly merging several annotations.tsv files.

cdebuck, Mar 05 '21 13:03
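To pin down exactly how two annotations.tsv files differ, it can help to compare their header rows directly. The sketch below uses hypothetical helpers (`read_header` and `compare_headers` are not part of DRAM). It assumes tab-separated files whose first line is the header, and reports columns present in only one file and whether the shared columns appear in a different order. The toy headers reuse column names from the report above.

```python
import csv
import io
from typing import TextIO


def read_header(handle: TextIO) -> list[str]:
    """Return the column names from the first line of a tab-separated file."""
    return next(csv.reader(handle, delimiter="\t"))


def compare_headers(first: list[str], second: list[str]) -> dict:
    """Summarize how two header rows differ (hypothetical helper, not part of DRAM)."""
    shared_first = [c for c in first if c in second]   # shared columns, first file's order
    shared_second = [c for c in second if c in first]  # shared columns, second file's order
    return {
        "only_in_first": [c for c in first if c not in second],
        "only_in_second": [c for c in second if c not in first],
        "same_order": shared_first == shared_second,
    }


# Toy headers modeled on the report; real files would be opened with
# open(path, newline="") instead of io.StringIO.
header_a = read_header(io.StringIO("\tpeptidase_eVal\tpfam_hits\tviral_eVal\n"))
header_b = read_header(io.StringIO("\tpeptidase_eVal\tviral_eVal\tpfam_hits\tvogdb_hits\n"))
report = compare_headers(header_a, header_b)
print(report)
```

Here the report flags 'vogdb_hits' as present only in the second file and shows that the shared columns are ordered differently, which is the pattern that breaks position-based merging.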

Sounds like you found a fun bug! DRAM only generates a column if there is at least one hit to the corresponding database in the FASTA files being annotated. I always merge based on column names using pandas, which you can do with this quick script:

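import pandas as pd
from glob import glob

# Read every annotations.tsv (first column used as the row index) and stack them.
# pd.concat aligns columns by name, so a different column order or a column
# missing from some files is handled; absent values become NaN.
merged_annotations = pd.concat([pd.read_csv(annotation_path, sep='\t', index_col=0)
                                for annotation_path in sorted(glob('/path/to/annotations/*.tsv'))])
merged_annotations.to_csv('merged_annotations.tsv', sep='\t')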
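The reason this script tolerates shifting columns is that `pd.concat` matches columns by name rather than by position. The self-contained sketch below demonstrates this with two toy tables (made-up gene IDs and values, read from `io.StringIO` instead of real files). In these tables the columns are ordered differently and only one of them has 'vogdb_hits'. The final `reindex` to a sorted column list is an optional extra step, not part of the original script, so that every merge writes columns in the same order.

```python
import io

import pandas as pd

# Two toy annotation tables (hypothetical values) whose columns appear in
# different orders, and where only the second has a 'vogdb_hits' column.
run_a = io.StringIO(
    "\tpfam_hits\tviral_eVal\n"
    "gene_1\tPF00001\t1e-10\n"
)
run_b = io.StringIO(
    "\tviral_eVal\tpfam_hits\tvogdb_hits\n"
    "gene_2\t2e-5\tPF00002\tVOG00001\n"
)

frames = [pd.read_csv(handle, sep="\t", index_col=0) for handle in (run_a, run_b)]

# concat matches columns by name, not position; a column missing from one
# table is filled with NaN for that table's rows.
merged = pd.concat(frames)

# Optional: fix the output column order so every merge writes columns identically.
merged = merged.reindex(columns=sorted(merged.columns))
print(merged)
```

Sorting alphabetically is just the simplest deterministic choice; reindexing to the column order of one reference file would preserve DRAM's own grouping instead.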

I will add an update in a future version to guarantee all columns are generated even if there are no hits.

shafferm, Mar 09 '21 19:03
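Guaranteeing that all columns are generated even without hits amounts to always writing a fixed column set. For files produced without that guarantee, a similar effect can be obtained downstream with `DataFrame.reindex`. This is a minimal sketch, not how DRAM implemented its fix. `EXPECTED_COLUMNS` and `with_all_columns` are hypothetical names, and the column list is only the subset mentioned in this thread, not DRAM's actual schema.

```python
import pandas as pd

# Hypothetical list of columns that should always be written; DRAM's real
# column set depends on its version and configured databases.
EXPECTED_COLUMNS = [
    "pfam_hits",
    "cazy_hits",
    "vogdb_hits",
    "vogdb_description",
    "vogdb_categories",
    "heme_regulatory_motif_count",
]


def with_all_columns(annotations: pd.DataFrame, expected: list[str]) -> pd.DataFrame:
    """Return a copy with every expected column present, in a fixed order.

    Expected columns with no database hits are added empty (NaN); any extra
    columns not in ``expected`` are kept and placed after the expected ones.
    """
    extras = [c for c in annotations.columns if c not in expected]
    return annotations.reindex(columns=list(expected) + extras)


# A toy run with hits in only one of the expected databases.
sparse = pd.DataFrame(
    {"viral_eVal": [1e-10, 3e-4], "pfam_hits": ["PF00001", None]},
    index=["gene_1", "gene_2"],
)
complete = with_all_columns(sparse, EXPECTED_COLUMNS)
print(list(complete.columns))
```

Applying the same expected list to every file before merging gives identical headers, so even a naive position-based concatenation would line up.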

This should be fixed!

rmFlynn, Aug 31 '22 18:08