partial header from different DBs

Open MyhFe opened this issue 2 years ago • 3 comments

Hi, I have a strange issue while running diamond on 2 different public DBs, ALLERGENS and BACTERIAL_TOXINS.

TOXINS DB EXAMPLE:

btseq.1 gi|123659995|sp|Q4L5M5.1|GGI1_STAHJ RecName: Full=Antibacterial protein 1 homolog RecName: Full=Antibacterial protein 1 homolog
MQKLAEAIAAAVQAGQDKDWGKMGTSIVGIVENGISVLGKIFGF

ALLERGENS DB EXAMPLE:

seq.1 ABL09307.1 allergen Aca s 13 [Acarus siro]
MVQINGSYKLEKSDNFDAFLKELGLNFVTRNLAKSATPTVEVSVNGDSYTIKTASTLKNTEISFKLGEEF

When running diamond with:

diamond blastp -p 16 -e 0.01 -q query.prt -d {DB} -o {OUT} --quiet --sensitive --query-cover 60 --subject-cover 0 --id 40 --outfmt 6 qseqid stitle -k 10

I get 2 outputs (1 for each DB), but the Subject ID column is different. The toxins DB produces the string:

gi|123659995|sp|Q4L5M5.1|GGI1_STAHJ RecName: Full=Antibacterial protein 1 homolog RecName: Full=Antibacterial protein 1 homolog

While the allergens DB produces the string:

seq.767 CAA55072.2 aldehyde dehydrogenase, allergen Cla h 10 [Cladosporium herbarum]

The latter includes the sequence name (seq.767) while the former does not (btseq.1 is missing). What can I do so that the format of the subject ID is the same in every run against different DBs? Thanks a lot.

MyhFe • Feb 08 '23 11:02

What diamond version are you using? The current version should not cut the seqid.

bbuchfink • Feb 09 '23 09:02

Thanks for replying. I'm using version 2.0.15. Still, this version returns the full header for one DB and a partial header for the other, so I was hoping for a way to get the same result from this version.

MyhFe • Feb 09 '23 13:02

I can't reproduce the problem using diamond v2.0.15. I made a fasta file out of your sequence:

>btseq.1 gi|123659995|sp|Q4L5M5.1|GGI1_STAHJ RecName: Full=Antibacterial protein 1 homolog RecName: Full=Antibacterial protein 1 homolog
MQKLAEAIAAAVQAGQDKDWGKMGTSIVGIVENGISVLGKIFGF

Using the stitle field, the seqid does not get cut for me.

bbuchfink • Feb 17 '23 15:02
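
For anyone trying to narrow down a case like this, a rough way to confirm which titles lost their leading seqid is to compare the stitle column against the headers of the FASTA file the DB was built from. The Python below is only a sketch under some assumptions: the source FASTA is still available, the output was written with --outfmt 6 qseqid stitle (two tab-separated columns), and the function names fasta_headers and truncated_titles are made up for illustration and are not part of diamond.

```python
def fasta_headers(lines):
    """Return the full header (without the leading '>') of every FASTA record."""
    return [line[1:].rstrip("\n") for line in lines if line.startswith(">")]


def truncated_titles(fasta_lines, tabular_lines):
    """Return (stitle, full_header) pairs where stitle lacks the leading seqid.

    Assumes two tab-separated columns per row: qseqid, then stitle.
    """
    headers = fasta_headers(fasta_lines)
    complete = set(headers)
    results = []
    for row in tabular_lines:
        row = row.rstrip("\n")
        if not row:
            continue  # skip blank lines
        _qseqid, stitle = row.split("\t", 1)
        if stitle in complete:
            continue  # title matches a complete header, nothing was cut
        for header in headers:
            # A cut title equals the header minus its first word(s).
            if header.endswith(" " + stitle):
                results.append((stitle, header))
                break
    return results


# Header taken from the toxins example above; "query1" is a hypothetical query ID.
fasta = [
    ">btseq.1 gi|123659995|sp|Q4L5M5.1|GGI1_STAHJ RecName: Full=Antibacterial "
    "protein 1 homolog RecName: Full=Antibacterial protein 1 homolog\n",
    "MQKLAEAIAAAVQAGQDKDWGKMGTSIVGIVENGISVLGKIFGF\n",
]
hits = [
    "query1\tgi|123659995|sp|Q4L5M5.1|GGI1_STAHJ RecName: Full=Antibacterial "
    "protein 1 homolog RecName: Full=Antibacterial protein 1 homolog\n",
]

for stitle, header in truncated_titles(fasta, hits):
    print("missing seqid:", header.split(" ", 1)[0])
```

If this reports nothing for a DB, the titles in the output already match the FASTA headers, and the difference is more likely in how that DB file was built than in the search step.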