partial header from different DBs
Hi, I have a strange issue while running diamond on 2 different public DBs, ALLERGENS and BACTERIAL_TOXINS.
TOXINS DB EXAMPLE:
>btseq.1 gi|123659995|sp|Q4L5M5.1|GGI1_STAHJ RecName: Full=Antibacterial protein 1 homolog RecName: Full=Antibacterial protein 1 homolog
MQKLAEAIAAAVQAGQDKDWGKMGTSIVGIVENGISVLGKIFGF
ALLERGENS DB EXAMPLE:
>seq.1 ABL09307.1 allergen Aca s 13 [Acarus siro]
MVQINGSYKLEKSDNFDAFLKELGLNFVTRNLAKSATPTVEVSVNGDSYTIKTASTLKNTEISFKLGEEF
When running diamond with:
diamond blastp -p 16 -e 0.01 -q query.prt -d {DB} -o {OUT} --quiet --sensitive --query-cover 60 --subject-cover 0 --id 40 --outfmt 6 qseqid stitle -k 10
I get 2 outputs (1 for each DB), but the Subject ID column is different. The toxins DB produces the string:
gi|123659995|sp|Q4L5M5.1|GGI1_STAHJ RecName: Full=Antibacterial protein 1 homolog RecName: Full=Antibacterial protein 1 homolog
While the allergens DB produces the string:
seq.767 CAA55072.2 aldehyde dehydrogenase, allergen Cla h 10 [Cladosporium herbarum]
The latter includes the leading sequence name (seq.767), while the former is missing it (btseq.1). What can I do so that the subject ID has the same format in every run against different DBs? Thanks a lot.
What diamond version are you using? The current version should not cut the seqid.
Thanks for replying. I'm using version 2.0.15. Still, this version returns the full header for one DB and a partial header for the other, so I was hoping for a way to get the same result with this version.
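Independently of the diamond version, one possible workaround is to normalize the stitle column after the run. The following is only a minimal Python sketch, not something diamond provides. It assumes the output has exactly the two tab-separated columns requested above (qseqid, stitle), that the leading name always looks like "seq.N" or "btseq.N" as in the two examples in this thread, and that dropping that name is the desired common format. The function names are hypothetical.

```python
import re

# Assumption: the leading record name looks like "seq.767" or "btseq.1",
# as in the examples in this thread. Adjust the pattern for other DBs.
LEADING_NAME = re.compile(r"^(?:bt)?seq\.\d+\s+")


def normalize_stitle(stitle):
    """Drop a leading 'seq.N' / 'btseq.N' token if present; otherwise return unchanged."""
    return LEADING_NAME.sub("", stitle, count=1)


def normalize_line(line):
    """Normalize one line of two-column (qseqid, stitle) tab-separated output."""
    qseqid, stitle = line.rstrip("\n").split("\t", 1)
    return qseqid + "\t" + normalize_stitle(stitle)


def normalize_file(in_path, out_path):
    """Rewrite a diamond output file so every stitle has the same shape."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(normalize_line(line) + "\n")
```

With this, a stitle that starts with "seq.767 " loses that prefix, while a stitle that already starts with "gi|..." is left as it is, so both outputs end up in the same format.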
I can't reproduce the problem using diamond v2.0.15. I made a FASTA file out of your sequence:
>btseq.1 gi|123659995|sp|Q4L5M5.1|GGI1_STAHJ RecName: Full=Antibacterial protein 1 homolog RecName: Full=Antibacterial protein 1 homolog
MQKLAEAIAAAVQAGQDKDWGKMGTSIVGIVENGISVLGKIFGF
Using the stitle field, the seqid does not get cut for me.
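To rule out a difference in the input files themselves, it may help to list the headers of the FASTA file each DB was built from and check that the first token (btseq.1, seq.1, ...) is really present in both. A minimal Python sketch, assuming plain-text FASTA files; the function names and the file path in the usage note are hypothetical:

```python
def split_header(header):
    """Split a FASTA header line into (seqid, description).

    The seqid is the first whitespace-delimited token after '>'.
    """
    text = header.lstrip(">").strip()
    seqid, _, description = text.partition(" ")
    return seqid, description


def fasta_headers(path):
    """Yield (seqid, description) for every header line in a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield split_header(line)
```

For example, looping over `fasta_headers("toxins.fasta")` and printing each seqid would show whether the toxins file starts its headers with btseq.N or directly with gi|...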