diamond
Difference in results between diamond1 and diamond2?
Hi there:
To compare diamond1 and diamond2, I built a DIAMOND database from the nr database separately with each version. I then aligned the same set of nucleic acid sequences with both versions, using the same parameters.
From the results, I found two main differences:
- The subject accessions reported by diamond1 include a version number, for example XP_028630034.1, while those reported by diamond2 do not, for example XP_028630034.
- In the diamond2 results, the top subject sequence is always from UniProtKB/Swiss-Prot, while in the diamond1 results it is always from the NCBI RefSeq database.
diamond1 result:
Query Subject
ENSMUST00000001513 NP_080749.2
ENSMUST00000001513 XP_028630034.1
ENSMUST00000001513 NP_001233720.1
ENSMUST00000001513 XP_034374267.1
ENSMUST00000001513 NP_001020846.1
ENSMUST00000001513 XP_006970129.1
ENSMUST00000001513 XP_021482884.1
ENSMUST00000001513 XP_005074701.1
ENSMUST00000001513 OBS76548.1
ENSMUST00000001513 XP_004656888.1
diamond2 result:
Query Subject
ENSMUST00000001513 Q922F4
ENSMUST00000001513 XP_028630034
ENSMUST00000001513 AAZ14959
ENSMUST00000001513 XP_034374267
ENSMUST00000001513 AAH97977
ENSMUST00000001513 XP_006970129
ENSMUST00000001513 XP_021482884
ENSMUST00000001513 XP_005074701
ENSMUST00000001513 OBS76548
ENSMUST00000001513 XP_004656888
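To separate the two differences, the hit lists above can be compared after dropping the version suffix. The following is only a minimal sketch: the helper names `strip_version` and `compare_hits` are made up for illustration, and the accession lists are copied by hand from the two result tables above.

```python
def strip_version(accession: str) -> str:
    """Drop a trailing '.N' version suffix, e.g. 'XP_028630034.1' -> 'XP_028630034'."""
    base, dot, version = accession.rpartition(".")
    # Only strip when the part after the last dot is purely numeric.
    return base if dot and version.isdigit() else accession


def compare_hits(hits_a, hits_b):
    """Return (shared, only_a, only_b) after removing version suffixes, keeping input order."""
    a = [strip_version(x) for x in hits_a]
    b = [strip_version(x) for x in hits_b]
    set_a, set_b = set(a), set(b)
    shared = [x for x in a if x in set_b]
    only_a = [x for x in a if x not in set_b]
    only_b = [x for x in b if x not in set_a]
    return shared, only_a, only_b


# Subject accessions for ENSMUST00000001513, copied from the tables above.
diamond1_hits = [
    "NP_080749.2", "XP_028630034.1", "NP_001233720.1", "XP_034374267.1",
    "NP_001020846.1", "XP_006970129.1", "XP_021482884.1", "XP_005074701.1",
    "OBS76548.1", "XP_004656888.1",
]
diamond2_hits = [
    "Q922F4", "XP_028630034", "AAZ14959", "XP_034374267",
    "AAH97977", "XP_006970129", "XP_021482884", "XP_005074701",
    "OBS76548", "XP_004656888",
]

shared, only_d1, only_d2 = compare_hits(diamond1_hits, diamond2_hits)
print("shared:", shared)
print("only in diamond1:", only_d1)
print("only in diamond2:", only_d2)
```

With the version suffix ignored, seven of the ten subjects match, and the remaining differences are exactly rows 1, 3 and 5: NP_ accessions in diamond1 versus Q922F4, AAZ14959 and AAH97977 in diamond2.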
I read the paper "Sensitive protein alignments at tree-of-life scale using DIAMOND", which mentions that the benchmark database uses UniRef50. I'm not sure whether the second difference I described is related to this.
I hope you can help with these two questions. Thank you very much.
wanjin hu
Did you use a BLAST database when running diamond v2? That would explain the different accessions. Note that, for example, NP_080749.2 and Q922F4 are the same protein.
I found the cause, although I don't understand why it happens.
When the parameter --salltitles is added to diamond2, the result is as follows:
Query Subject
ENSMUST00000001513 Q922F4
ENSMUST00000001513 XP_028630034
ENSMUST00000001513 AAZ14959
ENSMUST00000001513 XP_034374267
ENSMUST00000001513 AAH97977
When the parameter --salltitles is not added to diamond2, the result is as follows, which is the same as the diamond1 result:
Query Subject
ENSMUST00000001513 NP_080749.2
ENSMUST00000001513 XP_028630034.1
ENSMUST00000001513 NP_001233720.1
ENSMUST00000001513 XP_034374267.1
ENSMUST00000001513 NP_001020846.1
Also, the parameter --salltitles does not work with diamond1.