Difference in results between diamond1 and diamond2?

Open wanjinhu opened this issue 3 years ago • 2 comments

Hi there:

To compare diamond1 and diamond2, I built a separate DIAMOND database from the nr database with each version. I then aligned the same set of nucleic acid sequences with both versions, using the same parameters.

From the results, I found two main differences:

  1. The subject accessions reported by diamond1 include a version number, e.g. XP_028630034.1, while those reported by diamond2 do not, e.g. XP_028630034.

  2. With diamond2, the top subject sequence is always from UniProtKB/Swiss-Prot, while with diamond1 it is always from the NCBI RefSeq database.

diamond1 result:

Query Subject
ENSMUST00000001513	NP_080749.2
ENSMUST00000001513	XP_028630034.1
ENSMUST00000001513	NP_001233720.1
ENSMUST00000001513	XP_034374267.1
ENSMUST00000001513	NP_001020846.1
ENSMUST00000001513	XP_006970129.1
ENSMUST00000001513	XP_021482884.1
ENSMUST00000001513	XP_005074701.1
ENSMUST00000001513	OBS76548.1
ENSMUST00000001513	XP_004656888.1

diamond2 result:

Query Subject
ENSMUST00000001513	Q922F4
ENSMUST00000001513	XP_028630034
ENSMUST00000001513	AAZ14959
ENSMUST00000001513	XP_034374267
ENSMUST00000001513	AAH97977
ENSMUST00000001513	XP_006970129
ENSMUST00000001513	XP_021482884
ENSMUST00000001513	XP_005074701
ENSMUST00000001513	OBS76548
ENSMUST00000001513	XP_004656888
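To see how much the two hit lists overlap once the version suffix is ignored, a minimal Python sketch could look like the following. The accessions are copied from the two tables above; the function names `strip_version` and `compare_hits` are illustrative and not part of DIAMOND.

```python
import re

# Subject accessions copied from the diamond1 and diamond2 tables above.
diamond1_hits = [
    "NP_080749.2", "XP_028630034.1", "NP_001233720.1", "XP_034374267.1",
    "NP_001020846.1", "XP_006970129.1", "XP_021482884.1", "XP_005074701.1",
    "OBS76548.1", "XP_004656888.1",
]
diamond2_hits = [
    "Q922F4", "XP_028630034", "AAZ14959", "XP_034374267", "AAH97977",
    "XP_006970129", "XP_021482884", "XP_005074701", "OBS76548",
    "XP_004656888",
]


def strip_version(accession):
    """Drop a trailing '.<digits>' version suffix, if there is one."""
    return re.sub(r"\.\d+$", "", accession)


def compare_hits(hits_a, hits_b):
    """Return (shared, only_a, only_b) after removing version suffixes.

    Input order is preserved in each returned list.
    """
    a = [strip_version(x) for x in hits_a]
    b = [strip_version(x) for x in hits_b]
    set_a, set_b = set(a), set(b)
    shared = [x for x in a if x in set_b]
    only_a = [x for x in a if x not in set_b]
    only_b = [x for x in b if x not in set_a]
    return shared, only_a, only_b


shared, only_d1, only_d2 = compare_hits(diamond1_hits, diamond2_hits)
print("shared:", shared)
print("diamond1 only:", only_d1)
print("diamond2 only:", only_d2)
```

For the lists above, seven of the ten accessions match once versions are stripped, and the three that differ sit at ranks 1, 3 and 5 in both outputs.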

I read the paper "Sensitive protein alignments at tree-of-life scale using DIAMOND", which mentions that the benchmark database uses UniRef50. I'm not sure whether the second difference described above is related to this.

I would appreciate your help with these two questions. Thank you very much.

wanjin hu

wanjinhu, Sep 22 '21 03:09

Did you use a BLAST database when running diamond v2? That would explain the different accessions. Note that, for example, NP_080749.2 and Q922F4 are the same protein.

bbuchfink, Sep 28 '21 13:09

I found the cause, although I don't know why it happens.

When the --salltitles parameter is added to diamond2, the result is as follows:

Query Subject
ENSMUST00000001513	Q922F4
ENSMUST00000001513	XP_028630034
ENSMUST00000001513	AAZ14959
ENSMUST00000001513	XP_034374267
ENSMUST00000001513	AAH97977

When the --salltitles parameter is not added to diamond2, the result is the same as the diamond1 result:

Query Subject
ENSMUST00000001513	NP_080749.2
ENSMUST00000001513	XP_028630034.1
ENSMUST00000001513	NP_001233720.1
ENSMUST00000001513	XP_034374267.1
ENSMUST00000001513	NP_001020846.1

Also, the --salltitles parameter does not work with diamond1.

wanjinhu, Oct 09 '21 05:10