ssdeep.js

Different similarity outputs between libraries

Open WJDigby opened this issue 6 years ago • 2 comments

Hello,

Thank you for providing this code.

When comparing two hashes, this library outputs a different "similarity" rating than other ssdeep libraries and examples do.

Python 3 ssdeep library, using the same EICAR strings as the readme:

>>> e1 = ssdeep.hash("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")
>>> e2 = ssdeep.hash("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-THREATPINCH-ANTIVIRUS-TEST-FILE!$H+H*")
>>> e1
'3:a+JraNvsgzsVqSwHq9:tJuOgzsko'
>>> e2
'3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo'
>>> ssdeep.compare(e1, e2)
18

JavaScript ssdeep.js library:

>> e1 = ssdeep.digest("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")
"3:a+JraNvsgzsVqSwHq9:tJuOgzsko"
>> e2 = ssdeep.digest("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-THREATPINCH-ANTIVIRUS-TEST-FILE!$H+H*")
"3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo"
>> ssdeep.similarity(e1, e2)
70

Both libraries produce identical hashes.
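
Since the digests match, the difference has to come from the comparison step rather than from hashing. As a minimal sketch (the helper name `parse_fuzzy_hash` is hypothetical and not part of either library), a digest can be split into its `blocksize:chunk:double_chunk` fields to confirm that both hashes use the same block size:

```python
def parse_fuzzy_hash(digest):
    """Split an ssdeep digest into (block_size, chunk, double_chunk).

    Hypothetical helper for illustration; assumes the plain
    'blocksize:chunk:double_chunk' form with no filename suffix.
    """
    block_size, chunk, double_chunk = digest.split(":", 2)
    return int(block_size), chunk, double_chunk


e1 = parse_fuzzy_hash("3:a+JraNvsgzsVqSwHq9:tJuOgzsko")
e2 = parse_fuzzy_hash("3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo")

# Both digests report block size 3, so the first chunks are directly comparable.
print(e1[0], e2[0])
print(len(e1[1]), len(e2[1]))
```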

The ssdeep online demo also produces a value of 18 when comparing the two Eicar strings:

[image: ssdeep online demo result]

Is this intended behavior? Is there a "weight" or some metric that can adjust the grading scale of the comparison?

WJDigby, May 26 '19 20:05

I noticed the same. Any idea why this happens, @cloudtracer?

gehaxelt, Jun 10 '20 09:06

I noticed this library has a few bugs in its comparison algorithm and is also inefficient, since it runs synchronously. I created a project, fast-ssdeep, that binds to the ssdeep C API to provide a performant and compliant implementation.
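
For context, here is a rough Python sketch of the scoring step as I understand it from the reference ssdeep C code. It is an approximation under stated assumptions, not the library's actual source: the function names are my own, the constants are the values I believe the C code uses, and it skips the common-substring check and the collapsing of repeated characters. The relevant part is the cap applied at small block sizes, which on its own would bring the first chunks of the two hashes above down to 18. Whether ssdeep.js omits this cap is not confirmed here, and the uncapped value in this sketch is not the 70 that ssdeep.js reports.

```python
SPAMSUM_LENGTH = 64  # assumed maximum length of one signature chunk
ROLLING_WINDOW = 7   # assumed rolling-hash window size
MIN_BLOCKSIZE = 3    # assumed smallest block size


def edit_distance(a, b):
    """Edit distance with insert/delete cost 1 and substitution cost 2."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            sub = prev[j - 1] + (0 if ca == cb else 2)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, sub))
        prev = cur
    return prev[-1]


def raw_score(s1, s2):
    """Scale the edit distance to 0-100, where 100 is the best match."""
    dist = edit_distance(s1, s2)
    scaled = dist * SPAMSUM_LENGTH // (len(s1) + len(s2))
    return 100 - (100 * scaled // SPAMSUM_LENGTH)


def capped_score(s1, s2, block_size):
    """Apply the small-block-size cap on top of the raw score."""
    score = raw_score(s1, s2)
    threshold = (99 + ROLLING_WINDOW) // ROLLING_WINDOW * MIN_BLOCKSIZE
    if block_size < threshold:
        # Short signatures at small block sizes should not be overrated.
        score = min(score, block_size // MIN_BLOCKSIZE * min(len(s1), len(s2)))
    return score


chunk1 = "a+JraNvsgzsVqSwHq9"    # first chunk of e1, block size 3
chunk2 = "a+JraNvsg7QhyqzWwHq9"  # first chunk of e2, block size 3

print(raw_score(chunk1, chunk2))        # 75 in this sketch
print(capped_score(chunk1, chunk2, 3))  # 18, limited by the shorter chunk
```

With a block size of 3 the cap works out to `min(len(chunk1), len(chunk2))`, so short inputs like the EICAR strings cannot score above 18 no matter how similar their chunks look.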

Not sure if this repository is maintained at all. If it isn't, it would be nice if the maintainer could mention my project.

memcorrupt, Aug 17 '24 00:08