md5.lua icon indicating copy to clipboard operation
md5.lua copied to clipboard

Incorrect hashes on small strings

Open josefnpat opened this issue 6 years ago • 7 comments

I was double checking my indexes, and found a collision. Surpised, I did some more research:

nothanks@DESKTOP-616042V ~/repos/roguecraft-squadron
$ cat test1.txt
"Computer, what happened to the IEE Bishop?"

nothanks@DESKTOP-616042V ~/repos/roguecraft-squadron
$ cat test1.txt  | md5sum.exe
5f0d58eb72664a1e2848b191edfd1bee *-

nothanks@DESKTOP-616042V ~/repos/roguecraft-squadron
$ cat test2.txt
"Computer, what is Miho Takeda's current status?"

nothanks@DESKTOP-616042V ~/repos/roguecraft-squadron
$ cat test2.txt | md5sum.exe
03f3a365b8dfc21ddad179289a0138f6 *-


nothanks@DESKTOP-616042V ~/repos/roguecraft-squadron
$ cat test.lua
md5 = require"src.libs.md5"

print(md5.tohex('"Computer, what happened to the IEE Bishop?"'))
print(md5.tohex('"Computer, what is Miho Takeda\'s current status?"'))


nothanks@DESKTOP-616042V ~/repos/roguecraft-squadron
$ lua test.lua
22436f6d70757465722c207768617420
22436f6d70757465722c207768617420

It seems that I not only found a collision, but also that hashes for smaller strings are not correct.

Am I doing something wrong, or is this a bug?

josefnpat avatar Sep 20 '19 23:09 josefnpat

Heyo! It seems like you are not performing the sum on your data. Try the following:

md5 = require"src.libs.md5"

print(md5.tohex(md5.sum('"Computer, what happened to the IEE Bishop?"')))
print(md5.tohex(md5.sum('"Computer, what is Miho Takeda\'s current status?"')))

Or simpler:

md5 = require"src.libs.md5"

print(md5.sumhexa('"Computer, what happened to the IEE Bishop?"'))
print(md5.sumhexa('"Computer, what is Miho Takeda\'s current status?"'))

md5.tohex is just a nice shorthand for string.formatting the hex data, it does not perform the sum.

You either need md5.sum or md5.sumhexa for that, the former returns the MD5 sum as raw bytes in a string, while the later uses hexadecimal digits to represent the MD5 sum.

pablomayobre avatar Sep 22 '19 17:09 pablomayobre

Hello,

md5.lua is probably buggy... but in your testcase your data is not equal to your file content. You must add a new line (\n) at the end of data to reflect the file content.

assert( "5f0d58eb72664a1e2848b191edfd1bee" == md5sum('"Computer, what happened to the IEE Bishop?"\n'))
assert( "03f3a365b8dfc21ddad179289a0138f6" == md5sum('"Computer, what is Miho Takeda\'s current status?"\n'))

tst2005 avatar Sep 23 '19 13:09 tst2005

Why do you think md5.lua is probably buggy @tst2005?

And yeah, the newline is needed if you are comparing against md5sum.exe as seen here

pablomayobre avatar Sep 23 '19 14:09 pablomayobre

Because the md5sum of your both sample (with or without ending new line) are different.

Tested with a linux shell

$ echo -n '"Computer, what happened to the IEE Bishop?"' |md5sum
3f299a0664f45f2eb57e629a6c433326  -

$ echo -n '"Computer, what is Miho Takeda'\''s current status?"' |md5sum
1485157c7d8b58fc0cfee7a614ae507f  -

Tested with lua-lockbox :

local function md5sum(k)
        require("lockbox").ALLOW_INSECURE=true

        local Stream = require("lockbox.util.stream");
        local Digest = require("lockbox.digest.md5");

        return Digest()
                .update(Stream.fromString(k))
                .finish()
                .asHex()
end
assert( "3f299a0664f45f2eb57e629a6c433326" == md5sum('"Computer, what happened to the IEE Bishop?"'))
assert( "1485157c7d8b58fc0cfee7a614ae507f" == md5sum('"Computer, what is Miho Takeda\'s current status?"'))

assert( "5f0d58eb72664a1e2848b191edfd1bee" == md5sum('"Computer, what happened to the IEE Bishop?"\n'))
assert( "03f3a365b8dfc21ddad179289a0138f6" == md5sum('"Computer, what is Miho Takeda\'s current status?"\n'))

tst2005 avatar Sep 23 '19 15:09 tst2005

Ummm did you read my comment though?

The hashes @josefnpat posted are wrong because he didn't actually perform the sum on the string. That in addition to missing the newline will make it report wrong hashes.

So I encourage to test this yourself:

local function md5sum(k)
        local md5 = require("md5")
        return md5.sumhexa(k)
end
assert( "3f299a0664f45f2eb57e629a6c433326" == md5sum('"Computer, what happened to the IEE Bishop?"'))
assert( "1485157c7d8b58fc0cfee7a614ae507f" == md5sum('"Computer, what is Miho Takeda\'s current status?"'))

assert( "5f0d58eb72664a1e2848b191edfd1bee" == md5sum('"Computer, what happened to the IEE Bishop?"\n'))
assert( "03f3a365b8dfc21ddad179289a0138f6" == md5sum('"Computer, what is Miho Takeda\'s current status?"\n'))

pablomayobre avatar Sep 23 '19 21:09 pablomayobre

Ok, i got the release tomorrow, but I'll come back to this in a day or two to test. I'm suspecting it's the bloody newlines as well. Thanks

josefnpat avatar Sep 23 '19 23:09 josefnpat

I can confirm that the hash is fine for these cases. I ran that test myself and can confirm that the missing newline changed the result as intended. I also used a third party conversion tool to make sure that all four results are correct for their input strings.

Grocel avatar Oct 01 '19 21:10 Grocel