murmurhash-net icon indicating copy to clipboard operation
murmurhash-net copied to clipboard

TransformBlock() behaves incorrectly when input data length is not a multiple of 16

Open itai opened this issue 3 years ago • 1 comments

When the length of the input data is not a multiple of 16, TransformBlock() does not behave correctly. In the following example, the same hash is generated for different byte sequences:

using System.Security.Cryptography;
using System.Text;
using Murmur;

{
    using HashAlgorithm hashAlgorithm = MurmurHash.Create128(preference: AlgorithmPreference.X64);
    TransformSingleByte(hashAlgorithm, 1);
    TransformSingleByte(hashAlgorithm, 1);
    TransformSingleByte(hashAlgorithm, 1);
    TransformSingleByte(hashAlgorithm, 1);
    hashAlgorithm.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    PrintBytes(hashAlgorithm.Hash!);
}

{
    using HashAlgorithm hashAlgorithm = MurmurHash.Create128(preference: AlgorithmPreference.X64);
    TransformSingleByte(hashAlgorithm, 0);
    TransformSingleByte(hashAlgorithm, 0);
    TransformSingleByte(hashAlgorithm, 0);
    TransformSingleByte(hashAlgorithm, 0);
    hashAlgorithm.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    PrintBytes(hashAlgorithm.Hash!);
}

{
    using HashAlgorithm hashAlgorithm = MurmurHash.Create128(preference: AlgorithmPreference.X64);
    TransformSingleByte(hashAlgorithm, 0);
    TransformSingleByte(hashAlgorithm, 0);
    TransformSingleByte(hashAlgorithm, 1);
    TransformSingleByte(hashAlgorithm, 1);
    hashAlgorithm.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    PrintBytes(hashAlgorithm.Hash!);
}

{
    using HashAlgorithm hashAlgorithm = MurmurHash.Create128(preference: AlgorithmPreference.X64);
    TransformSingleByte(hashAlgorithm, 0);
    TransformSingleByte(hashAlgorithm, 1);
    TransformSingleByte(hashAlgorithm, 0);
    TransformSingleByte(hashAlgorithm, 1);
    hashAlgorithm.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    PrintBytes(hashAlgorithm.Hash!);
}

static void TransformSingleByte(
    HashAlgorithm hashAlgorithm,
    byte @byte) =>
    hashAlgorithm.TransformBlock(new[] {@byte}, 0, 1, null, 0);

static void PrintBytes(IEnumerable<byte> bytes)
{
    var stringBuilder = new StringBuilder();
    foreach (var @byte in bytes)
    {
        stringBuilder.AppendFormat("{0:X2}", @byte);
    }
    Console.WriteLine(stringBuilder.ToString());
}

Output:

BC764CD8DDF7A0CFF126F51C16239658
BC764CD8DDF7A0CFF126F51C16239658
BC764CD8DDF7A0CFF126F51C16239658
BC764CD8DDF7A0CFF126F51C16239658

itai avatar Sep 12 '22 11:09 itai

In the above example, all inputs to TransformSingleByte() are of size 1, which is less than the MurmurHash block size (16).

What happens is that HashCore() calls Body(), which calls Tail() with the input. Tail() does a calculation on the input and then XORs the result with H1 and H2. This means that when we call TransformBlock() twice with the same input, the effect cancels out (because X ^ Y ^ Y == X).

What HashCore() should have done is aggregate complete blocks and run the algorithm on them. The "tail" logic should only be used when HashFinal() is called.

itai avatar Sep 12 '22 11:09 itai