hash-bench icon indicating copy to clipboard operation
hash-bench copied to clipboard

Add t1ha to comparison

Open erthink opened this issue 7 years ago • 9 comments

Please pay attention to https://github.com/PositiveTechnologies/t1ha Regards.

erthink avatar Jul 29 '17 12:07 erthink

@leo-yuriev , do you know of an existing Java implementation of t1ha?

vlsi avatar Jul 29 '17 13:07 vlsi

@vlsi, no. Seems t1ha is too hardly for pure Java, because internally it uses extended multiplication of unsigned int64 with 128 bit result. e.g. https://docs.microsoft.com/en-us/cpp/intrinsics/umul128

erthink avatar Jul 29 '17 13:07 erthink

Well, OpenJDK knows of imulq (see https://github.com/dmlloyd/openjdk/blob/a5de9e06dda1cfac94dc7f65a47252895e27013e/hotspot/src/cpu/x86/vm/x86_64.ad#L7980 ), however it does that just when it optimizes "division by constant" into "multiply and shift" (see https://github.com/dmlloyd/openjdk/blob/77d7cb3ad9efc4edeaae7cc46e3b4a98ea617679/hotspot/src/share/vm/opto/divnode.cpp#L412-L423 )

Do you think it makes sense to use t1ha with multiplications being emulated? Is it something you call t1ha_32le ?

vlsi avatar Jul 29 '17 14:07 vlsi

  1. t1ha needs 128-bit result of multiplication, therefore imulq is unsuitable, but mulq required.
  2. Java don't provides an unsigned integer types or any kind of wide multiplication, therefore at least four 64-bit integer multiplications and additions are required for https://github.com/PositiveTechnologies/t1ha/blob/25f790b6723fa0035f2b7c0ae6887e5e9ba64264/src/t1ha_bits.h#L452-L462.

So, I assume that JNI overhead still be less.

erthink avatar Jul 29 '17 15:07 erthink

Java don't provides an unsigned integer types or any kind of wide multiplication, therefore at least four 64-bit integer multiplications and additions are required for

There's Math.multiplyHigh(long, long) since Java9, however it is not yet optimized by JIT compiler.

vlsi avatar Jul 29 '17 16:07 vlsi

There are JNI impls in this comparison already, so it's legitimate. However it will lose competition on small input sizes, because JNI overhead is ~ 50 ns as far as I know. Small inputs are in practice the most important, except the case when a hash function is used as checksum function.

leventov avatar Jul 29 '17 17:07 leventov

Well, Critical JNI seems to be able to sum 16 bytes at 60'000 ops/ms that translates to 1'000'000/60'000 == 16.7ns/op: https://stackoverflow.com/a/36309652/1261287

vlsi avatar Jul 29 '17 18:07 vlsi

There's Math.multiplyHigh(long, long) since Java9, however it is not yet optimized by JIT compiler.

Current t1ha headliner is t1ha1_le(), and it requires 64x64-to-128 unsigned multiplication. For future versions, e.g. t1ha2(), t1ha3() etc, I will take in account Java's limitations. But now seems no reasonable solutions except JNI or NCP.

erthink avatar Jul 29 '17 18:07 erthink

Well, Critical JNI seems to be able...

Yes, Critical Natives give a chanse for t1ha1().

JDK-7013347

JNIEXPORT jlong JNICALL
JavaCritical_com_package_t1ha_t1ha1_le(jint length, jbyte* buf, jlong seed) {
    return t1ha1_le(buf, length, seed);
}

erthink avatar Jul 29 '17 19:07 erthink