phobos Add xxhash digests to phobos library

Add XXHash digests to phobos library

This patch is motivated by the need to have a much faster alternative to the slow MD5 or SHAx cryptographic digests. XXHash by itself is no cryptographic hash, but good enough to create suitable checksums and is used e.g. with ZSTD compression.

The patch adds the four digest variants of the XXHash project (see github.com:Cyan4973/xxHash.git) ranging from 32 to 128 bits bits digests. Both variants for the 64 bit digest (XXH and XXH3) are implemented.

The C sources are ported to D and currently only implements the 'scalar' mode of operation. The further optimisations (SIMD) are added later. So this ported XXHash digests are probably slower than the original C sources. There is also an experimental version using the embedded C sources with optimisations, but initial reviewers rejected the idea of embedded more C sources into Phobos, and argued for porting the C code to D.

The digests results were compared against the outputs of the xxhsum commandline utility manually and match the outputs of the command line utility.

There are unittests for all variants of XXHash (taken from MD5 sources and digest values were updated accordingly).

The digests can be used with the template API:

XXH_32 hash0;
hash0.start();
hash0.put(cast(ubyte) 0);
auto result = hash0.finish();

XXH_64 hash1;
hash1.start();
hash1.put(cast(ubyte) 0);
auto result = hash1.finish();

XXH3_64 hash2;
hash2.start();
hash2.put(cast(ubyte) 0);
auto result = hash2.finish();

XXH3_128 hash3;
hash3.start();
hash3.put(cast(ubyte) 0);
auto result = hash3.finish();

as well as with the OOP versions:

auto xxh0 = new XXH32Digest();
ubyte[] hash0 = xxh0.digest("abc");

auto xxh1 = new XXH64Digest();
ubyte[] hash1 = xxh1.digest("abc");

auto xxh1 = new XXH3_64Digest();
ubyte[] hash2 = xxh2.digest("abc");

auto xxh2 = new XXH128Digest();
ubyte[] hash3 = xxh3.digest("abc");

There are some extra functions in the ported code (secrets, seeds), which are not yet implemented publically. These functions will be added later.

Benchmarks: https://gitlab.vahanus.net/dlang/hashbench.git

$ dub run --compiler=dmd-beta --
Performing "debug" build using dmd-beta for x86_64.
hashbench ~master: building configuration "application"...
Running pre-build commands...
Linking...
Running hashbench 
A benchmark utility for the phobos digests.

Build is a0026c2

Time Limit per test is 180
Phobos Digest Test 'test_md5.test_md5' (compiled on May 24 2022)
Average 147.586716 MB/s
Phobos Digest Test 'test_xxh32.test_xxh32' (compiled on May 24 2022)
Average 961.077881 MB/s
Phobos Digest Test 'test_xxh64.test_xxh64' (compiled on May 24 2022)
Average 1938.343140 MB/s
Phobos Digest Test 'test_xxh3_64.test_xxh3_64' (compiled on May 24 2022)
Average 1084.588257 MB/s
Phobos Digest Test 'test_xxh3_128.test_xxh3_128' (compiled on May 24 2022)
Average 1052.687256 MB/s
Phobos Digest Test 'test_murmur32.test_mm32' (compiled on May 24 2022)
Average 334.169739 MB/s
Phobos Digest Test 'test_murmur128.test_mm128' (compiled on May 24 2022)
Average 774.752014 MB/s
Total test duration: 28 secs, 388 ms, 258 μs, and 9 hnsecs

Optimized

$ dub run --compiler=dmd-beta -b=release --
Performing "release" build using dmd-beta for x86_64.
hashbench 0.0.0: building configuration "application"...
Running pre-build commands...
Linking...
Running hashbench 
A benchmark utility for the phobos digests.

Build is v0.0.0-0-gd5526bf

Time Limit per test is 180
Phobos Digest Test 'test_md5.test_md5' (compiled on May 24 2022)
Average 162.910431 MB/s
Phobos Digest Test 'test_xxh32.test_xxh32' (compiled on May 24 2022)
Average 945.801086 MB/s
Phobos Digest Test 'test_xxh64.test_xxh64' (compiled on May 24 2022)
Average 1801.930542 MB/s
Phobos Digest Test 'test_xxh3_64.test_xxh3_64' (compiled on May 24 2022)
Average 1074.387695 MB/s
Phobos Digest Test 'test_xxh3_128.test_xxh3_128' (compiled on May 24 2022)
Average 1072.683472 MB/s
Phobos Digest Test 'test_murmur32.test_mm32' (compiled on May 24 2022)
Average 4091.976562 MB/s
Phobos Digest Test 'test_murmur128.test_mm128' (compiled on May 24 2022)
Average 5495.587402 MB/s
Total test duration: 20 secs, 239 ms, 25 μs, and 1 hnsec

Reference benchmark from XXH C Implementation: Note: XXH3 uses vector code

$ xxhsum -b
xxhsum 0.8.1 by Yann Collet 
compiled as 64-bit x86_64 autoVec little endian with GCC 11.2.0 
Sample of 100 KB...        
 1#XXH32                         :     102400 ->    93288 it/s ( 9110.2 MB/s)   
 3#XXH64                         :     102400 ->   135630 it/s (13245.2 MB/s)   
 5#XXH3_64b                      :     102400 ->   594488 it/s (58055.5 MB/s)   
11#XXH128                        :     102400 ->   591542 it/s (57767.8 MB/s)

Obviously the D variant of XXH does NOT optimize well at the moment. Especially the XXH3 variants could be much faster using vector commands

For comparition the same benchmarks with embeeded XXH C sources into Phobos: Note: Vector Code disabled because of problems on Darwin! XXH3 is scalar code.

$ dub run --compiler=dmd-beta -b release -- 
Performing "release" build using dmd-beta for x86_64.
hashbench 0.0.0: target for configuration "application" is up to date.
To force a rebuild of up-to-date targets, run again with --force.
Running hashbench 
A benchmark utility for the phobos digests.

Build is v0.0.0-0-gd5526bf

Time Limit per test is 180
Phobos Digest Test 'test_md5.test_md5' (compiled on May 25 2022)
Average 159.477325 MB/s
Phobos Digest Test 'test_xxh32.test_xxh32' (compiled on May 25 2022)
Average 8726.523438 MB/s
Phobos Digest Test 'test_xxh64.test_xxh64' (compiled on May 25 2022)
Average 17062.607422 MB/s
Phobos Digest Test 'test_xxh3_64.test_xxh3_64' (compiled on May 25 2022)
Average 18957.285156 MB/s
Phobos Digest Test 'test_xxh3_128.test_xxh3_128' (compiled on May 25 2022)
Average 18937.341797 MB/s
Phobos Digest Test 'test_murmur32.test_mm32' (compiled on May 25 2022)
Average 4002.023682 MB/s
Phobos Digest Test 'test_murmur128.test_mm128' (compiled on May 25 2022)
Average 5361.663574 MB/s
Total test duration: 14 secs, 740 ms, and 589 μs

May 13 '22 09:05 cschlote

Thanks for your pull request and interest in making D better, @cschlote! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8458"

May 13 '22 09:05 dlang-bot

This patch is motivated by the need to have a much faster alternative to the slow MD5 or SHAx cryptographic digests. XXHash by itself is no cryptographic hash, but good enough to create suitable checksums

For the record, we already have murmurhash which was also designed with these goals.

May 13 '22 09:05 CyberShadow

@CyberShadow XXHash seems to be slightly faster than Murmur. At least that is stated on https://github.com/Cyan4973/xxHash/README.md. There is also a SSE2 variant, which is much faster. So it won't hurt to have XXHash available in phobos.

May 13 '22 10:05 cschlote

True, but is "slightly faster" a sufficient reason to add it and commit to supporting it in the language's standard library? New hash functions are being invented constantly, so it seems very likely that the next hash function (which in turn will be slightly faster or better in some other way than xxhash) will inevitably appear and gain popularity.

One way we've approached questions such as these in the past is to first have the functionality published as a Dub package. If the Dub package becomes notably popular and its addition to the standard library is seen as worthwhile, then it is moved here.

Another way we could approach this is to measure the feature's popularity by its availability in other programming languages and their standard libraries, as this also indicates how likely D users will expect that the feature will be available in D when moving from another language.

May 13 '22 10:05 CyberShadow

"slight faster" means 2.5 to 5 times faster than Murmur. XXHash is included in ZSTD, which itself gains popularity for being fast and efficient. We replaced MD5 with XXHash at work, because the cryptographic properties of MD5 are not needed (and are also already compromissed). We did some benchmarks with the alternatives and it turned out to out-perform all those alternatives on all tested architectures (x86_64, arm32, arm64/aarch64). And honestly - nobody at work even mentioned Murmur as some alternative to MD5. So it can't be that popular. But the are xxhash modules for go, node, rust and python. Therefore I would like to have it with D as well.

I already use this code for our local tools. Nevertheless it is a nice exercise to contribute this code and fix all these issues dicovered by the CI pipelines here.

May 13 '22 11:05 cschlote

But the are xxhash modules for go, node, rust and python. Therefore I would like to have it with D as well.

Alright, but why not publish this as a Dub package and go straight to submitting this to the standard library?

Actually, I see that someone already made a Dub package five years ago: https://code.dlang.org/packages/xxhash

It doesn't seem to integrate with std.digest but it does implement the algorithm.

May 13 '22 12:05 CyberShadow

Yes, I know, and the existing Dub xxhash package implements the old old 32 bit algorithm. Meanwhile, there are two new 64 bit modes variants, and a 128bit variant. The new variants also support a secret and seed parameter. There is also special assembly code for maximum performance and SSE2 support. My goal is to add these additional features as well, so taht everything provided by the current xxhash library 0.8.1 is also directly available in D.

As I can't see a good reason, why a maintained C library should be rewritten in D I opted for the D wrapper approach. The wrapper are directly derived from the sources in std.digest.* .

As soon as I got all pipelines and checks from PR working, I will add the remaining features for XXHash to the wrapper code.

Making a stand-alone dub package is a little bit more complicated, because I have to reinvent the wheel again to compile the C sources for many different platforms. Applying the code directly to phobos solved a lot of these problems, which is the main reason, why I did it that way.

May 13 '22 12:05 cschlote

Yes, I know, and the existing Dub xxhash package implements the old old 32 bit algorithm. Meanwhile, there are two new 64 bit modes variants, and a 128bit variant. The new variants also support a secret and seed parameter.

Well, to be fair this is a little concerning that the algorithm keeps evolving. The main concern is that the version implemented in Phobos might not be the version that a user expects - whether they expect an older version like the one in the Dub package, or perhaps a newer version that will appear in the future.

Making a stand-alone dub package is a little bit more complicated, because I have to reinvent the wheel again to compile the C sources for many different platforms. Applying the code directly to phobos solved a lot of these problems, which is the main reason, why I did it that way.

Understandable, but I believe we actually want to reduce the amount of C code in Phobos. There have been proposals to remove std.zlib / std.zip from Phobos, and even code which links to external C code (std.curl).

Adding C code is especially noteworthy because it creates a maintenance burden for the future: any time we need to touch our build machinery, or port D to a new platform/architecture, will require doing some work for the C code we're shipping as well. Such additions would thus have to be well-justified. @ibuclaw could provide some feedback here.

In the end, such an addition would need to be approved by a BDFL (Walter Bright or Atila Neves).

May 13 '22 14:05 CyberShadow

This patch is motivated by the need to have a much faster alternative to the slow MD5 or SHAx cryptographic digests. XXHash by itself is no cryptographic hash, but good enough to create suitable checksums

For the record, we already have murmurhash which was also designed with these goals.

MurmurHash is very different from xxHash. xxHash is a new iteration of non-cryptographic hashes with very interesting results, in favour of a replacement for MurmurHash on 32-bit and 64-bit low cuts. AFAIK, we use a MurmurHash3 implementation for TypeInfo.toHash() ? We should really consider changing it to a faster algorithm. xxHash is greatly faster and still maintains very good quality, for both small and large data, hence suitable for zstd compression. Sliced xxh3 variants (xxh3 64-bit sliced to represent a 32-bit hash) can be considered as they provide better throughput in some systems.

That said, I think we should consider this addition and possibly discuss further on adoption for druntime hashOf implementation: https://github.com/dlang/druntime/blob/master/src/core/internal/hash.d . I'm not sure if this is something even considerable since it will be kind of a breaking change, from a deterministic/reproducibility perspective.

May 13 '22 15:05 ljmf00

https://github.com/dlang/druntime/blob/master/src/core/internal/hash.d . I'm not sure if this is something even considerable since it will be kind of a breaking change, from a deterministic/reproducibility perspective.

That should be fine, and we've done this before. The hash as well as ordering of objects in an AA is implementation defined. The hash/order is already different on different architectures.

May 13 '22 16:05 CyberShadow

@ljmf00 Thanks for your thoughts. We had similiar discussions and considerations at work. XXHash is amazingly fast, but still reliable enough to checksum data. For stuff like toHash(), I think this should be black box for the user. The should assume that a good hash algo is in use, but nothing else. If you need to rely on the hash, you must implement your own toHash() function, I think.

@CyberShadow I'm aware of the maintainance burden of having C files bundles with phobos. On the other hand I would maintain them in case of inclusion, on the other hand we can participate from future improvements and bugfixes in the C codebase from upstream. At the moment I use the C files as provided from upstream without any change, yet.

Currently there is some issue on darwin targets with the inline assembly and Xcode, but this shouldn't need big fixes. Unfortunately I do not have any Mac computer or Xcode, so I can only check the build pipelines for this project, and guess/google possible solutions.

May 13 '22 16:05 cschlote

That should be fine, and we've done this before. The hash as well as ordering of objects in an AA is implementation defined. The hash/order is already different on different architectures.

Ok, makes sense.

Regarding to including the C headers/source, I wouldn't include the whole thing. I'm not sure how good is the vectorization passes to transform something like this https://llvm.org/doxygen/xxhash_8cpp_source.html to efficient SSE code.

How hard would be to translate this to D?

From my understanding by reading some source code, it seems that a lot of code is macros to support loads of variants of xxhash, loads of variants of C and loads of systems/platforms we don't currently support.

There is also a lot of utility code to generically support SIMD/intrinsic ops, and we already have most of it.

Hopefully with our __simd and __vector API we can significantly simplify this, although I'm not very deep into that topic to evaluate the effort.

May 13 '22 16:05 ljmf00

@ljmf00 I think, it shouldn't be too hard to create a D port of the C sources. With all the unittests already implemented and checksums compared against the reference C code and the commandline xxhsum tool, it should be possible to detect regressions.

The XXHash implementations shouldn't change at all, as they are released and used to verify data streams in ZSTD and for other tools. Maybe new algorithms are added later. But for the existing C implemenation changes to the algorithm are impossible, because that would break anything depending on it.

So I'm going to replace the C sources with some hand-crafted D code.

May 13 '22 17:05 cschlote

A further reason not to include another C lib in Phobos is that distro package maintainers don't like that duplication; they have their own zlib and libxxhash packages. E.g., the Debian packages exclude the zlib sources from Phobos AFAIK (or at least used to do that).

May 14 '22 15:05 kinke

A further reason not to include another C lib in Phobos is that distro package maintainers don't like that duplication; they have their own zlib and libxxhash packages. E.g., the Debian packages exclude the zlib sources from Phobos AFAIK (or at least used to do that).

Agree. I never understood the reason behind that, honestly. We should use pkg-config or similar stuff to manage those dependencies.

May 14 '22 22:05 ljmf00

We should use pkg-config or similar stuff to manage those dependencies.

This is a good option on Linux but is not generally available on some other platforms that D aims to support.

May 14 '22 22:05 CyberShadow

So porting the routines to D seems to be the best option, as it solves the problems with embedding C sources and get it working on all platforms.

On the other hand, the C sources are highly optimized. I got the XXH32 and XXH64 already ported. There are still some architecture speed optimizations, which might be worthy to be ported as well.

At the moment I' working on porting the newer XXH3 variants, which also support vector units. I will first port the plain C versions, and then I need to think on how to include _simd or whatever to have enable high speed support for them.

May 15 '22 06:05 cschlote

Latest problem with XXH3 variants: They needs 128bit arthmetics. With current C compilers this is no problem, as they already have 128 bit code available and builtin.

In D this is still WIP - current code only works with dmd-beta . The types cent/ucent are not implemented, and std.int128 seems to have problems casts from Int128 to ulong.

So I need to revert to the portable 32/64 bit code. Maybe this can be fixed later.

May 15 '22 07:05 cschlote

Ok, ported the C code to D. And Unittests still pass ;-) The result is still pretty much C-ish source code, which needs further cleanups and refactoring to match D coding style.

The current state of this PR is retained as branch 'add_xxhash_digest_C_version' in my repo fork.

I will rebase and cleanup the commit stack of this PR, remove C code and related changes to Makefiles. Then I will refactor the code, until it passes the pipelines.

Thanks for encouraging me to do this tedious porting work... ;-)

May 15 '22 17:05 cschlote

I think we should first make all the interfaces private and only export the digest public API and make it public when needed? Also, I've seen a lot of @trusted for unsafe interfaces. I assume you are doing this to track it, right?

May 15 '22 20:05 ljmf00

First, thanks for the review and comments. It is always fun to see, that people do all that kind of work in their spare time. I would really like to see that at work. Unfortunately most of my colleagues think, that reviews and documentation is just superfluious non-sense. Not to talk about unittests. It was really helpful to have all these unittests already in place and working, while I adapted the C sources to D.

And I just see: The pipelines completed succesfully ;-)

I will apply the suggested changes asap and resolve comments above when done.

May 15 '22 20:05 cschlote

The next steps are cleanups on documentation and code style. Maybe/Probably also some enhancements to unitests. After that, the PR should be ready to leave draft state.

May 18 '22 12:05 cschlote

I did some benchmarking with the current beta DMD. The D variant of XXH doesn't optimize well at the moment. There is no observable improvement with optimizing code. I wonder, where the problem is. In unoptimized mode, XXH is faster than MurmurHash.

So, if speed is wanted, the C library is outperforming any current D digest.

May 24 '22 15:05 cschlote

I added a separate dub package to the registry: https://code.dlang.org/packages/xxhash3

The current code compiles fine with DMD, LDC2 and GDC, and most important, the resulting code performs much better as when build into phobos. For whatever reason. There is a small demo programm included, which outputs the benchmark results.

May 25 '22 13:05 cschlote