murmur3
How does the performance compare to C/C++?
Well, it's difficult to say without a proper equivalent C++ benchmark, but I would say not so bad.
- For "large" inputs (i.e. > 1 kB), the 128/64-bit version is more or less limited by memory bandwidth.
- For small inputs, the `h1 = h1*5 + 0xe6546b64` statement suffers from the `IMUL 5, ADD 0xe6546b64` translation instead of the more optimized `LEA` that gcc/llvm would output (but it certainly doesn't make it 2 times slower).
- Very small inputs (< 8 B) will of course incur a few extra bounds checks around the tail/block split.
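To make the `IMUL`/`LEA` point concrete, here is a minimal sketch of the two equivalent ways to compute the `h1 = h1*5 + 0xe6546b64` step. The function names `mixMul` and `mixLEA` are hypothetical and only serve the illustration; which instructions a given compiler actually emits for either form is not something this snippet can show.

```go
package main

import "fmt"

// mixMul is the statement as quoted in the thread: multiply by 5, then add.
// The names mixMul and mixLEA are hypothetical, chosen for this illustration.
func mixMul(h1 uint32) uint32 {
	return h1*5 + 0xe6546b64
}

// mixLEA spells out the arithmetic a single x86 LEA can express:
// base + index*4 + displacement, i.e. h1 + h1*4 + constant.
func mixLEA(h1 uint32) uint32 {
	return h1 + h1<<2 + 0xe6546b64
}

func main() {
	// Both forms wrap modulo 2^32, so they agree for every input.
	for _, h1 := range []uint32{0, 1, 0xdeadbeef, 0xffffffff} {
		fmt.Printf("h1=%#x mul=%#x lea=%#x\n", h1, mixMul(h1), mixLEA(h1))
	}
}
```

Since both functions return the same value for every `uint32`, any difference between them is purely a matter of code generation.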
All in all, except possibly for very small inputs where any preparation costs show up easily, I suspect the performance is "fine enough" for most uses, with the bottleneck being elsewhere. I would happily investigate, though, if you have any numbers (or situations) you find discouraging.
I have tested this library, reusee/mmh3, a version of reusee/mmh3 modified to use unsafe tricks, and the Python mmh3. I've only tested the 128-bit variant, as that's all I'm interested in. On a corpus of 1M strings with an average length of roughly 300 bytes:
| lib | timing |
|---|---|
| python-mmh3 | 240ms (hash_bytes: 290ms) |
| spaolacci/murmur3 | 376ms |
| reusee/mmh3 | 3842ms (3.8s) |
| mmh3/custom | 365ms |
I'd imagine that for C/C++, without the Python overhead, you might be able to cut that Python benchmark by at least 25%. This library is the only Go one that uses the Go `hash` interface and supports streaming, so I'd say use this; there's not a ton of overhead left to remove.
I just wanted to say that I looked at all the murmur3 implementations in Go and IMHO this one is the best. As mentioned in the ReadMe and above it supports the standard Go hash interface, performance is excellent, and it is a great example of how to implement the 3 versions in a very go-centric way. It also has a BSD license, which was another plus for me.
Good news: rsc has started working on it: https://github.com/golang/go/issues/8037