Updates to CalculateAverage_spullara
Making another PR to push progress in the open. Inlined a couple of methods so far. Attempts at SIMD acceleration are still failing. Kind of wondering if maybe it is already doing it in the VM for Array.equals for sure.
Instead of re-opening, could you please create a new PR, based on current upstream/main? Thx!
Message ID: @.***>
I have SIMD changes in https://github.com/spullara/1brc/tree/simd but they are much slower.
Let me know when you're ready for another run. With 21.0.1-graalce again, I suppose?
I don't think enough has changed honestly and I'm not even sure the inlining helps that much. I'll also update to "sdk use" in my script.
Should be about 10% faster than before.
I've been running it with 21.0.1-graal from sdkman, added it to the script.
Ready for testing. 1780ms on my machine :)
Hi @spullara, I was able to make the original version perform 25% better on my machine with a few changes. Feel free to use the ideas: https://github.com/gunnarmorling/1brc/pull/86
Hey @spullara, could you rebase this one to latest main and then run this:
./test.sh spullara
This is a simple test suite I've added to make sure implementations are compliant and there's one failure (I have clarified some value ranges in the course of the day, this may be related). And once it's good, remove the "test failure" label again. Thanks!
Updated and passes. BTW, having test.sh delete my 1B row file that I now need to regenerate is quite the troll :)
BTW, having test.sh delete my 1B row file that I now need to regenerate is quite the troll :)
Hum, hum, interesting point. I'd consider it volatile data, so 🤷 . The root issue is that we should have made the data set name an argument to the programs under test. Next time.
That said, something seems wrong with this PR now, way too many unrelated changes. Can you rebase/squash so that it only contains your changes over the latest main? Thx!
Hum, hum, interesting point. I'd consider it volatile data, so 🤷 . The root issue is that we should have made the data set name an argument to the programs under test. Next time.
Yeah, mine also takes a file as an argument so I could test locally with a smaller file when it was slower.
Nice, looking good now. 00:12.063, super-clsoe #2 behind @filiphr with 00:12.027!
Nice, looking good now. 00:12.063, super-clsoe #2 behind @filiphr with 00:12.027!
His is using the hash code of the city as a key which is against the rules? Otherwise mine would be even faster.
https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_filiphr.java#L173
Argh, again?! Seems I really need to add a test with 10K distinct keys. It's so easy to miss otherwise. Gnarf.
Argh, again?! Seems I really need to add a test with 10K distinct keys. It's so easy to miss otherwise. Gnarf.
You would have to construct a test with names that purposefully have the same hash code to discover these bugs with a test. You have to do it by inspection.
Actually, I'm not sure. Seems the key is derived straight from the characters and it is used as a key in a map, not index in an array. I.e. I don't think there could be collisions?
If they aren't comparing the actual string value of the keys you can construct keys that collide with their formula. You allow city names up to 100 bytes of UTF-8 and this compresses that to just 4 bytes. It is impossible to construct a 4 byte (int) hash that represents 100 bytes of UTF-8 uniquely.
True. It would just overflow at some point. Gnarf, that stuff is really tough to spot, as I can't generically fabricate colliding values without knowing what's the hash function. @filiphr, I'll remove your entry from the leaderboard for now, until this issue has been fixed. @spullara, thanks for paying attention that closely!
You should see how fast mine is if I use the same "optimization" :) The single largest chunk of time is spent comparing keys to ensure that they match.
I was reading the discussions around the hashing today. Sorry about it, I didn't have time to adapt it today.
I have an idea similar to also use Arrays.equals for this, let's see what that would lead to.