Algorithmic Tweaks, parallel stream, memory mapped file

Open twobiers opened this issue 1 year ago • 5 comments

Thank you for the interesting challenge. I set myself a limit of finishing this evening so as not to invest too much time, and this is what I came up with: basically just algorithmic improvements on hot code paths, plus a parallel stream.
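For readers unfamiliar with the combination named in the title, here is a minimal sketch of reading a memory-mapped file in newline-aligned chunks and aggregating them with a parallel stream. It is not the code from this PR: the class name `MappedParallelSketch`, the `Stats` helper, the chunk size, and the use of `Double.parseDouble` are all assumptions made for illustration, and it assumes well-formed `station;value` lines.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch, not the submitted implementation.
class MappedParallelSketch {

    /** Running aggregate for one station. */
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double v) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
        }

        Stats merge(Stats o) {
            min = Math.min(min, o.min);
            max = Math.max(max, o.max);
            sum += o.sum;
            count += o.count;
            return this;
        }

        double mean() { return sum / count; }
    }

    /** Maps the file in chunks of at most maxChunk bytes, each ending right after a '\n'. */
    static List<MappedByteBuffer> mapChunks(Path file, int maxChunk) throws IOException {
        List<MappedByteBuffer> chunks = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long start = 0;
            while (start < size) {
                int len = (int) Math.min(maxChunk, size - start);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, start, len);
                if (start + len < size) {
                    // Not the last chunk: cut back to the last complete line.
                    int i = len - 1;
                    while (i >= 0 && buf.get(i) != '\n') i--;
                    if (i < 0) throw new IOException("line longer than chunk size");
                    len = i + 1;
                    buf.limit(len);
                }
                chunks.add(buf);
                start += len;
            }
        }
        return chunks; // mappings stay valid after the channel is closed
    }

    static String decode(ByteBuffer buf, int from, int to) {
        byte[] bytes = new byte[to - from];
        for (int i = 0; i < bytes.length; i++) bytes[i] = buf.get(from + i);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Parses one chunk into a thread-local map; assumes every line is "station;value". */
    static Map<String, Stats> parse(ByteBuffer buf) {
        Map<String, Stats> local = new HashMap<>();
        int pos = 0, limit = buf.limit();
        while (pos < limit) {
            int semi = pos;
            while (buf.get(semi) != ';') semi++;
            int eol = semi + 1;
            while (eol < limit && buf.get(eol) != '\n') eol++;
            double value = Double.parseDouble(decode(buf, semi + 1, eol));
            local.computeIfAbsent(decode(buf, pos, semi), k -> new Stats()).add(value);
            pos = eol + 1;
        }
        return local;
    }

    /** Parses all chunks in parallel and merges the per-chunk maps, sorted by station. */
    static Map<String, Stats> aggregate(Path file, int maxChunk) throws IOException {
        return mapChunks(file, maxChunk).parallelStream()
                .map(MappedParallelSketch::parse)
                .flatMap(m -> m.entrySet().stream())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        Stats::merge, TreeMap::new));
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "measurements.txt");
        aggregate(file, 64 * 1024 * 1024)
                .forEach((k, s) -> System.out.println(k + "=" + s.min + "/" + s.mean() + "/" + s.max));
    }
}
```

Each chunk gets its own map, so the hot parsing loop needs no synchronization; the maps are only combined at the end. A tuned version would likely replace the `String` allocation and `Double.parseDouble` with custom parsing, which is where the "hot code path" work usually goes.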

Current results on my machine (AMD Ryzen 7 PRO 4750G 16 core, 48GB RAM) - Latest Temurin JDK:

#   Result (m:s.ms)  Implementation
1.  00:26.06         CalculateAverage_twobiers.java
2.  02:57.77         CalculateAverage.java (baseline)

I'm curious what others find.

I thought about caching some parts as it is most likely static data. However, I think this would not be in the spirit of the challenge.

twobiers avatar Dec 30 '23 00:12 twobiers

Hey, wow, that's awesome, thanks a lot for this submission! I'll merge and evaluate it once I've officially launched and announced this challenge (planned for tomorrow).

gunnarmorling avatar Dec 30 '23 18:12 gunnarmorling

Oh, I saw it in my GitHub feed and thought it was already open for submissions. Actually, I misread the deadline date and assumed the challenge would end today. I'm sorry for the inconvenience.

In that case I will convert the PR to a draft and take a look in January again to find further optimizations.

twobiers avatar Dec 31 '23 11:12 twobiers

LOL, no worries, it's not an inconvenience whatsoever. On the contrary, it's very encouraging :)

> In that case I will convert the PR to a draft and take a look in January again to find further optimizations.

+1. You'll have time until Jan 31. Note that I'll make one more tweak, which is to also ask for the min and max value per station to be emitted. This is to prevent anybody from cheating by processing only part of the dataset (which should be obvious from looking at the code, but might also be easy to miss).
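To illustrate why min and max help here, below is a minimal sketch with made-up data. The `MinMaxSketch` class, its `summarize` helper, and the `min/mean/max` output format with one decimal place are assumptions chosen for this example, not something specified in this thread.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: per-station min/mean/max over "station;value" lines.
class MinMaxSketch {

    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double v) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
        }

        @Override
        public String toString() {
            // Assumed output format: min/mean/max, one decimal place each.
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, sum / count, max);
        }
    }

    static Map<String, Stats> summarize(List<String> lines) {
        Map<String, Stats> result = new TreeMap<>();
        for (String line : lines) {
            int semi = line.indexOf(';');
            double value = Double.parseDouble(line.substring(semi + 1));
            result.computeIfAbsent(line.substring(0, semi), k -> new Stats()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = List.of("Hamburg;10.0", "Hamburg;10.0", "Hamburg;-5.0", "Hamburg;25.0");
        System.out.println(summarize(all));               // full dataset
        System.out.println(summarize(all.subList(0, 2))); // only the first half
    }
}
```

In this made-up dataset, the mean over the first half of the lines equals the mean over all of them, so an average-only output would not reveal that half the input was skipped. The min and max differ between the two runs, which is what exposes the partial processing.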

gunnarmorling avatar Dec 31 '23 16:12 gunnarmorling

I think I'm done for now, as I'm out of ideas. I might take another look in 1-2 weeks.

twobiers avatar Jan 02 '24 20:01 twobiers

Shamelessly sharing this idea for JVM/GC tuning from another PR/discussion: https://github.com/gunnarmorling/1brc/pull/15#issuecomment-1875495420

lobaorn avatar Jan 03 '24 15:01 lobaorn

Could you please run test.sh twobiers and make sure all the tests pass? Thanks!

gunnarmorling avatar Jan 05 '24 09:01 gunnarmorling

Issue seems to be that you configure Shenandoah GC. Which JDK distro should this be run on?

gunnarmorling avatar Jan 05 '24 10:01 gunnarmorling

I used the latest Temurin distribution that is available in sdkman.

twobiers avatar Jan 05 '24 10:01 twobiers

Still seeing test failures. Can you also please adjust your launch script to set the right JDK? See @royvanrijn's script as an example. Thanks.

gunnarmorling avatar Jan 05 '24 16:01 gunnarmorling

@gunnarmorling never change a running system... Tests should pass now.

twobiers avatar Jan 05 '24 18:01 twobiers

51.678 sec. Thx for being the first participant in this one!

gunnarmorling avatar Jan 05 '24 19:01 gunnarmorling