1brc First Version

SamuelYvon opened this issue 1 year ago

My first submission. I use memory mapping to split the file into chunks for parsing; the rest is pretty standard. Lines are parsed manually to avoid building huge UTF-8 strings.
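
A minimal sketch of the two ideas above, assuming a Java 11+ runtime and the 1BRC line format `<station>;<temperature>\n`. The class and method names (`ChunkSplitter`, `split`, `map`, `parseTenths`, `scan`) are hypothetical and not taken from this submission. Each nominal chunk boundary is pushed forward to the next newline so no line is cut in half, and each chunk is mapped separately because a single `MappedByteBuffer` holds at most `Integer.MAX_VALUE` bytes. Within a chunk, temperatures are read byte by byte as integer tenths instead of going through a `String`; to stay short, `scan` only counts lines and tracks min/max rather than aggregating per station.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the submission's actual code.
final class ChunkSplitter {

    // Splits [0, size) into about nChunks ranges {start, end}; each nominal
    // boundary is pushed forward to just past the next '\n' so no line is cut.
    static List<long[]> split(FileChannel ch, int nChunks) throws IOException {
        long size = ch.size();
        List<long[]> ranges = new ArrayList<>();
        long start = 0;
        for (int i = 1; i <= nChunks && start < size; i++) {
            long end = (i == nChunks)
                    ? size
                    : nextLineStart(ch, Math.max(start, size * i / nChunks), size);
            if (end > start) {
                ranges.add(new long[] {start, end});
            }
            start = end;
        }
        return ranges;
    }

    // Returns the offset just after the first '\n' at or after pos (or size).
    private static long nextLineStart(FileChannel ch, long pos, long size) throws IOException {
        ByteBuffer window = ByteBuffer.allocate(256);
        while (pos < size) {
            window.clear();
            int n = ch.read(window, pos);
            if (n <= 0) break;
            for (int i = 0; i < n; i++) {
                if (window.get(i) == '\n') return pos + i + 1;
            }
            pos += n;
        }
        return size;
    }

    // A single MappedByteBuffer holds at most Integer.MAX_VALUE bytes, so
    // nChunks must be large enough to keep every range under ~2 GiB.
    static MappedByteBuffer map(FileChannel ch, long[] range) throws IOException {
        return ch.map(FileChannel.MapMode.READ_ONLY, range[0], range[1] - range[0]);
    }

    // Parses "-12.3" style temperatures at pos into tenths (-123) without a String.
    // Assumes the 1BRC format: optional '-', 1-2 digits, '.', exactly 1 digit.
    static int parseTenths(ByteBuffer buf, int pos) {
        boolean negative = buf.get(pos) == '-';
        if (negative) pos++;
        int value = 0;
        byte b;
        while ((b = buf.get(pos++)) != '.') {
            value = value * 10 + (b - '0');
        }
        value = value * 10 + (buf.get(pos) - '0');
        return negative ? -value : value;
    }

    // Walks one chunk line by line; returns {lines, minTenths, maxTenths}.
    static long[] scan(ByteBuffer buf) {
        long lines = 0, min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        int pos = 0, limit = buf.limit();
        while (pos < limit) {
            while (buf.get(pos) != ';') pos++;      // skip the station-name bytes
            int t = parseTenths(buf, pos + 1);
            while (pos < limit && buf.get(pos) != '\n') pos++;
            pos++;                                   // step past '\n'
            lines++;
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return new long[] {lines, min, max};
    }
}
```

In a full solution each range would typically be mapped and processed on its own thread; that part is left out of this sketch.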

I have a few ideas for saving a bit more time, especially by avoiding the String parsing/unparsing and hashing. I think I could plug in the Vector API to parse the digits faster, but I'm not sure it would be worth it.
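
One common way to avoid creating a `String` and calling `String.hashCode` for every row is to hash the station-name bytes directly and probe an open-addressing table that keeps a copy of each name's bytes. The sketch below assumes a hypothetical `StationTable` class, a fixed power-of-two capacity sized for the challenge's limit of 10,000 distinct station names, and temperatures already parsed into integer tenths. It is plain scalar code and does not use the Vector API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical byte-keyed aggregation table, not the submission's actual code.
final class StationTable {
    private static final int CAPACITY = 1 << 15; // power of two, > 2 * 10,000 stations
    private final byte[][] keys = new byte[CAPACITY][];
    private final int[] min = new int[CAPACITY];
    private final int[] max = new int[CAPACITY];
    private final long[] sum = new long[CAPACITY];
    private final int[] count = new int[CAPACITY];

    // Records one measurement (in tenths) for the name bytes buf[start, start + len).
    void add(ByteBuffer buf, int start, int len, int tenths) {
        int h = 1;
        for (int i = 0; i < len; i++) {
            h = 31 * h + buf.get(start + i);
        }
        int slot = (h ^ (h >>> 16)) & (CAPACITY - 1);
        while (true) {
            byte[] k = keys[slot];
            if (k == null) {                 // empty slot: copy the name bytes once
                k = new byte[len];
                for (int i = 0; i < len; i++) k[i] = buf.get(start + i);
                keys[slot] = k;
                min[slot] = tenths;
                max[slot] = tenths;
                sum[slot] = tenths;
                count[slot] = 1;
                return;
            }
            if (matches(k, buf, start, len)) { // existing station: update stats
                min[slot] = Math.min(min[slot], tenths);
                max[slot] = Math.max(max[slot], tenths);
                sum[slot] += tenths;
                count[slot]++;
                return;
            }
            slot = (slot + 1) & (CAPACITY - 1); // linear probing
        }
    }

    private static boolean matches(byte[] k, ByteBuffer buf, int start, int len) {
        if (k.length != len) return false;
        for (int i = 0; i < len; i++) {
            if (k[i] != buf.get(start + i)) return false;
        }
        return true;
    }

    // Decodes each name to a String only once, when producing "min/mean/max".
    TreeMap<String, String> results() {
        TreeMap<String, String> out = new TreeMap<>();
        for (int s = 0; s < CAPACITY; s++) {
            if (keys[s] == null) continue;
            String name = new String(keys[s], StandardCharsets.UTF_8);
            double mean = Math.round((double) sum[s] / count[s]) / 10.0;
            out.put(name, String.format(Locale.ROOT, "%.1f/%.1f/%.1f",
                    min[s] / 10.0, mean, max[s] / 10.0));
        }
        return out;
    }
}
```

The hot loop never allocates a `String`; names are decoded with UTF-8 only once per station, in `results()`.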

Check List:

[X] Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
[X] All formatting changes by the build are committed
[X] Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
[X] Output matches that of calculate_average_baseline.sh

Execution time: ~6s
Execution time of reference implementation: ~2m 10s

Note that one test (with UTF-8 characters) fails, but it also failed for every other implementation I tried, so I'm assuming the error is in the provided code (I was a dozen commits behind). Comparing against the baseline yields correct results.

Jan 10 '24 16:01 SamuelYvon