1brc
First Version
My first submission. It memory-maps the input file so that parsing can be split into chunks; the rest is pretty standard stuff. Lines are parsed manually from the raw bytes to avoid building huge UTF-8 strings.
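As a rough illustration of the approach described above (not the submitted code), the sketch below maps a small file, cuts it into chunks that end on line boundaries, and parses each temperature directly from bytes as an integer number of tenths. The class and method names (`ChunkSketch`, `chunkBoundaries`, `parseTenths`, `sumChunk`) are hypothetical. It assumes well-formed `name;value` lines with exactly one fractional digit, and a file small enough for a single mapping; a `MappedByteBuffer` cannot exceed 2 GiB, so the real input would need several mappings.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkSketch {

    // Returns chunk start offsets plus the final end offset. Every boundary
    // except the first sits just after a '\n', so no line straddles two chunks.
    static int[] chunkBoundaries(ByteBuffer buf, int chunks) {
        int size = buf.limit();
        int[] bounds = new int[chunks + 1];
        for (int i = 1; i < chunks; i++) {
            int pos = Math.max(bounds[i - 1], (int) ((long) size * i / chunks));
            while (pos < size && buf.get(pos) != '\n') pos++;
            bounds[i] = Math.min(size, pos + 1);
        }
        bounds[chunks] = size;
        return bounds;
    }

    // Parses bytes such as "-12.3" in buf[start, end) into tenths (-123)
    // without creating a String. Assumes only '-', digits and one '.'.
    static int parseTenths(ByteBuffer buf, int start, int end) {
        boolean negative = buf.get(start) == '-';
        int value = 0;
        for (int i = negative ? start + 1 : start; i < end; i++) {
            byte b = buf.get(i);
            if (b != '.') value = value * 10 + (b - '0');
        }
        return negative ? -value : value;
    }

    // Sums all temperatures (in tenths) of the lines inside buf[from, to).
    static long sumChunk(ByteBuffer buf, int from, int to) {
        long sum = 0;
        int pos = from;
        while (pos < to) {
            while (pos < to && buf.get(pos) != ';') pos++; // skip the station name
            if (pos >= to) break;                          // no separator left
            int numStart = pos + 1;
            int numEnd = numStart;
            while (numEnd < to && buf.get(numEnd) != '\n') numEnd++;
            sum += parseTenths(buf, numStart, numEnd);
            pos = numEnd + 1;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // Tiny made-up sample standing in for the measurements file.
        Path file = Files.createTempFile("measurements", ".txt");
        file.toFile().deleteOnExit();
        Files.writeString(file, "Oslo;-3.5\nLima;18.2\nCairo;25.0\nOslo;1.1\n");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int[] bounds = chunkBoundaries(buf, 2);
            long total = 0;
            for (int i = 0; i < bounds.length - 1; i++) {
                total += sumChunk(buf, bounds[i], bounds[i + 1]);
            }
            System.out.println("sum of tenths = " + total);
        }
    }
}
```

Each `[bounds[i], bounds[i + 1])` range is independent, so in a real run every range could be handed to its own thread and the partial results merged at the end.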
I have a few ideas for saving a bit more time, especially by avoiding String parsing/unparsing and hashing. I think I could use the Vector API to parse the digits faster, but I'm not sure it would really be worth it.
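One way the String-avoidance idea could look, purely as an assumption and not something this PR implements: hash the station-name bytes in place, so that a lookup key never has to be decoded into a `String`. The sketch uses 32-bit FNV-1a as a stand-in hash; `NameHashSketch` and `hashName` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class NameHashSketch {

    // 32-bit FNV-1a over buf[start, end); no String is allocated.
    static int hashName(ByteBuffer buf, int start, int end) {
        int h = 0x811c9dc5;              // FNV offset basis
        for (int i = start; i < end; i++) {
            h ^= buf.get(i) & 0xff;      // mix in one unsigned byte
            h *= 0x01000193;             // FNV prime
        }
        return h;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
                "Oslo;-3.5\nOslo;1.1\n".getBytes(StandardCharsets.UTF_8));
        // Both occurrences of the same name hash identically.
        System.out.println(hashName(buf, 0, 4) == hashName(buf, 10, 14));
    }
}
```

A table keyed this way still has to compare the raw bytes on a hash match, since two different names can collide.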
Check List:
- [X] Tests pass (`./test.sh <username>` shows no differences between expected and actual outputs)
- [X] All formatting changes by the build are committed
- [X] Your launch script is named `calculate_average_<username>.sh` (make sure to match casing of your GH user name) and is executable
- [X] Output matches that of `calculate_average_baseline.sh`
- Execution time: ~6s
- Execution time of reference implementation: ~2m 10s
Note that one test (with UTF-8 chars) fails, but it also failed for every other implementation I tried it on, so I'm assuming the error was in the provided code, as I was a dozen commits behind. Comparing against the baseline yields correct results.