1brc
1brc copied to clipboard
Make generation of measurements file deterministic
Addresses #35.
~~This is currently not implemented/added for CreateMeasurements2 (@rschwietzke).~~
~~If it's desirable that both produce the same output, LMK and I can add that.~~
Makes both CreateMeasurement implementations generate the same output, and makes it deterministic based on a provided seed (with a default value).
Feel free to undo the use of FastRandom. nextGaussian is a lot of code, hence I prefered to take that all out and also the double to char/byte conversion. There are ways to make that all faster, It is for instance a single thread and also we roll char to byte again and again.
FastRandom
was neatly quick, and just using it everywhere was quite easy (and provided a bit of speed up to version 1 as well). Since both implementations produce the same output now (except for the fact that version 2 is keen to output a few more lines, but :shrug: ), I vote that we get rid of the slower one.
Updated with the new random station generation.
Now we have 3 measurement creators that output the exact same file :P
I have also made some slight syntactic changes to the name generation @mtopolnik. Semantically it should do the same thing, but would appreciate if you could take a look either way.
I bit the bullet and rented out a CCX3 :D Very surprising results using the full 1B dataset on this machine:
merykitty - 51 seconds thomaswue - 21 seconds ebarlas, royvanrijn break on the 31-bit mapped region size length obourgain: 5 seconds!
Correction: didn't check the command output... obourgain takes 5 seconds to break with OOM :D
Update: royvanrijn's solution now works.
@gunnarmorling I'm wondering if you're having second thoughts on merging this, given the difference in the timings and the large number of contestants. OTOH it's kind of beside the point to reward solutions that work great for 416 short keys but degrade significantly for 10,000 keys using the full 100 byte range.