1brc

Make generation of measurements file deterministic

Open • datdenkikniet opened this issue 1 year ago • 6 comments

Addresses #35.

~~This is currently not implemented/added for CreateMeasurements2 (@rschwietzke).~~

~~If it's desirable that both produce the same output, LMK and I can add that.~~

Makes both CreateMeasurements implementations generate the same output, and makes that output deterministic based on a provided seed (with a default value).
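A minimal sketch of the idea, assuming a single-threaded generator that draws every random value from one `java.util.Random` created from the seed. The class name, station list and default seed below are hypothetical and are not taken from the PR. Because `Random` is specified to return the same sequence for the same seed and the same sequence of calls, the output file depends only on the seed.

```java
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch, NOT the PR's CreateMeasurements code.
class SeededMeasurementsSketch {
    // Assumed default seed; the PR's actual default value is not shown here.
    static final long DEFAULT_SEED = 0L;

    // Hypothetical station list, for illustration only.
    static final String[] STATIONS = {"Hamburg", "Istanbul", "Jakarta", "Nairobi"};

    // Builds `rows` lines of "station;temperature" from one seeded Random,
    // so the same seed always yields the same text.
    static String generate(long seed, int rows) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            String station = STATIONS[rnd.nextInt(STATIONS.length)];
            // Temperature in tenths of a degree, uniformly in [-99.9, 99.9].
            int tenths = rnd.nextInt(1999) - 999;
            sb.append(station).append(';')
              .append(String.format(Locale.ROOT, "%.1f", tenths / 10.0))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Seed from the command line, falling back to the (assumed) default.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : DEFAULT_SEED;
        System.out.print(generate(seed, 5));
    }
}
```

Anything that reads the clock, the thread scheduler or an unseeded `Random` would break this property, which is why the generator is kept to a single seeded source.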

datdenkikniet, Jan 05 '24 18:01

Feel free to undo the use of FastRandom. nextGaussian is a lot of code, hence I preferred to take all of that out, along with the double-to-char/byte conversion. There are ways to make it all faster: for instance, it runs in a single thread, and we convert char to byte again and again.
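A hypothetical sketch in the spirit of this comment; the actual FastRandom class is not shown in this thread, so everything below is an assumption. It swaps in a Marsaglia xorshift64 generator, samples temperatures uniformly as integer tenths instead of calling `nextGaussian`, and writes those tenths straight into a `byte[]`, so no double-to-`String`-to-byte conversion happens per line.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, NOT the FastRandom class from the PR.
class FastTempSketch {
    private long state;

    FastTempSketch(long seed) {
        // xorshift never leaves the all-zero state, so avoid seeding with 0.
        state = (seed == 0) ? 0x9E3779B97F4A7C15L : seed;
    }

    // Marsaglia xorshift64 step with the (13, 7, 17) shift triple.
    long nextLong() {
        state ^= state << 13;
        state ^= state >>> 7;
        state ^= state << 17;
        return state;
    }

    // Value in [0, bound); the modulo adds a tiny bias, acceptable for test data.
    int nextInt(int bound) {
        return (int) Long.remainderUnsigned(nextLong(), bound);
    }

    // Writes tenths in [-999, 999] (e.g. -123) as ASCII "-12.3" into buf at pos
    // and returns the new position; no double or String is involved.
    static int writeTenths(int tenths, byte[] buf, int pos) {
        if (tenths < 0) {
            buf[pos++] = '-';
            tenths = -tenths;
        }
        int whole = tenths / 10;
        if (whole >= 10) {
            buf[pos++] = (byte) ('0' + whole / 10);
        }
        buf[pos++] = (byte) ('0' + whole % 10);
        buf[pos++] = '.';
        buf[pos++] = (byte) ('0' + tenths % 10);
        return pos;
    }

    // Convenience wrapper used for checking the byte output.
    static String format(int tenths) {
        byte[] buf = new byte[8];
        int len = writeTenths(tenths, buf, 0);
        return new String(buf, 0, len, StandardCharsets.US_ASCII);
    }
}
```

A uniform distribution is a deliberate simplification here; it keeps the generator tiny but does not reproduce whatever distribution the original code used.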

rschwietzke, Jan 05 '24 18:01

FastRandom was nicely quick, and just using it everywhere was quite easy (it provided a bit of a speedup to version 1 as well). Since both implementations now produce the same output (except that version 2 is keen to output a few more lines, but :shrug:), I vote that we get rid of the slower one.

datdenkikniet, Jan 05 '24 19:01

Updated with the new random station generation.

Now we have 3 measurement creators that output the exact same file :P
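One way to check a claim like this, sketched under the assumption that each generator has already written its file to disk (the file names in the comment are hypothetical): `java.nio.file.Files.mismatch` (Java 12+) returns `-1L` when two files are byte-for-byte identical, and otherwise the position of the first differing byte.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for comparing generator outputs byte-for-byte.
class SameOutputCheck {
    // True iff both files have identical contents.
    static boolean sameContent(Path a, Path b) throws IOException {
        return Files.mismatch(a, b) == -1L;
    }

    public static void main(String[] args) throws IOException {
        // e.g. java SameOutputCheck measurements-v1.txt measurements-v2.txt
        long pos = Files.mismatch(Path.of(args[0]), Path.of(args[1]));
        System.out.println(pos == -1L ? "identical" : "first difference at byte " + pos);
    }
}
```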


I have also made some slight syntactic changes to the name generation, @mtopolnik. Semantically it should do the same thing, but I would appreciate it if you could take a look either way.

datdenkikniet, Jan 06 '24 12:01

I bit the bullet and rented a CCX3 :D Very surprising results using the full 1B dataset on this machine:

- merykitty: 51 seconds
- thomaswue: 21 seconds
- ebarlas, royvanrijn: break on the 31-bit mapped region size limit
- obourgain: 5 seconds!
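For context on the 31-bit limit: `FileChannel.map` takes a `long` size but throws `IllegalArgumentException` when it exceeds `Integer.MAX_VALUE`, so a file larger than about 2 GiB has to be mapped in several pieces. A minimal sketch of that, with hypothetical class and method names; it deliberately ignores lines that straddle a chunk boundary, which a real solution must handle.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: map a large file as several regions of at most 2^31 - 1 bytes.
class ChunkedMapSketch {
    // Splits [0, fileSize) into {start, length} pairs of at most maxChunk bytes.
    static List<long[]> chunkBounds(long fileSize, long maxChunk) {
        List<long[]> out = new ArrayList<>();
        for (long start = 0; start < fileSize; start += maxChunk) {
            out.add(new long[] {start, Math.min(maxChunk, fileSize - start)});
        }
        return out;
    }

    // Maps the whole file read-only; each region stays within the int size limit.
    static List<MappedByteBuffer> mapAll(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            List<MappedByteBuffer> maps = new ArrayList<>();
            for (long[] c : chunkBounds(ch.size(), Integer.MAX_VALUE)) {
                maps.add(ch.map(FileChannel.MapMode.READ_ONLY, c[0], c[1]));
            }
            // Mappings remain valid after the channel is closed.
            return maps;
        }
    }
}
```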

mtopolnik, Jan 07 '24 21:01

Correction: I didn't check the command output... obourgain takes 5 seconds to break with an OOM :D

mtopolnik, Jan 07 '24 21:01

Update: royvanrijn's solution now works.

@gunnarmorling I'm wondering whether you're having second thoughts about merging this, given the differences in timings and the large number of contestants. OTOH, it's kind of beside the point to reward solutions that work great for 416 short keys but degrade significantly for 10,000 keys using the full 100-byte range.
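To illustrate what "10,000 keys using the full 100-byte range" means for test data, here is a hypothetical sketch (not the PR's name generator) that draws distinct station names with a seeded `Random`, each 1 to 100 bytes long. It assumes ASCII letters only, so the UTF-8 byte length equals the character count; the real generator may also emit multi-byte characters. Names average roughly 50 bytes, much longer than the 416 short names mentioned above.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of long, varied station names; NOT the PR's generator.
class StationNameSketch {
    // ASCII letters only: one byte per character in UTF-8, no ';' or '\n'.
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Generates `count` distinct names of 1..maxBytes bytes, deterministic for a given seed.
    static Set<String> generate(long seed, int count, int maxBytes) {
        Random rnd = new Random(seed);
        Set<String> names = new LinkedHashSet<>(); // keeps insertion order
        while (names.size() < count) {
            int len = 1 + rnd.nextInt(maxBytes);
            StringBuilder sb = new StringBuilder(len);
            for (int i = 0; i < len; i++) {
                sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
            }
            names.add(sb.toString()); // a duplicate is simply not added; loop draws again
        }
        return names;
    }
}
```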

mtopolnik, Jan 08 '24 15:01