Ragnar Groot Koerkamp

148 comments by Ragnar Groot Koerkamp

I just updated from 0.12 to 0.13, and the `mean` in the `boxplot` below is now computed in the transformed domain, whereas before it was computed in the linear domain. ```py import...
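The difference matters because averaging in a transformed domain gives a different statistic. The sketch below assumes the transform is a log10 scale (the comment does not say which one) and does not call any plotting library. It only shows that averaging log values and mapping the result back gives the geometric mean, which for positive data is never larger than the arithmetic (linear-domain) mean.

```python
import math
from typing import List


def linear_mean(xs: List[float]) -> float:
    # Arithmetic mean of the raw values (the "linear domain").
    return sum(xs) / len(xs)


def log_domain_mean(xs: List[float]) -> float:
    # Average the log10 values, then map the result back to the data scale.
    # This is the geometric mean of xs (assumes all xs are positive).
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


data = [1.0, 10.0, 100.0]
print(linear_mean(data))      # 37.0
print(log_domain_mean(data))  # 10.0
```

On skewed data like this, the two "means" can differ by a large factor, so a marker that moves between versions is expected once the averaging domain changes.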

Hey, perfect hashes don't really make sense as hashers for a standard hashmap. Unlike normal hashes, constructing a perfect hash function requires knowing up front which keys...
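To illustrate why the key set must be known in advance, here is a toy minimal perfect hash built by brute force: it tries seeds until the given keys land in distinct slots `0..n-1`. The helper names `slot` and `build_mphf` and the seeded BLAKE2b hash are hypothetical choices for this sketch; this is not the construction used by any particular library, and it only scales to a handful of keys.

```python
import hashlib
from typing import List


def slot(key: str, seed: int, n: int) -> int:
    # Seeded 64-bit hash of `key`, reduced to a slot in 0..n-1.
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % n


def build_mphf(keys: List[str], max_seed: int = 1_000_000) -> int:
    # Try seeds until all n keys map to n distinct slots.
    # This only works because the full key set is known before construction.
    n = len(keys)
    for seed in range(max_seed):
        if len({slot(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no collision-free seed found")


keys = ["apple", "banana", "cherry", "date"]
seed = build_mphf(keys)
print(sorted(slot(k, seed, len(keys)) for k in keys))  # [0, 1, 2, 3]
# A key outside the original set still maps into 0..3, so it necessarily
# shares a slot with one of the four keys: there is no room to insert it.
print(slot("elderberry", seed, len(keys)))
```

A general-purpose hashmap has to accept arbitrary new keys after creation, which is exactly what this construction cannot do without rebuilding.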

Yes, that looks like how I'm using it. Just note that you could get away with not storing the `name` of the city in the `AggregateData` struct.
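One way the name can be left out, sketched under assumptions: suppose (hypothetically) each city gets a fixed index and its aggregates live in a list at that index. Then the name only needs to exist once, in the list of cities, and each `AggregateData` stays smaller. The fields `min_v`, `max_v`, `total`, `count` and the function `aggregate` are made up for this illustration and are not taken from the code being discussed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class AggregateData:
    # Hypothetical fields; note that there is no `name` field.
    min_v: float = float("inf")
    max_v: float = float("-inf")
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> None:
        self.min_v = min(self.min_v, value)
        self.max_v = max(self.max_v, value)
        self.total += value
        self.count += 1


def aggregate(rows: Iterable[Tuple[str, float]],
              cities: List[str]) -> Dict[str, AggregateData]:
    # `slot_of` stands in for whatever hash maps each known city to a distinct index.
    slot_of = {city: i for i, city in enumerate(cities)}
    data = [AggregateData() for _ in cities]
    for city, value in rows:
        data[slot_of[city]].add(value)
    # Names are recovered from `cities` by index; they are never copied into the structs.
    return {cities[i]: agg for i, agg in enumerate(data)}
```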

Yeah, so as the error says, the algorithm only works well if there are sufficiently many buckets and slots. You probably want to run with `c=11` to have more buckets...
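To see how `c` changes the bucket count, here is a rough calculation. It assumes the PTHash convention, where the number of buckets is m = ⌈c·n / log2 n⌉. The comment only says that a larger `c` gives more buckets, so this exact formula and the helper `num_buckets` are assumptions, not confirmed for this program.

```python
import math


def num_buckets(n: int, c: float) -> int:
    # Assumed PTHash-style formula: m = ceil(c * n / log2(n)).
    # Not confirmed to be the formula used by the program discussed here.
    return math.ceil(c * n / math.log2(n))


n = 10**6
for c in (7, 9, 11):
    # Larger c means more (and hence smaller) buckets for the same n.
    print(c, num_buckets(n, c))
```

Under this assumption the bucket count grows linearly in `c`, so going to `c=11` gives proportionally more, smaller buckets, which are easier to place without collisions at the cost of some extra space.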

It will store `n/s` hashes per shard. Do you have `10^12 / 1 * 16 bytes = 16 TB` of RAM? 😅 Probably you want ~1000 shards to get ~16GB memory per...
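The arithmetic above, written out as a small helper: with `n` keys split over `s` shards and 16 bytes stored per hash, each shard holds about `n/s * 16` bytes. The function names and the decimal TB/GB units are choices made for this sketch.

```python
def memory_per_shard(n: int, shards: int, bytes_per_hash: int = 16) -> int:
    # Each shard holds about n / shards hashes (rounded up) of bytes_per_hash bytes.
    return (n + shards - 1) // shards * bytes_per_hash


def shards_for_budget(n: int, budget_bytes: int, bytes_per_hash: int = 16) -> int:
    # Smallest shard count that keeps one shard within the memory budget
    # (ceiling division of total bytes by the budget).
    return (n * bytes_per_hash + budget_bytes - 1) // budget_bytes


n = 10**12
print(memory_per_shard(n, 1) / 10**12, "TB")     # 16.0 TB
print(memory_per_shard(n, 1000) / 10**9, "GB")   # 16.0 GB
print(shards_for_budget(n, 16 * 10**9))          # 1000
```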

Or maybe try `alpha=0.97` or a bit lower. It's hard to know exactly where the bottleneck is.
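A quick way to see what lowering `alpha` buys, assuming `alpha` is the load factor (the fraction of slots that end up occupied), so `n` keys use about `n / alpha` slots. That interpretation and the helper `num_slots` are assumptions for this sketch: a lower `alpha` leaves more empty slots, which makes keys easier to place at the cost of a slightly larger table.

```python
import math


def num_slots(n: int, alpha: float) -> int:
    # Assumed: alpha is the load factor, so about n / alpha slots hold n keys.
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    return math.ceil(n / alpha)


n = 10**6
for alpha in (0.99, 0.97, 0.95):
    slots = num_slots(n, alpha)
    print(alpha, slots, slots - n)  # load factor, total slots, empty slots
```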

Hmm. That's weird. Did you enable sharding to disk? Without it, it will hash all keys once per shard, which is probably quite a bit slower.
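A minimal sketch of where the extra cost comes from. The functions `shards_in_memory` and `shards_via_disk`, the 64-bit hashes, and the one-file-per-shard layout are hypothetical and are not the program's actual code. Without disk sharding, each shard pass re-hashes every key, giving `num_shards * len(keys)` hash evaluations. With it, each key is hashed once and its hash is appended to that shard's file, and the shards are read back one at a time.

```python
import os
import struct
from typing import Callable, Iterator, List


def shards_in_memory(keys: List[str], num_shards: int,
                     hash_fn: Callable[[str], int]) -> Iterator[List[int]]:
    # No disk: every key is re-hashed once per shard,
    # i.e. num_shards * len(keys) hash evaluations in total.
    for s in range(num_shards):
        yield [h for h in map(hash_fn, keys) if h % num_shards == s]


def shards_via_disk(keys: List[str], num_shards: int,
                    hash_fn: Callable[[str], int], tmpdir: str) -> Iterator[List[int]]:
    # Disk: each key is hashed exactly once; its 64-bit hash (0 <= h < 2**64)
    # is appended to its shard's file. Shards are then read back one by one.
    paths = [os.path.join(tmpdir, f"shard{s}.bin") for s in range(num_shards)]
    files = [open(p, "wb") for p in paths]
    try:
        for key in keys:
            h = hash_fn(key)
            files[h % num_shards].write(struct.pack("<Q", h))
    finally:
        for f in files:
            f.close()
    for p in paths:
        with open(p, "rb") as f:
            data = f.read()
        yield [v for (v,) in struct.iter_unpack("<Q", data)]
```

The trade-off is hashing work versus disk I/O: the disk variant does `len(keys)` hash evaluations in total regardless of the shard count, but it writes every hash to disk once.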

Yeah, well... People using this for up to 10^9 elements probably don't want sharding to disk, with this program unexpectedly writing to disk. Maybe I'll make a separate function `build_on_disk` for...