Sebastiano Vigna

Results 188 comments of Sebastiano Vigna

That was c=10; I'll try 11.

> On November 18, 2023 10:13:21 AM GMT+01:00, Ragnar Groot Koerkamp wrote:
> Yeah, so as the error says, the algorithm only works well if...

I tried with c=11 and a single shard, but I got:

```
keys: 1000000000000
shards: 1
parts: 3892549
slots/prt: 262144
slots tot: 1020408365056
buckets/prt: 72285
buckets tot: 281372904465
keys/bucket: 3.55...
```
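The stats printed for the c=11, single-shard run are internally consistent. A quick check of the arithmetic (plain products and ratios of the printed numbers, not a claim about the library's internal formulas):

$$
\begin{aligned}
\text{parts}\times\text{slots/prt} &= 3\,892\,549 \times 262\,144 = 1\,020\,408\,365\,056 = \text{slots tot},\\
\text{parts}\times\text{buckets/prt} &= 3\,892\,549 \times 72\,285 = 281\,372\,904\,465 = \text{buckets tot},\\
\text{keys/bucket} &= \frac{10^{12}}{281\,372\,904\,465} \approx 3.554,\\
\frac{\text{keys}}{\text{slots tot}} &= \frac{10^{12}}{1\,020\,408\,365\,056} \approx 0.980.
\end{aligned}
$$

Assuming alpha denotes keys per slot, the last ratio corresponds to an alpha of about 0.98 for this run.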

With c=11 the first round failed. I'll make it do another couple of rounds. I'm using 1000 shards.

It didn't stop, but it is still at 800/1000 chunks after three days, which is hardly usable in practice. If it finishes, I'll try a smaller alpha.

Oh no. Well, in my experience, it is better for default options to behave reasonably. People won't always come to you for explanations; they'll just move elsewhere.

On a better note: it completed! But it didn't print any statistics about space, unless "keys/bucket: 3.26" counts as one. That's where serialization is really useful: you can measure the file...
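A minimal sketch of the "measure the file" approach: once the structure has been serialized to some file, space usage in bits per key is just the file size in bits divided by the number of keys. The serialization step is assumed to have happened already, and the helpers `bits_per_key` and `bits_per_key_of_file` are hypothetical names, not part of any library API; a dummy file stands in for real serialized output.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Space usage in bits per key, from a serialized size in bytes.
/// (Hypothetical helper for illustration.)
fn bits_per_key(serialized_bytes: u64, n_keys: u64) -> f64 {
    assert!(n_keys > 0, "need at least one key");
    serialized_bytes as f64 * 8.0 / n_keys as f64
}

/// Reads the size of a serialized structure from disk and converts it.
/// (Hypothetical helper; assumes the file holds only the serialized structure.)
fn bits_per_key_of_file(path: &Path, n_keys: u64) -> io::Result<f64> {
    let len = fs::metadata(path)?.len();
    Ok(bits_per_key(len, n_keys))
}

fn main() -> io::Result<()> {
    // Stand-in for a serialized hash function: a 1000-byte dummy file.
    let path = std::env::temp_dir().join("bits_per_key_demo.bin");
    fs::write(&path, vec![0u8; 1000])?;
    let bpk = bits_per_key_of_file(&path, 2000)?;
    println!("{bpk:.2} bits/key"); // prints "4.00 bits/key"
    fs::remove_file(&path)?;
    Ok(())
}
```

Unlike a derived statistic such as keys/bucket, the file size captures everything that actually has to be stored.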

Will it honor TMPDIR? I can't find where it's creating the shards...
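One way a Rust tool can honor TMPDIR is to derive its shard locations from the standard library's `std::env::temp_dir()`, which on Unix returns the value of `TMPDIR` when it is set and `/tmp` otherwise. The sketch below assumes nothing about how the actual builder names or places its shards: `shard_path` and the file-name pattern are hypothetical. Logging the resulting path also answers the "where is it creating the shards?" question directly.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: the path where a builder could put shard `shard`.
/// `env::temp_dir()` returns $TMPDIR on Unix when it is set, and /tmp otherwise,
/// so building paths on top of it makes the tool honor TMPDIR.
fn shard_path(run_id: &str, shard: usize) -> PathBuf {
    env::temp_dir().join(format!("{run_id}-shard-{shard:04}.bin"))
}

fn main() {
    let p = shard_path("demo", 7);
    // Printing the location tells the user where shards are being written.
    eprintln!("writing shard 7 to {}", p.display());
}
```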

Well, you might just log what you're doing, since you're already logging anyway. Something like "100 shards, keys will be re-read 100 times (switch to offline sharding for single-pass)". At...
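The strategy behind that log line, sharding with re-reading, can be sketched as follows. This is a hypothetical illustration, not the project's actual code: `build_sharded` and `shard_of` are made-up names, hashing keys with the standard library's `DefaultHasher` is an assumption, and the per-shard construction is left as a placeholder. The sketch shows why n shards imply n passes over the input, and it prints the suggested message once before starting.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assign a key to one of `n_shards` shards by hashing it (assumed scheme).
fn shard_of<K: Hash + ?Sized>(key: &K, n_shards: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % n_shards as u64) as usize
}

/// Sharding with re-reading: the key source is re-read once per shard, and each
/// pass keeps only the keys belonging to the current shard, so at most one
/// shard's keys are in memory at a time. Returns the size of each shard.
fn build_sharded<I, F>(n_shards: usize, mut read_keys: F) -> Vec<usize>
where
    F: FnMut() -> I,
    I: Iterator,
    I::Item: Hash,
{
    // Tell the user up front what the cost will be.
    eprintln!(
        "{n_shards} shards, keys will be re-read {n_shards} times \
         (switch to offline sharding for single-pass)"
    );
    let mut sizes = Vec::with_capacity(n_shards);
    for s in 0..n_shards {
        let shard: Vec<I::Item> = read_keys()
            .filter(|k| shard_of(k, n_shards) == s)
            .collect();
        // Placeholder: a real builder would construct the per-shard structure here.
        sizes.push(shard.len());
    }
    sizes
}

fn main() {
    let mut passes = 0;
    let sizes = build_sharded(4, || {
        passes += 1; // counts how many times the input is read
        0..10_000u64
    });
    println!("passes: {passes}, shard sizes: {sizes:?}");
}
```

Sharding to disk, by contrast, would write each key to its shard's file in a single pass, trading the repeated reads for temporary disk space.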

> Yeah, well... People using this for up to 10^9 elements probably don't want sharding to disk, or this program writing to disk unexpectedly. Maybe I'll make a separate function `build_on_disk`...

> Hmm, very interesting point. So it sounds like going out of memory is preferred as the default behaviour over both sharding with re-reading and sharding to disk. I'll probably just make 3...