
PTRHash fails to build on 10^12 keys

Open vigna opened this issue 1 year ago • 19 comments

This is with the parameters in many_keys.rs, just changing the number of keys:

[...]
part    507 bucket  0.01%  chain:   8388608
Too many displacements. Aborting!
Possible causes:
- Too many elements in part.
- Not enough empty slots => lower alpha.
- Not enough buckets     => increase c.
- Not enough entropy     => fix algorithm.

part   1350 bucket  0.03%  chain:   8388608
Too many displacements. Aborting!
Possible causes:
- Too many elements in part.
- Not enough empty slots => lower alpha.
- Not enough buckets     => increase c.
- Not enough entropy     => fix algorithm.

thread 'main' panicked at /home/vigna/git/PTRHash/src/lib.rs:369:13:
Failed to find a global seed after 3 tries for 262144 keys.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I can send the whole log if that's useful.

vigna avatar Nov 18 '23 07:11 vigna

BTW, somewhat unrelated, but this comment

    // Since running all queries is super slow, we only check a subset of them.
    // Although this doesn't completely check that there are no duplicate
    // mappings, by the birthday paradox we can be quite sure there are none
    // since we check way more than sqrt(n) of them.

has no mathematical basis. It would be correct if the output of the function were random, but it is not. If there are exactly two keys that collide, the probability of catching them is the probability that both of those keys end up in the subset you test, and that has nothing to do with the birthday paradox.
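Concretely (a quick calculation): if the test samples m of the n keys uniformly at random and exactly one pair collides, the pair is caught only when both of its keys land in the sample, i.e. with probability

    \Pr[\text{pair caught}] = \frac{\binom{n-2}{m-2}}{\binom{n}{m}} = \frac{m(m-1)}{n(n-1)} \approx \left(\frac{m}{n}\right)^2,

which for m ≈ √n is about 1/n, nowhere near the certainty the birthday argument would suggest.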

vigna avatar Nov 18 '23 07:11 vigna

Yeah so as the error says, the algorithm only works well if there are sufficiently many buckets and slots. You probably want to run with c=11 to have more buckets and significantly increase the probability that displacing finds a solution.

I still have to work out the math at some point. Currently the number of buckets scales as c * n / lg(n) (as in PTHash), but in practice it seems that the default c=9 fails for larger n, so one has to compensate for the /lg(n) manually by increasing c.
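As a rough sanity check of that scaling (a back-of-the-envelope on my side, taking lg as log2 of the global n):

    // Back-of-the-envelope for the c * n / lg(n) bucket scaling (lg taken as log2 of the global n).
    fn main() {
        let n: f64 = 1e12;
        for c in [9.0, 11.0, 12.0] {
            let buckets = c * n / n.log2();
            // c = 9:  ~2.26e11 buckets, ~4.43 keys/bucket
            // c = 11: ~2.76e11 buckets, ~3.62 keys/bucket
            println!("c = {c}: ~{buckets:.2e} buckets, ~{:.2} keys/bucket", n / buckets);
        }
    }

So at n = 10^12 the default c=9 packs roughly 4.4 keys into each bucket, and going to c=11 only brings that down to about 3.6.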

Ah yeah, I did realize at some point that the comment doesn't really make sense. Will fix it at some point.

It would be cool to be able to efficiently check all 10^12 keys using multiple threads. Maybe each thread could store a buffer of 10^6 to 10^9 keys, and then flush/write that once full. But still, there is so much writing going on that it's bound to be slow, I suspect.
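A minimal sketch of that idea, assuming rayon and an index closure standing in for the PTRHash query (the function name, signature, and chunk size here are mine, not the crate's API):

    use rayon::prelude::*;
    use std::sync::atomic::{AtomicU64, Ordering};

    /// Mark every queried slot in a shared atomic bitset; a bit that is already
    /// set means two keys mapped to the same slot, i.e. the mapping is not injective.
    fn check_injective<F>(keys: &[u64], n_slots: usize, index: F) -> bool
    where
        F: Fn(&u64) -> usize + Sync,
    {
        let bits: Vec<AtomicU64> = (0..(n_slots + 63) / 64).map(|_| AtomicU64::new(0)).collect();
        keys.par_chunks(1 << 20) // each rayon task streams through a ~10^6-key chunk
            .all(|chunk| {
                chunk.iter().all(|key| {
                    let slot = index(key);
                    let bit = 1u64 << (slot % 64);
                    // fetch_or returns the previous word; the bit already being set means a duplicate.
                    bits[slot / 64].fetch_or(bit, Ordering::Relaxed) & bit == 0
                })
            })
    }

The bitset itself needs n_slots/8 bytes, about 128 GB at 10^12 slots, so at that scale the check would probably have to be sharded as well.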

RagnarGrootKoerkamp avatar Nov 18 '23 09:11 RagnarGrootKoerkamp

That was 10, I'll try 11.


vigna avatar Nov 18 '23 09:11 vigna

Did this end up working?

RagnarGrootKoerkamp avatar Nov 21 '23 10:11 RagnarGrootKoerkamp

I tried with c=11 and a single shard but I got

        keys: 1000000000000
      shards:          1
       parts:    3892549
   slots/prt:     262144
   slots tot: 1020408365056
 buckets/prt:      72285
 buckets tot: 281372904465
 keys/bucket:          3.55
memory allocation of 2147483648 bytes failed

Now trying with more shards.

vigna avatar Nov 21 '23 10:11 vigna

It will store n/s hashes per shard. Do you have 10^12 / 1 * 16 bytes = 16 TB of RAM? :sweat_smile:

Probably you want ~1000 shards to get ~16GB memory per shard.
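For reference, a tiny helper making that sizing rule explicit (my own arithmetic, using the 16 bytes/hash figure above):

    /// Number of shards needed so that the per-shard hash buffer
    /// (16 bytes per key) fits within `mem_budget_bytes`.
    fn shards_for(n_keys: u64, mem_budget_bytes: u64) -> u64 {
        (n_keys * 16 + mem_budget_bytes - 1) / mem_budget_bytes
    }
    // shards_for(1_000_000_000_000, 16_000_000_000) == 1000: ~1000 shards at ~16 GB each.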

RagnarGrootKoerkamp avatar Nov 21 '23 11:11 RagnarGrootKoerkamp

With c=11 the first round failed. I'll make it do another couple of rounds. I'm using 1000 shards.

vigna avatar Nov 22 '23 13:11 vigna

Then c=12 maybe?

RagnarGrootKoerkamp avatar Nov 22 '23 13:11 RagnarGrootKoerkamp

Or maybe try alpha=0.97 or a bit lower. It's hard to know exactly where the bottleneck is.

RagnarGrootKoerkamp avatar Nov 22 '23 18:11 RagnarGrootKoerkamp

It didn't stop, but it is still at 800/1000 chunks after three days. Hardly usable in practice. If it finishes I'll try with a smaller alpha.

vigna avatar Nov 25 '23 07:11 vigna

Hmm, that's weird. Did you enable sharding to disk? Without it, it will hash all keys once for each shard, which is probably quite a bit slower.

RagnarGrootKoerkamp avatar Nov 26 '23 06:11 RagnarGrootKoerkamp

Oh no. Well, in my experience, it is better if the default options offer reasonable behavior. People won't always come to you for explanations; they'll just move elsewhere.

vigna avatar Nov 26 '23 08:11 vigna

On a better note: it completed! But it didn't print any stats about space, unless "keys/bucket: 3.26" counts. That's where serialization is really useful: you can measure the file to double-check.

vigna avatar Nov 26 '23 08:11 vigna

Will it obey TMPDIR? I can't find where it's creating the shards...

vigna avatar Nov 26 '23 08:11 vigna

Yeah well... People using this for up to 10^9 elements probably don't want sharding to disk and this program writing to disk unexpectedly. Maybe I'll make a separate function build_on_disk for visibility and clarity. The TMPDIR env var should work, yes.

It should normally print a bits/element stat at the end, but at some point I broke that and it was printing 0 instead. I think that's fixed now.

RagnarGrootKoerkamp avatar Nov 26 '23 08:11 RagnarGrootKoerkamp

Well, you might just log what you're doing, since you're already logging anyway. Something like "100 shards, keys will be re-read 100 times (switch to offline sharding for single-pass)". At least people will know what's happening and they can take action.

I agree on the "unexpectedly on disk" viewpoint; VFunc does the same. But in the VFunc case, if things do not go well you'll get an out-of-memory error, and people will look for offline options because that's what happens in 90% of these structures. The problem with PTRHash is that you do not get out-of-memory, you get slow. People do not take this behavior as a suggestion to change options, since it somehow works.

I modified the example to serialize (it'll be fun). I'll keep you posted. Probably with memory-mapping I can even try to validate the structure.

vigna avatar Nov 26 '23 08:11 vigna

Yeah well... People using this for up to 10^9 elements probably don't want sharding to disk and this program writing to disk unexpectedly. Maybe I'll make a separate function build_on_disk for visibility and clarity. The TMPDIR env var should work, yes.

It does!

TMP PATH: "/mnt/extra/analysis/vigna/.tmpnLSHXt"

vigna avatar Nov 26 '23 08:11 vigna

Well, you might just log what you're doing, since you're already logging anyway. Something like "100 shards, keys will be re-read 100 times (switch to offline sharding for single-pass)". At least people will know what's happening and they can take action.

Good idea. I will add some logs.

I agree on the "unexpectedly on disk" viewpoint; VFunc does the same. But in the VFunc case, if things do not go well you'll get an out-of-memory error, and people will look for offline options because that's what happens in 90% of these structures. The problem with PTRHash is that you do not get out-of-memory, you get slow. People do not take this behavior as a suggestion to change options, since it somehow works.

Hmm very interesting point. So it sounds like going out-of-memory is a preferred default behaviour over both sharding with rereading and sharding to disk. I'll probably just make 3 distinct entry points for these options.
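A hypothetical shape for that (the names here are mine, not the crate's actual API), which would also cover the logging suggestion above:

    /// Hypothetical sketch, not the crate's API: make the construction strategy
    /// an explicit choice and log its cost up front.
    enum BuildMode {
        /// Keep all hashes in memory; fails fast with an allocation error when n is too large.
        InMemory,
        /// Hash the full key set once per shard; no disk use, but `shards` passes over the keys.
        RehashPerShard { shards: usize },
        /// Write shard files to $TMPDIR so the keys are hashed only once.
        OnDisk { shards: usize },
    }

    fn log_plan(mode: &BuildMode, n_keys: u64) {
        match mode {
            BuildMode::InMemory => eprintln!(
                "1 shard: allocating ~{} GB of hashes up front",
                16 * n_keys / 1_000_000_000
            ),
            BuildMode::RehashPerShard { shards } => eprintln!(
                "{shards} shards: keys will be re-read {shards} times (use on-disk sharding for a single pass)"
            ),
            BuildMode::OnDisk { shards } => eprintln!(
                "{shards} shards: writing shard files to $TMPDIR, keys hashed once"
            ),
        }
    }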

RagnarGrootKoerkamp avatar Nov 29 '23 15:11 RagnarGrootKoerkamp

Hmm very interesting point. So it sounds like going out-of-memory is a preferred default behaviour over both sharding with rereading and sharding to disk. I'll probably just make 3 distinct entry points for these options.

"Is" is a big word :). That's my personal take on user psychology. Catastrophic disasters lead to search for solutions. Bad performance leads to think the that's the performance of the software. Sometimes the user will think "maybe I can fiddle", but I think less often than with a catastrophic result.

vigna avatar Nov 29 '23 18:11 vigna