
Reduce memory usage of `make_pmh_tables`?

Open cbeck88 opened this issue 6 years ago • 3 comments

Hi,

So, it occurred to me that the memory usage of the buckets array may be a bottleneck in the `make_pmh_tables` function.

The reason I say that is:

  • The paper that was referred to in the blog post says that the randomized PMH algorithm, in the random hash function model, is supposed to take linear (or nearly linear?) time almost surely

  • Yet, in this line, we allocate about n^2 memory on the stack to hold the buckets, and zero-initialize all of it:

    // Step 1: Place all of the keys into buckets
    cvector<bucket<M>, M> buckets;

I actually asked a stackoverflow question about this (naively) in the process of trying to sort it out: https://stackoverflow.com/questions/47700071/time-complexity-of-allocating-uninitialized-memory-on-the-stack-in-a-constexpr-f

When I was (earlier) trying to figure out how to make this compile on gcc 5.4, one of the problems was that gcc 5 has a hard time mutating arrays at compile time; it tends to just get confused and give up on the constexpr evaluation. I was trying to figure out how to build the buckets without doing that, in a more pure-functional way. One approach I thought of, instead of having M "dynamically-sized" mutable buckets, is:

  • First compute a list of all N hash pairs: that is, all pairs of the form ([index in original array], [hash of that item % M])
  • Sort this list of pairs, using the second part (the hash) as the sorting criterion. Now items that hash to the same bucket are adjacent.
  • Extract from the sorted list a list of M "ranges" corresponding to the buckets, a range being essentially a pair of iterators (or indices). The later parts of the algorithm then iterate over this list of ranges.
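
Roughly, I'm imagining something like the following (just a sketch to illustrate the idea, not written against frozen's actual types; `make_bucket_ranges`, `Hasher`, and the entry/range structs are placeholder names I made up):

    #include <array>
    #include <cstddef>
    #include <utility>

    template <std::size_t N, std::size_t M, class Key, class Hasher>
    constexpr auto make_bucket_ranges(const std::array<Key, N>& keys, const Hasher& hash)
    {
        struct entry { std::size_t index; std::size_t bucket; };
        struct range { std::size_t begin; std::size_t end; };

        // Step 1: one entry per key: (index in the original array, hash % M).
        std::array<entry, N> entries{};
        for (std::size_t i = 0; i < N; ++i)
            entries[i] = {i, hash(keys[i]) % M};

        // Step 2: sort by bucket so keys that share a bucket become adjacent.
        // (Insertion sort just to keep the sketch constexpr-friendly; a real
        // version would want an O(N log N) constexpr sort to get the claimed
        // complexity.)
        for (std::size_t i = 1; i < N; ++i)
            for (std::size_t j = i; j > 0 && entries[j - 1].bucket > entries[j].bucket; --j) {
                entry tmp = entries[j];
                entries[j] = entries[j - 1];
                entries[j - 1] = tmp;
            }

        // Step 3: extract, for each of the M buckets, the half-open range
        // [begin, end) of positions it occupies in the sorted entries array.
        std::array<range, M> ranges{};
        std::size_t pos = 0;
        for (std::size_t b = 0; b < M; ++b) {
            ranges[b].begin = pos;
            while (pos < N && entries[pos].bucket == b) ++pos;
            ranges[b].end = pos;
        }

        return std::make_pair(entries, ranges);
    }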

I think that

  • This can be done at compile time (certainly on gcc 6+, I've given up on gcc 5 :) )
  • It would only use O(M) memory and O(M log M) time, instead of O(M) time but O(M^2) memory (all of which also gets zero-initialized)
  • It might speed up the later step where we sort the buckets by size, since we would just be sorting the ranges, and a swap then exchanges two O(1)-sized ranges rather than two O(M)-sized buckets.

Another idea would be to try to make the cvector leave its entries uninitialized when it is constructed. I'm not really sure whether that's even possible in constexpr, or whether there would actually be any savings.

Do you think this sorting approach makes sense at a high level? I might try to implement it later. :)

cbeck88 avatar Dec 08 '17 02:12 cbeck88

Yeah, the O(n^2) is an issue. It's only paid at compile time, but it should still be avoided. It's too late for me to think clearly about that, but maybe we don't need that much space when combined with your hash selection criteria: if a bucket is oversized, we could try another seed or something like that? Preallocate N buckets of k elements (k being a fixed constant) and cope with it?
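
Something like this, maybe (just a sketch; `fixed_bucket` and `K` are made-up names, not the actual frozen types):

    #include <array>
    #include <cstddef>

    // Sketch: a bucket with a small fixed capacity K. If a key would overflow
    // the bucket, push_back reports failure and the caller abandons the
    // current seed and retries, just like a failed hash selection.
    template <std::size_t K>
    struct fixed_bucket {
        std::array<std::size_t, K> slots{}; // indices of the keys in this bucket
        std::size_t count = 0;

        constexpr bool push_back(std::size_t key_index) {
            if (count == K)
                return false; // oversized bucket: signal "try another seed"
            slots[count++] = key_index;
            return true;
        }
    };

With N such buckets the storage would be O(N * k) instead of O(N^2), at the cost of restarting whenever some bucket overflows.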

serge-sans-paille avatar Dec 08 '17 23:12 serge-sans-paille

Yeah, we can do some measurements and see; that might be the simplest way to fix it. I'm only not sure what the max bucket size can be. The second phase can probably tolerate several medium-sized buckets, just not TOO many. If the restriction is too tight we will time out in the do-while loop.

cbeck88 avatar Dec 09 '17 00:12 cbeck88