What has to change to support higher (e.g., rank 13) rows?

Open danluu opened this issue 7 years ago • 1 comments

  1. Dedup buffer has to get wider. For example, it would have to go from 2**6 to 2**13. This seems ok because we only clear the entirety when we start up BitFunnel and otherwise we only clear matching bits.
  2. In some RowId-related data structure, we'll need an extra bit or two. That structure has spare, currently unused bits set aside for whether a ?? is Adhoc, explicit, or a fact, so we can steal at least two bits.
  3. There's some place where the maxRank was 7 in the old BitFunnel code because ????, and then it was changed to 6 because we didn't use 7. That limit would have to increase.
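
To make points 1 and 2 concrete, here is a minimal sketch of how widening the rank field and the dedupe buffer could look. It is not the actual BitFunnel code: the 4/28 bit split, `PackRowId`, `RankOf`, `IndexOf`, and `DedupeBufferEntries` are all hypothetical names and widths chosen for illustration. The only facts taken from the list above are that the buffer grows from 2**6 to 2**13 and that the rank needs an extra bit or two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packed row identifier. The field widths are assumptions
// for this sketch, not BitFunnel's real RowId layout.
constexpr unsigned c_rankBits = 4;    // 3 bits only reach rank 7; 4 bits reach rank 15.
constexpr unsigned c_indexBits = 28;  // Remaining bits of a 32-bit word.
constexpr uint32_t c_maxRank = 13;    // The rank this issue asks about.
constexpr uint32_t c_indexMask = (1u << c_indexBits) - 1u;

static_assert(c_rankBits + c_indexBits == 32, "fields must fill the word");
static_assert(c_maxRank < (1u << c_rankBits), "rank field too narrow");

// Packs a rank and a row index into one 32-bit value.
inline uint32_t PackRowId(uint32_t rank, uint32_t index)
{
    assert(rank <= c_maxRank);
    assert(index <= c_indexMask);
    return (rank << c_indexBits) | index;
}

inline uint32_t RankOf(uint32_t rowId) { return rowId >> c_indexBits; }
inline uint32_t IndexOf(uint32_t rowId) { return rowId & c_indexMask; }

// Number of dedupe buffer entries needed for a given maximum rank,
// following the 2**6 -> 2**13 growth described in point 1.
constexpr size_t DedupeBufferEntries(uint32_t maxRank)
{
    return size_t{1} << maxRank;
}
```

With these assumed widths, `DedupeBufferEntries(6)` is 64 and `DedupeBufferEntries(13)` is 8192, which is the 128x growth point 1 describes.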

Note that we designed the row tables such that the first quadword in a row is the first qword in a cacheline. If we have a very high rank, we blow that out to how many qwords are in a low rank... this would force a high.

Note that we also cannot currently handle rows that are less than one qword wide (when we rank down, we'll look at the corresponding non-existent lower rank rows). Supporting them is probably a TODO for the future as an optimization.
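
The one-qword minimum can be turned into a bound on the usable rank. The sketch below assumes a rank r row holds one bit per 2^r documents, so its width is the shard's document count shifted right by r. `RowWidthInBits` and `MaxUsableRank` are hypothetical helpers, not existing BitFunnel functions.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t c_bitsPerQuadword = 64;

// Width in bits of a row at the given rank, assuming one bit per
// 2^rank documents (an assumption of this sketch).
inline size_t RowWidthInBits(size_t documentsPerShard, unsigned rank)
{
    assert(rank < 64);
    return documentsPerShard >> rank;
}

// Highest rank, capped at rankLimit, whose rows are still at least one
// full quadword wide for a shard holding minDocumentsPerShard documents.
inline unsigned MaxUsableRank(size_t minDocumentsPerShard, unsigned rankLimit)
{
    assert(minDocumentsPerShard >= c_bitsPerQuadword);
    assert(rankLimit < 64);
    unsigned rank = 0;
    while (rank < rankLimit &&
           RowWidthInBits(minDocumentsPerShard, rank + 1) >= c_bitsPerQuadword)
    {
        ++rank;
    }
    return rank;
}
```

Under that assumption, a shard needs at least 64 * 2**13 = 524288 documents before rank 13 rows reach a full qword, while a 4096-document shard tops out at rank 6.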

danluu avatar Nov 24 '16 01:11 danluu

I think this means that, until we fix the issue where we can't handle rows that aren't as wide as a qword, we have to pass either a "max rank" or a "minimum documents per shard" parameter to the TermTableBuilder. The builder can't work this out on its own because it shouldn't necessarily know how many documents exist (e.g., when we build the configuration from N documents and then ingest 100*N documents).

danluu avatar Nov 24 '16 06:11 danluu