milli icon indicating copy to clipboard operation
milli copied to clipboard

Store fuzzy/bucketed positions in `word_position_docids` database

Open loiclec opened this issue 2 years ago • 1 comments

Pull Request

Related issue

Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3222

Implementation

The design is described well in the related issue. For details of how different relative positions are grouped together, see the test bucketed_position.

Basically, we no longer store the exact position of words that appear far into an attribute, but instead group relative positions together in buckets whose size grows exponentially with the original position. This is done to improve the relevancy and the performance of the attribute ranking rule.


This is a draft until https://github.com/meilisearch/milli/pull/742 is merged and the results of the benchmarks are available.


EDIT: I also realised just now that the iterative version of the algorithm needs to be updated as well!

loiclec avatar Dec 15 '22 10:12 loiclec

I think I am going to postpone this improvement to v1.1 because:

  1. The iterative version of the algorithm also needs to be updated
  2. I found an unrelated bug in the implementation of the set-based version of the algorithm, and I would like to debug it first.
  3. The whole design of the attribute ranking rule will change a lot soon, and so will the whole structure of almost all search algorithms, so I don't want to duplicate the work too much

loiclec avatar Jan 02 '23 12:01 loiclec