milli icon indicating copy to clipboard operation
milli copied to clipboard

Simplify the way we store the settings in an index

Open Kerollmops opened this issue 3 years ago • 5 comments
trafficstars

The current version of Meilisearch stores each setting independently in a key/value entry in the main database, to change and access this setting we need to add no more than 3 methods in the Index struct, it is too much and must be improved.

https://github.com/meilisearch/milli/blob/9580b9de7973e5252d23a2951c3fd3f973ca9617/milli/src/index.rs#L797-L814

A good way to improve that would be to create a new, well-typed, struct that describes the settings and store it in the database by using the SerdeJson de/serializer of heed.

Kerollmops avatar Jun 08 '22 16:06 Kerollmops

Looking into this

phdavis1027 avatar Jul 20 '22 17:07 phdavis1027

I just spent some time doing an initial survey of all the things that this new Settings struct should keep track of. I referenced this page but obviously there are some other things we'll need that aren't list there. Upon further examination, it seems to me that all the things listed in pub mod main_key (index.rs:30) are officially defined as "settings" and all the things that aren't in there are not. Is this a fair assessment?

Moreover, I'm trying to picture the new interface to index settings. What is the intended benefit of this change? Is it just to further separate concerns? If so, it would make sense for pub struct Settings to just have three methods corresponding to the normal put, get, and delete operations, but I'd be interested in knowing what other goals might be achieved that might affect the design.

Finally, I noticed that the technical overview stressed that it was important for this change to be backward-compatible, but that further research was necessary. Has anything interesting been discovered?

Sorry for the wall of text. I just realized I had a lot of thoughts.

phdavis1027 avatar Jul 21 '22 03:07 phdavis1027

Hello @phdavis1027

Thanks for your involvement, don't worry about writing a lot; it's always interesting for us to discuss with the community! 😄

FYI, @Kerollmops is on Holiday right now, he's coming back in a few weeks, so you will probably get an answer when he comes back

curquiza avatar Jul 27 '22 07:07 curquiza

Gotcha, thanks for the reply! I'll put a pause on this until then, for fear of finding out any changes I make are irrelevant.

phdavis1027 avatar Jul 27 '22 17:07 phdavis1027

Hey @phdavis1027,

Sorry for the late answer! I looked at the codebase and listed the keys that are real settings (at the end of my message).

What I see by using a single Settings struct that you directly store into one single entry in the main database is a simplification of the internal API and impl block of the Index struct, one single put, delete and get for the whole list of Settings.

The Settings should always be merged with the previous version of them, and therefore every single setting field must be optional (as optionally set). We could probably use the Setting struct?

In terms of backward compatibility, you can ignore that for now, we will figure that out later.

Thank you very much for the time you take on this subject as it is a quite hard problem 😃

CRITERIA_KEY
DISPLAYED_FIELDS_KEY
DISTINCT_FIELD_KEY
HIDDEN_FACETED_FIELDS_KEY
FILTERABLE_FIELDS_KEY
SORTABLE_FIELDS_KEY
GEO_FACETED_DOCUMENTS_IDS_KEY
PRIMARY_KEY_KEY
SEARCHABLE_FIELDS_KEY
USER_DEFINED_SEARCHABLE_FIELDS_KEY
STOP_WORDS_KEY
SYNONYMS_KEY
AUTHORIZE_TYPOS
ONE_TYPO_WORD_LEN
TWO_TYPOS_WORD_LEN
EXACT_WORDS
EXACT_ATTRIBUTES
MAX_VALUES_PER_FACET
PAGINATION_MAX_TOTAL_HITS

Kerollmops avatar Aug 22 '22 13:08 Kerollmops