
Separate the original facet string values from the docids

Open: Kerollmops opened this issue 4 years ago • 2 comments

I found that storing the original facet string in front of the document ids associated with a normalized facet string in the `facet_id_string_docids` database was not a very good idea when it comes to deleting entries.

As you can see, it would be much better to simply store the original facet strings in a separate database:

https://github.com/meilisearch/milli/blob/c51bb6789cb3fbb6511138374b3443f9116a445c/milli/src/update/delete_documents.rs#L457-L516

The amount of code needed to delete the document ids from this database would be much smaller, plus a small piece of code to delete the original values from the separate database too!

https://github.com/meilisearch/milli/blob/c51bb6789cb3fbb6511138374b3443f9116a445c/milli/src/update/delete_documents.rs#L429-L455
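To make the difference concrete, here is a minimal, std-only Rust sketch. It is not milli's code: an in-memory `BTreeMap` stands in for an LMDB database, a `BTreeSet<u32>` stands in for a `RoaringBitmap`, and the byte layout (a little-endian `u16` length, the original string, then the docids as little-endian `u32`s) as well as every function name are hypothetical. It only illustrates the level-0 case: with the combined value, the original string must be parsed out and re-encoded in front of the surviving docids; with separate databases, the docids are decoded directly and the original string is dropped once no document uses it.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins: BTreeSet<u32> for a RoaringBitmap, and
// BTreeMap<Vec<u8>, _> for an LMDB database keyed by the normalized facet string.
type DocIds = BTreeSet<u32>;

/// Encodes docids as consecutive little-endian u32s.
fn encode_docids(docids: &DocIds) -> Vec<u8> {
    let mut buf = Vec::with_capacity(4 * docids.len());
    for id in docids {
        buf.extend_from_slice(&id.to_le_bytes());
    }
    buf
}

/// Decodes consecutive little-endian u32s back into a set of docids.
fn decode_docids(bytes: &[u8]) -> DocIds {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Hypothetical combined layout: [u16 length][original string][docids].
fn encode_combined(original: &str, docids: &DocIds) -> Vec<u8> {
    let mut buf = (original.len() as u16).to_le_bytes().to_vec();
    buf.extend_from_slice(original.as_bytes());
    buf.extend_from_slice(&encode_docids(docids));
    buf
}

/// Level-0 deletion with the combined layout: the original string has to be
/// parsed out of every value and written back in front of the remaining docids.
fn delete_combined(db: &mut BTreeMap<Vec<u8>, Vec<u8>>, to_delete: &DocIds) {
    db.retain(|_key, value| {
        let len = u16::from_le_bytes([value[0], value[1]]) as usize;
        let original = String::from_utf8(value[2..2 + len].to_vec()).expect("valid UTF-8");
        let mut docids = decode_docids(&value[2 + len..]);
        docids.retain(|id| !to_delete.contains(id));
        if docids.is_empty() {
            return false; // no document left: drop the whole entry
        }
        *value = encode_combined(&original, &docids);
        true
    });
}

/// Level-0 deletion with separate databases: the docids are decoded directly,
/// and the original string is removed from its own database once unused.
fn delete_split(
    docids_db: &mut BTreeMap<Vec<u8>, Vec<u8>>,
    originals_db: &mut BTreeMap<Vec<u8>, String>,
    to_delete: &DocIds,
) {
    docids_db.retain(|key, value| {
        let mut docids = decode_docids(value.as_slice());
        docids.retain(|id| !to_delete.contains(id));
        if docids.is_empty() {
            originals_db.remove(key);
            return false;
        }
        *value = encode_docids(&docids);
        true
    });
}
```

With either layout, deleting document 2 from `blue -> {1, 2}` and `red -> {2}` leaves only `blue -> {1}`; the split version additionally removes the orphaned original string for `red`.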

Kerollmops, Aug 24 '21 17:08

Hi! If this one isn't assigned to anyone yet, I'd be happy to help with implementing it.

shekhirin, Aug 30 '21 16:08

Hey @shekhirin,

After giving this complexity problem some thought, I am not sure how it could work. As you can see, the algorithm is quite complex, and even moving the original values into another LMDB database would only simplify level 0, not levels 1 and above, where a tuple of u32s prefixes the document ids (a RoaringBitmap).
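The level-1+ point can be sketched the same way. Assuming, hypothetically, that a level-1+ value is encoded as a `(u32, u32)` tuple followed by the docids as little-endian `u32`s (with `BTreeSet<u32>` again standing in for a `RoaringBitmap`, and all names made up for illustration), deleting documents still means decoding that prefix and writing it back in front of the remaining docids, regardless of where the original strings are stored:

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for a RoaringBitmap.
type DocIds = BTreeSet<u32>;

/// Hypothetical level-1+ value: a (u32, u32) tuple in front of the docids.
fn encode_level_value(bounds: (u32, u32), docids: &DocIds) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8 + 4 * docids.len());
    buf.extend_from_slice(&bounds.0.to_le_bytes());
    buf.extend_from_slice(&bounds.1.to_le_bytes());
    for id in docids {
        buf.extend_from_slice(&id.to_le_bytes());
    }
    buf
}

/// Splits a level-1+ value back into its u32 tuple prefix and its docids.
fn decode_level_value(bytes: &[u8]) -> ((u32, u32), DocIds) {
    let left = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let right = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let docids = bytes[8..]
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    ((left, right), docids)
}

/// Removes deleted docids from one level-1+ value. The tuple prefix must be
/// carried over unchanged; returns None when no docids remain.
fn delete_from_level_value(bytes: &[u8], to_delete: &DocIds) -> Option<Vec<u8>> {
    let (bounds, mut docids) = decode_level_value(bytes);
    docids.retain(|id| !to_delete.contains(id));
    if docids.is_empty() {
        None
    } else {
        Some(encode_level_value(bounds, &docids))
    }
}
```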

In conclusion, I prefer to keep this issue open so I can keep thinking about it and maybe find a cleaner solution. For now, we will keep this ugly deletion algorithm.

Kerollmops, Aug 31 '21 12:08