Delete documents by query
Related product discussion: https://github.com/meilisearch/product/discussions/284
Related specification: WIP
TODO
- [ ] Check with the product squad on how the feature should be implemented
- [ ] Merge the changes into main
- [ ] Update the specification
Impacted teams
@meilisearch/docs-team @meilisearch/integration-team
For people following this issue: we probably will not be able to make it into v1.1. If so, this improvement will be implemented in v1.2 😇
Officially not ready for v1.1 (as expected), to focus on other priorities 😊 Postponed to v1.2 💪
Opened #3550 as an initial exploration of the feature.
This surfaces a few product questions.
1. Should we reuse the existing `/indexes/{:indexUid}/documents/delete-batch` route, or use a new one (e.g. `.../delete-filter`)?
2. Similarly, should we reuse the existing `documentDeletion` task type, or introduce a new `documentDeletionByFilter` task type?
3. What should the "details" of the task be when performing a delete by filter?
4. If reusing the `documentDeletion` task, what should we do with the `providedIds` field? Should it be set to `null`, absent altogether, ...?
5. Should the filter be applied to the documents as they are in the index when the query is received ("query time"), or as they are once all the previously sent updates have been asynchronously applied by the scheduler ("processing time")?

Whichever solution we choose, I think the documentation should discuss this timing issue and discourage "flaky" filters that depend too heavily on whether an update has already been applied to a document (for instance, filters on fields that are updated often).
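The timing question can be illustrated with a small simulation. Below is a toy Python model of a task queue (illustrative only, not Meilisearch internals; the task names mirror the ones discussed above) contrasting "query time" and "processing time" filter evaluation:

```python
# Toy model: an index is a dict of id -> document, and tasks are
# applied strictly in queue order. NOT Meilisearch code.
def apply_tasks(index, tasks, evaluate_filter_at):
    for task in tasks:
        if task["type"] == "documentAddition":
            index.update(task["documents"])
        elif task["type"] == "documentDeletionByFilter":
            if evaluate_filter_at == "query":
                # Snapshot of matching ids taken when the request was received.
                ids = task["ids_at_query_time"]
            else:
                # "Processing time": evaluate the filter against the index
                # after all earlier tasks have been applied.
                ids = [i for i, doc in index.items() if task["filter"](doc)]
            for i in ids:
                index.pop(i, None)
    return index

is_horror = lambda doc: doc["genre"] == "horror"
initial = {1: {"genre": "horror"}, 2: {"genre": "comedy"}}

# An addition is still pending when the delete-by-filter request arrives,
# so the query-time snapshot (computed against `initial`) only saw id 1.
tasks = [
    {"type": "documentAddition", "documents": {3: {"genre": "horror"}}},
    {"type": "documentDeletionByFilter", "filter": is_horror,
     "ids_at_query_time": [1]},
]

print(sorted(apply_tasks(dict(initial), tasks, "query")))       # doc 3 survives
print(sorted(apply_tasks(dict(initial), tasks, "processing")))  # doc 3 deleted too
```

With query-time evaluation the just-added matching document escapes the deletion; with processing-time evaluation it does not.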
@dureuill I'll speak only about point 5. I think the list should be computed at processing time to respect the task processing order, and, as explained in a private message:
- a document modification is in fact a deletion of the old version followed by an addition of the new version. This means that if the list of documents to delete is computed at query time, any document modified between that computation and the processing of the deletion will not be deleted. However, this case only arises if you list the internal document ids without fetching the external document ids at query time.
- what do we do with document additions that happen between the query-time list computation and the processing of the deletion task? Should we keep them, even though they would have been deleted had indexing been faster?
An additional question about the scenario where the list is computed at processing time: what would the behavior be if I remove the filter before the deletion task is processed?
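The first point above can be made concrete with a toy model (illustrative Python, not Meilisearch internals): if a modification is a deletion of the old version plus an addition under a fresh internal id, a deletion list captured as internal ids at query time silently misses the modified document.

```python
import itertools

class ToyIndex:
    """Toy index mapping internal ids to (external_id, document) pairs."""
    def __init__(self):
        self._next = itertools.count()
        self.docs = {}  # internal id -> (external id, document)

    def upsert(self, external_id, doc):
        # A modification is a deletion of the old version plus an
        # addition of the new one, under a *new* internal id.
        self.docs = {i: v for i, v in self.docs.items() if v[0] != external_id}
        self.docs[next(self._next)] = (external_id, doc)

    def internal_ids(self, predicate):
        return [i for i, (_, d) in self.docs.items() if predicate(d)]

    def delete_internal(self, ids):
        for i in ids:
            self.docs.pop(i, None)

idx = ToyIndex()
idx.upsert("movie-1", {"genre": "horror"})

# Query time: capture internal ids matching the filter.
to_delete = idx.internal_ids(lambda d: d["genre"] == "horror")

# Before the deletion task is processed, the document is modified
# (still matching the filter) and gets a fresh internal id.
idx.upsert("movie-1", {"genre": "horror", "year": 1999})

idx.delete_internal(to_delete)
# The modified document escaped the deletion:
print([ext for ext, _ in idx.docs.values()])  # ["movie-1"]
```

Tracking external document ids in the query-time snapshot instead, as discussed below, avoids this particular failure mode.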
Thanks for your input, Many!
> However, this case appears only if you list the internal document ids without fetching the external document ids at "query time".
Yes, for this reason, if deleting at "query time", we will always use external document ids.
> what do we do with the document additions between the query time list computation and the processing of the task deletion?
If computing the filter at query time, then adding documents that would match the filter after the query would result in them not being deleted. You're correct that it then depends on whether the associated update task completed or not at request time.
> what would be the behavior [when filtering at processing time] if I remove the filter before the deletion task is processed?
I don't understand what you mean by "removing the filter". Making the fields used in the filter non-filterable? If so, the task will fail at processing time.
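That failure mode can be sketched as a simple validation step (a toy sketch of the kind of check the scheduler would effectively perform; field names are illustrative):

```python
# Toy sketch: a delete-by-filter task whose filter references a field
# that is no longer in filterableAttributes fails at processing time.
def validate_filter_fields(filter_fields, filterable_attributes):
    missing = [f for f in filter_fields if f not in filterable_attributes]
    if missing:
        raise ValueError(f"attributes {missing} are not filterable")

validate_filter_fields(["genre"], ["genre", "year"])  # passes

try:
    # `genre` was removed from filterableAttributes after the task was enqueued.
    validate_filter_fields(["genre"], ["year"])
except ValueError as err:
    print(err)  # the task would be marked as failed
```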
Another argument against computing the list at query time (when the task is inserted into the queue) concerns a future replication system: replicas must follow the processing order of the queue. With query-time computation, the set of deleted documents would depend on the moment and the machine that receives the task, which is non-deterministic.
About using another task type: I would rather keep the current `documentDeletion` task type, or at least use it to filter and describe tasks in the `/tasks` route. I am pretty sure we will need to break the dump format by adding a new deletion task that stores a filter. What do you think @irevoire?
Yes, but it's non-breaking since we'll still be able to parse the old dump format, right? (We won't be able to import a dump made in v1.1 into v1.0, though.)
Everything has been implemented, closing
Hello everyone following this issue 👋
We have just released the first RC (release candidate) of Meilisearch containing this new implementation! You can test this feature by using:
- the release assets
- or the Meilisearch Docker image:

```shell
docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:v1.2.0-rc.0
```
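Once an instance is running, a request along these lines should exercise the new feature. The route shape (`POST /indexes/{index_uid}/documents/delete` with a `filter` body), the `movies` index, and the filter expression are assumptions to check against the v1.2 release notes:

```python
import json
import urllib.request

# Assumed route shape for the new delete-by-filter endpoint; verify the
# exact path against the v1.2 release notes for the RC you are running.
payload = json.dumps({"filter": "genre = horror AND year < 2000"}).encode()
req = urllib.request.Request(
    "http://localhost:7700/indexes/movies/documents/delete",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Requires a running Meilisearch instance (e.g. the Docker command above):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # summary of the enqueued deletion task

print(req.get_method(), req.full_url)
```

As with any document deletion, the response should be an enqueued task that you can then poll through the `/tasks` route.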
You are more than welcome to share any feedback about this new implementation in this discussion. If you encounter any bugs, please report them here.
Thanks in advance for your help and your involvement in Meilisearch ❤️
🎉 The official stable release containing this change will be available on 5 June 2023
⚠️ RCs (release candidates) are not recommended for production use