Delete documents by query
Related product discussion: https://github.com/meilisearch/product/discussions/284
Related specification: WIP
TODO
- [ ] Check with the product squad on how the feature should be implemented
- [ ] Merge the changes into main
- [ ] Update the specification
Impacted teams
@meilisearch/docs-team @meilisearch/integration-team
For people following this issue: we probably will not be able to make it into v1.1. If so, this improvement will be implemented in v1.2 😇
Officially not ready for v1.1 (as expected), to focus on other priorities 😊 Postponed to v1.2 💪
Opened #3550 as an initial exploration of the feature.
This surfaces a few product questions.
1. Should we reuse the existing `/indexes/{:indexUid}/documents/delete-batch` route, or use a new one (e.g. `.../delete-filter`)?
2. Similarly, should we reuse the existing `documentDeletion` task type, or introduce a new `documentDeletionByFilter` task type?
3. What should the "details" of the task be when performing a delete by filter?
4. If reusing the `documentDeletion` task, what should we do with the `providedIds` field? Should it be set to `null`, absent altogether, ...?
5. Should the filter be applied to the documents as they are in the index when the query is received ("query time"), or as they are once all the previously sent updates have been asynchronously applied by the scheduler ("processing time")?

Whichever solution we choose, I think the documentation should discuss this timing issue and discourage "flaky" filters that depend too heavily on whether an update has already been applied to a document (for instance, filters on fields that are updated often).
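The timing question can be illustrated with a small simulation. Below is a toy Python model of a task queue (illustrative only, not Meilisearch internals; the task names mirror the ones discussed above) contrasting "query time" and "processing time" filter evaluation:

```python
# Toy model: an index is a dict of id -> document, and tasks are
# applied strictly in queue order. NOT Meilisearch code.
def apply_tasks(index, tasks, evaluate_filter_at):
    for task in tasks:
        if task["type"] == "documentAddition":
            index.update(task["documents"])
        elif task["type"] == "documentDeletionByFilter":
            if evaluate_filter_at == "query":
                # Snapshot of matching ids taken when the request was received.
                ids = task["ids_at_query_time"]
            else:
                # "Processing time": evaluate the filter against the index
                # after all earlier tasks have been applied.
                ids = [i for i, doc in index.items() if task["filter"](doc)]
            for i in ids:
                index.pop(i, None)
    return index

is_horror = lambda doc: doc["genre"] == "horror"
initial = {1: {"genre": "horror"}, 2: {"genre": "comedy"}}

# An addition is still pending when the delete-by-filter request arrives,
# so the query-time snapshot (computed against `initial`) only saw id 1.
tasks = [
    {"type": "documentAddition", "documents": {3: {"genre": "horror"}}},
    {"type": "documentDeletionByFilter", "filter": is_horror,
     "ids_at_query_time": [1]},
]

print(sorted(apply_tasks(dict(initial), tasks, "query")))       # doc 3 survives
print(sorted(apply_tasks(dict(initial), tasks, "processing")))  # doc 3 deleted too
```

With query-time evaluation the just-added matching document escapes the deletion; with processing-time evaluation it does not.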
@dureuill I'll speak only about point 5. I think the list should be computed at processing time to respect the task processing order, and, as explained in a private message:
- a document modification is in fact a deletion of the old version followed by an addition of the new version. This means that if the list of documents to delete is computed at query time, any document modified between that computation and the processing of the deletion will not be deleted. However, this case only arises if you list the internal document ids without fetching the external document ids at query time.
- what do we do with document additions that happen between the query-time list computation and the processing of the deletion task? Should we keep them, even though they would have been deleted had indexing been faster?
An additional question about the scenario where the list is computed at processing time: what would the behavior be if I remove the filter before the deletion task is processed?
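The first point above can be made concrete with a toy model (illustrative Python, not Meilisearch internals): if a modification is a deletion of the old version plus an addition under a fresh internal id, a deletion list captured as internal ids at query time silently misses the modified document.

```python
import itertools

class ToyIndex:
    """Toy index mapping internal ids to (external_id, document) pairs."""
    def __init__(self):
        self._next = itertools.count()
        self.docs = {}  # internal id -> (external id, document)

    def upsert(self, external_id, doc):
        # A modification is a deletion of the old version plus an
        # addition of the new one, under a *new* internal id.
        self.docs = {i: v for i, v in self.docs.items() if v[0] != external_id}
        self.docs[next(self._next)] = (external_id, doc)

    def internal_ids(self, predicate):
        return [i for i, (_, d) in self.docs.items() if predicate(d)]

    def delete_internal(self, ids):
        for i in ids:
            self.docs.pop(i, None)

idx = ToyIndex()
idx.upsert("movie-1", {"genre": "horror"})

# Query time: capture internal ids matching the filter.
to_delete = idx.internal_ids(lambda d: d["genre"] == "horror")

# Before the deletion task is processed, the document is modified
# (still matching the filter) and gets a fresh internal id.
idx.upsert("movie-1", {"genre": "horror", "year": 1999})

idx.delete_internal(to_delete)
# The modified document escaped the deletion:
print([ext for ext, _ in idx.docs.values()])  # ["movie-1"]
```

Tracking external document ids in the query-time snapshot instead, as discussed below, avoids this particular failure mode.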
Thanks for your input, Many!
> However, this case appears only if you list the internal document ids without fetching the external document ids at "query time".
Yes, for this reason, if deleting at "query time", we will always use external document ids.
> what do we do with the document additions between the query time list computation and the processing of the task deletion?
If computing the filter at query time, then adding documents that would match the filter after the query would result in them not being deleted. You're correct that it then depends on whether the associated update task completed or not at request time.
> what would be the behavior [when filtering at processing time] if I remove the filter before the deletion task is processed?
I don't understand what you mean by "removing the filter". Making the fields used in the filter non-filterable? If so, the task will fail at processing time.
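That failure mode can be sketched as a simple validation step (a toy sketch of the kind of check the scheduler would effectively perform; field names are illustrative):

```python
# Toy sketch: a delete-by-filter task whose filter references a field
# that is no longer in filterableAttributes fails at processing time.
def validate_filter_fields(filter_fields, filterable_attributes):
    missing = [f for f in filter_fields if f not in filterable_attributes]
    if missing:
        raise ValueError(f"attributes {missing} are not filterable")

validate_filter_fields(["genre"], ["genre", "year"])  # passes

try:
    # `genre` was removed from filterableAttributes after the task was enqueued.
    validate_filter_fields(["genre"], ["year"])
except ValueError as err:
    print(err)  # the task would be marked as failed
```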
Another argument against computing the list at query time (when the task is inserted into the queue) concerns a future replication system: replicas must follow the processing order of the queue. With query-time computation, the set of deleted documents would depend on the moment and the machine that receives the task, which is non-deterministic.
About using another task type: I would rather keep the current `documentDeletion` task type, or at least use it to filter and describe tasks in the `/tasks` route. I am pretty sure we will need to break the dump format by adding a new deletion task that stores a filter. What do you think @irevoire?
Yes, but it's non-breaking since we'll still be able to parse the old dump format, right? (We won't be able to import a dump made in v1.1 into v1.0, though.)
Everything has been implemented, closing
Hello everyone following this issue 👋
We have just released the first RC (release candidate) of Meilisearch containing this new implementation! You can test this feature by using:
- the release assets
- or the Meilisearch Docker image:

```shell
docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:v1.2.0-rc.0
```
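Once an instance is running, a request along these lines should exercise the new feature. The route shape (`POST /indexes/{index_uid}/documents/delete` with a `filter` body), the `movies` index, and the filter expression are assumptions to check against the v1.2 release notes:

```python
import json
import urllib.request

# Assumed route shape for the new delete-by-filter endpoint; verify the
# exact path against the v1.2 release notes for the RC you are running.
payload = json.dumps({"filter": "genre = horror AND year < 2000"}).encode()
req = urllib.request.Request(
    "http://localhost:7700/indexes/movies/documents/delete",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Requires a running Meilisearch instance (e.g. the Docker command above):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # summary of the enqueued deletion task

print(req.get_method(), req.full_url)
```

As with any document deletion, the response should be an enqueued task that you can then poll through the `/tasks` route.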
You are more than welcome to share any feedback about this new implementation in this discussion. If you encounter any bugs, please report them here.
Thanks in advance for your help and your involvement in Meilisearch ❤️
🎉 The official stable release containing this change will be available on 5 June 2023
⚠️ RCs (release candidates) are not recommended for production use