qdrant
Batch update API
Is your feature request related to a problem? Please describe.
Case 1: Allow updating the payloads of multiple different vectors in one request.
Case 2: Allow overwriting a single payload value (different for each point) for multiple points.
Describe the solution you'd like
Create a batch update API, which can receive a list of different update requests:
- upsert
- delete
- set_payload
- overwrite_payload
- delete_payload
- clear_payload
For REST:
PUT /collections/<name>/points/batch
And similar for gRPC.
Describe alternatives you've considered
Create a batch request for each individual type of API.
Additional context
Acceptance criteria:
- The new API should have integration tests
- Should re-use existing data structures as much as possible
- API schema should be properly generated (there are scripts and CI for that)
- These changes should not affect any existing functionality and must be 100% backward compatible
/bounty $300
💎 $300 bounty created by generall
| Attempt | Started (GMT+0) | Solution |
|---|---|---|
| 🟢 @Jesse-Bakker | | #1951 |
What is the expected behavior if the result sets of update requests overlap? For example, if two upserts include the same point, or have filters that apply to the same point?
I think we can just apply updates one by one
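To make the "one by one" answer concrete, here is a minimal sketch of sequential application against an in-memory store. Everything in it is an assumption for illustration: the `Point` and `Operation` shapes, the `operation` discriminator field, and the `applyBatch` helper are hypothetical and are not Qdrant's actual types or implementation. It only shows that, when operations are applied in order, a later operation on the same point sees and overrides the result of an earlier one.

```typescript
// Hypothetical, simplified model: a point is an id plus a payload object.
type Payload = Record<string, unknown>;
type Point = { id: number; payload: Payload };

// Hypothetical operation shapes covering a subset of the operations in the issue.
type Operation =
  | { operation: "upsert"; points: Point[] }
  | { operation: "set_payload"; ids: number[]; payload: Payload }
  | { operation: "delete"; ids: number[] };

// Apply operations strictly in order, so each operation sees the
// result of every operation before it.
function applyBatch(store: Map<number, Payload>, ops: Operation[]): void {
  for (const op of ops) {
    switch (op.operation) {
      case "upsert":
        // An upsert replaces whatever payload the point had before.
        for (const p of op.points) store.set(p.id, { ...p.payload });
        break;
      case "set_payload":
        for (const id of op.ids) {
          const current = store.get(id);
          // Points that do not exist are skipped in this sketch.
          if (current !== undefined) store.set(id, { ...current, ...op.payload });
        }
        break;
      case "delete":
        for (const id of op.ids) store.delete(id);
        break;
    }
  }
}

// Two upserts touch the same point: the second one wins,
// and the following set_payload merges into its result.
const store = new Map<number, Payload>();
applyBatch(store, [
  { operation: "upsert", points: [{ id: 1, payload: { color: "red" } }] },
  { operation: "upsert", points: [{ id: 1, payload: { color: "blue" } }] },
  { operation: "set_payload", ids: [1], payload: { size: 3 } },
]);
console.log(store.get(1));
```

Under this model, overlapping operations need no special handling: the outcome for any point is determined purely by the order of the operations in the request.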
Hey, I would like to give this a try. Correct me if I am wrong. The schema for the batch endpoint would be something like:
// defined in TypeScript style
type Body = Array<BatchUpdate>;
type BatchUpdate = {
  operation: 'upsert' | 'delete' | 'set_payload' | 'overwrite_payload' | 'delete_payload' | 'clear_payload',
  data: PointInsertOperations | SetPayload | DeletePayload | PointsSelector
};
PointInsertOperations is whatever is accepted by /collections/{collection_name}/points. Similar for SetPayload, DeletePayload, PointsSelector.
The DB will get an array of BatchUpdate and apply them one by one.
There is a catch: OpenAPI may create anonymous objects if you use a non-untagged enum in the API. I would like to avoid that, because it tends to produce pretty ugly client code.
You want the OpenAPI schema to properly refer to the type using something like $ref: "#/components/schemas/PointRequest". This will be taken care of if I reuse the enums already defined. Am I right? I will have to define only one more new type which will be BatchUpdate.
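As a rough illustration of the untagged approach discussed here, the sketch below models each operation as a separately named type and tells the variants apart by their structure (which key is present) rather than by a separate tag field. All names in it (`PointStruct`, `UpsertOperation`, `DeleteOperation`, `SetPayloadOperation`, `UpdateOperation`, `operationName`) are hypothetical and chosen for this example; the thread does not confirm the final schema.

```typescript
// Hypothetical named building blocks. In a generated OpenAPI schema, each
// of these could be referenced by name instead of being inlined anonymously.
type PointStruct = { id: number; vector: number[]; payload?: Record<string, unknown> };

type UpsertOperation = { upsert: { points: PointStruct[] } };
type DeleteOperation = { delete: { points: number[] } };
type SetPayloadOperation = {
  set_payload: { points: number[]; payload: Record<string, unknown> };
};

// An "untagged" union: there is no discriminator field, and the variant
// is recognised by which top-level key the object carries.
type UpdateOperation = UpsertOperation | DeleteOperation | SetPayloadOperation;

// Narrow the union structurally with the `in` operator.
function operationName(op: UpdateOperation): string {
  if ("upsert" in op) return "upsert";
  if ("delete" in op) return "delete";
  return "set_payload";
}

const batch: UpdateOperation[] = [
  { upsert: { points: [{ id: 1, vector: [0.1, 0.2], payload: { color: "red" } }] } },
  { set_payload: { points: [1], payload: { color: "blue" } } },
  { delete: { points: [1] } },
];

console.log(batch.map(operationName));
```

The design point is that every variant has its own named type, so a client generator has a name to attach to each one.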
@il3ven Are you still working on this, or could I give it a shot?
@BabaBert Please feel free to give it a shot. My chances of submitting a PR look thin. The issue turned out to be more complicated than I expected, mainly because I am unfamiliar with Rust.
From what I've seen so far, it surely isn't the easiest task, mostly because Actix doesn't seem to like iterators that much. I'm only going to be able to work on this from next week onwards, though.
💡 @Jesse-Bakker submitted a pull request that claims the bounty. You can visit your org dashboard to reward.
🎉🎈 @Jesse-Bakker has been awarded $300! 🎈🎊
Hey @generall, shouldn't this be closed?