
Batch update API

Open generall opened this issue 2 years ago • 11 comments

Is your feature request related to a problem? Please describe.

Case 1: Allow updating multiple payloads of different vectors in one request.

Case 2: Allow overwriting a single payload value (different for each point) for multiple points.

Describe the solution you'd like

Create a batch update API, which can receive a list of different update requests:

  • upsert
  • delete
  • set_payload
  • overwrite_payload
  • delete_payload
  • clear_payload

For REST:

PUT /collections/<name>/points/batch

And similar for gRPC.
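
As a rough illustration of the proposal above, a batch request could carry an ordered list of the six operation types in a single body. The sketch below is hypothetical: the `BatchOperation` type, the field names inside each operation, the `operations` wrapper, the `my_collection` name, and the `buildBatchRequest` helper are all assumptions made for illustration. Only the operation names and the endpoint path come from the text above, and the API that finally ships may look different.

```typescript
// Hypothetical shapes; only the operation names come from the issue text.
type PointId = number | string;
type Payload = Record<string, unknown>;

type BatchOperation =
  | { upsert: { points: { id: PointId; vector: number[]; payload?: Payload }[] } }
  | { delete: { points: PointId[] } }
  | { set_payload: { payload: Payload; points: PointId[] } }
  | { overwrite_payload: { payload: Payload; points: PointId[] } }
  | { delete_payload: { keys: string[]; points: PointId[] } }
  | { clear_payload: { points: PointId[] } };

interface BatchRequest {
  method: "PUT";
  path: string;
  body: string;
}

// Builds (but does not send) the request described in the proposal.
function buildBatchRequest(collection: string, operations: BatchOperation[]): BatchRequest {
  return {
    method: "PUT",
    path: `/collections/${encodeURIComponent(collection)}/points/batch`,
    body: JSON.stringify({ operations }),
  };
}

// Case 2 from the issue: a different payload value for each point, in one request.
const request = buildBatchRequest("my_collection", [
  { set_payload: { payload: { color: "red" }, points: [1] } },
  { set_payload: { payload: { color: "blue" }, points: [2] } },
]);
```

Building the request as plain data keeps the sketch independent of any HTTP client; sending it is left out on purpose.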

Describe alternatives you've considered

Create a batch request for each individual type of API.

Additional context

Acceptance criteria:

  • The new API should have integration tests
  • It should re-use existing data structures as much as possible
  • The API schema should be properly generated (there are scripts and CI for that)
  • These changes should not affect any existing functionality and must be 100% backward compatible

generall avatar May 16 '23 08:05 generall

/bounty $300

generall avatar May 16 '23 08:05 generall

~~💎 $300 bounty created by generall~~
~~🙋 If you start working on this, comment /attempt #1904 to notify everyone~~
~~👉 To claim this bounty, submit a pull request that includes the text /claim #1904 somewhere in its body~~
~~📝 Before proceeding, please make sure you can receive payouts in your country~~
~~💵 Payment arrives in your account 2-5 days after the bounty is rewarded~~
~~💯 You keep 100% of the bounty award~~
~~🙏 Thank you for contributing to qdrant/qdrant!~~

Attempt: 🟢 @Jesse-Bakker · Solution: #1951

algora-pbc[bot] avatar May 16 '23 08:05 algora-pbc[bot]

What is the expected behavior if the result sets of update requests overlap? For example, if two upserts include the same point, or have filters that apply to the same point?

Jesse-Bakker avatar May 16 '23 08:05 Jesse-Bakker

> What is the expected behavior if the result sets of update requests overlap? For example, if two upserts include the same point, or have filters that apply to the same point?

I think we can just apply updates one by one

generall avatar May 16 '23 08:05 generall
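
The "apply updates one by one" answer can be pictured with a small in-memory model: operations run strictly in list order, so when two of them touch the same point the later one sees (and may overwrite) the result of the earlier one. This is only a sketch of that ordering rule under simplifying assumptions. The `Op` type, its `kind` field, and the `applyInOrder` function are hypothetical, points carry payloads only (no vectors or filters), and none of this reflects how the database actually stores or applies updates.

```typescript
type PointId = number;
type Payload = Record<string, unknown>;

// Simplified, hypothetical operation model used only to show ordering.
type Op =
  | { kind: "upsert"; id: PointId; payload: Payload }
  | { kind: "set_payload"; id: PointId; payload: Payload }
  | { kind: "delete"; id: PointId };

// Applies operations strictly in list order against an in-memory store.
function applyInOrder(store: Map<PointId, Payload>, ops: Op[]): Map<PointId, Payload> {
  for (const op of ops) {
    switch (op.kind) {
      case "upsert":
        store.set(op.id, { ...op.payload }); // replaces whatever was there
        break;
      case "set_payload": {
        const current = store.get(op.id);
        if (current !== undefined) {
          store.set(op.id, { ...current, ...op.payload }); // merge keys into the existing payload
        }
        break;
      }
      case "delete":
        store.delete(op.id);
        break;
    }
  }
  return store;
}

const store = applyInOrder(new Map<PointId, Payload>(), [
  { kind: "upsert", id: 1, payload: { color: "red" } },
  { kind: "upsert", id: 1, payload: { color: "blue" } }, // same point: the later upsert wins
  { kind: "set_payload", id: 1, payload: { size: 3 } },
]);
// store.get(1) is { color: "blue", size: 3 }
```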

Hey, I would like to give this a try. Correct me if I am wrong, but the schema for the batch endpoint would look like this:

// defined in typescript style
type Body = Array<BatchUpdate>

type BatchUpdate = {
  operation: 'upsert' | 'delete' | 'set_payload' | 'overwrite_payload' | 'delete_payload' | 'clear_payload',
  data: PointInsertOperations | SetPayload | DeletePayload | PointsSelector
}

PointInsertOperations is whatever is accepted by /collections/{collection_name}/points. Similar for SetPayload, DeletePayload, PointsSelector.

The DB will get an array of BatchUpdate and apply them one by one.

il3ven avatar May 16 '23 20:05 il3ven

There is a catch: OpenAPI may create anonymous objects if you use a non-untagged enum in the API. I would like to avoid that, because it tends to produce pretty ugly client code.

generall avatar May 16 '23 20:05 generall

> There is a catch: OpenAPI may create anonymous objects if you use a non-untagged enum in the API. I would like to avoid that, because it tends to produce pretty ugly client code.

You want the OpenAPI schema to properly refer to the type using something like $ref: "#/components/schemas/PointRequest". This will be taken care of if I reuse the enums already defined. Am I right? I will have to define only one more new type which will be BatchUpdate.

il3ven avatar May 16 '23 21:05 il3ven
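
One way to read the exchange above: if every variant of the batch operation is itself a named type, the schema can point at each variant by name instead of inlining an anonymous object. The TypeScript sketch below shows that shape only; it is not the Rust or OpenAPI definition from the repository. The wrapper names `SetPayloadOperation` and `ClearPayloadOperation`, the `UpdateOperation` union, the simplified field layouts, and the `describeOperation` helper are assumptions for illustration.

```typescript
// Simplified stand-ins for the existing request types mentioned in the thread.
interface SetPayload { payload: Record<string, unknown>; points: number[] }
interface PointsSelector { points: number[] }

// Hypothetical named wrappers: one named type per variant, each with a single key.
interface SetPayloadOperation { set_payload: SetPayload }
interface ClearPayloadOperation { clear_payload: PointsSelector }

// The union carries no separate tag field; a variant is recognized by which key is present.
type UpdateOperation = SetPayloadOperation | ClearPayloadOperation;

function describeOperation(op: UpdateOperation): string {
  if ("set_payload" in op) {
    return `set_payload on ${op.set_payload.points.length} point(s)`;
  }
  return `clear_payload on ${op.clear_payload.points.length} point(s)`;
}

const summary = describeOperation({
  set_payload: { payload: { color: "red" }, points: [1, 2] },
});
```

Because every variant has a name, generated client code can expose one class or interface per operation rather than a nameless inline object.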

@il3ven Are you still working on this, or could I give it a shot?

baboon25 avatar May 22 '23 19:05 baboon25

@BabaBert Please feel free to give it a shot. My chances of submitting a PR look thin. The issue turned out to be more complicated than I expected, mainly because I am unfamiliar with Rust.

il3ven avatar May 22 '23 19:05 il3ven

From what I've seen so far, it surely isn't the easiest task, mostly because Actix doesn't seem to like iterators that much. I'll only be able to work on this from next week onwards, though.

baboon25 avatar May 22 '23 20:05 baboon25

💡 @Jesse-Bakker submitted a pull request that claims the bounty. You can visit your org dashboard to reward.

algora-pbc[bot] avatar May 23 '23 10:05 algora-pbc[bot]

🎉🎈 @Jesse-Bakker has been awarded $300! 🎈🎊

algora-pbc[bot] avatar Aug 07 '23 14:08 algora-pbc[bot]

Hey @generall, shouldn't this be closed?

wesleymatosdev avatar Nov 12 '23 16:11 wesleymatosdev