
[Vectorstore] Added async gRPC methods to Qdrant vectorstore

Open c0sogi opened this issue 1 year ago • 2 comments

Async gRPC methods to Qdrant vectorstore

Changes:

  • Async methods to add: aadd_texts, asimilarity_search, asimilarity_search_with_score, amax_marginal_relevance_search_with_score

  • Helper methods to add: _document_from_scored_point_grpc, _build_condition_grpc, _qdrant_filter_from_dict_grpc

In addition to REST, Qdrant supports the gRPC protocol, which allows calling asynchronous methods on the client. gRPC is an open source remote procedure call (RPC) framework developed by Google. It uses Protocol Buffers as its data serialization format to enable efficient communication between clients and servers.

By adding Qdrant's native asynchronous methods, we can solve the event loop blocking issue that occurs when using synchronous methods.

Since these asynchronous methods only work on remote clients, not local, I've made it so that attempting to use them in a local environment (memory, disk) will result in a NotImplementedError.
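A minimal sketch of the guard described above, for illustration only. `RemoteClient`, `LocalClient`, and `VectorStore` are hypothetical stand-ins (not the actual qdrant-client or LangChain classes), and the body of the remote branch is a placeholder rather than a real gRPC call:

```python
import asyncio
from typing import List


class RemoteClient:
    """Hypothetical stand-in for a client connected to a remote server."""


class LocalClient:
    """Hypothetical stand-in for an in-memory or on-disk client."""


class VectorStore:
    """Hypothetical wrapper that only allows async calls on remote clients."""

    def __init__(self, client: object) -> None:
        self.client = client

    async def aadd_texts(self, texts: List[str]) -> List[str]:
        # Refuse to run in local mode, where no async transport is available.
        if not isinstance(self.client, RemoteClient):
            raise NotImplementedError(
                "Async operations are only supported on remote clients."
            )
        # Placeholder for the real async upsert; yields control to the loop.
        await asyncio.sleep(0)
        return [f"id-{i}" for i, _ in enumerate(texts)]
```

Calling `aadd_texts` on a store built with `LocalClient()` raises `NotImplementedError`, while the same call with `RemoteClient()` returns one ID per text.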

I was going to write unit tests, but I haven't added them for now because testing these methods requires a real remote server. They are working fine in my project.

c0sogi avatar Jun 01 '23 09:06 c0sogi

@c0sogi Thanks for handling that! Do you see a chance not to repeat the batching process but unify it for sync and async? Can we work it out together?

kacperlukawski avatar Jun 01 '23 13:06 kacperlukawski

> @c0sogi Thanks for handling that! Do you see a chance not to repeat the batching process but unify it for sync and async? Can we work it out together?

Sure. Reducing code duplication in the current sync and async seems like a good idea.

gRPC doesn't use the term Batch, and passing a rest.Batch to the points parameter of the upsert method has the same effect on the client side as passing a list of points directly. So it would be nice if we could just stick to the approach I used (building a PointStruct for each id, vector, payload in zip).
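The list-of-points approach could look roughly like the sketch below. To keep it runnable without qdrant-client installed, `PointStruct` here is a stand-in dataclass that only mirrors the id, vector, and payload fields; `build_points` and the `page_content` / `metadata` payload keys are assumptions made for illustration:

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PointStruct:
    """Stand-in for the client's point type (id, vector, payload)."""

    id: str
    vector: List[float]
    payload: Dict[str, Any]


def build_points(
    texts: List[str],
    vectors: List[List[float]],
    metadatas: Optional[List[Dict[str, Any]]] = None,
    ids: Optional[List[str]] = None,
) -> List[PointStruct]:
    """Build one point per text; shared by sync and async code paths."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    metadatas = metadatas or [{} for _ in texts]
    ids = ids or [uuid.uuid4().hex for _ in texts]
    return [
        PointStruct(
            id=point_id,
            vector=vector,
            payload={"page_content": text, "metadata": metadata},
        )
        for point_id, vector, text, metadata in zip(ids, vectors, texts, metadatas)
    ]
```

Because the helper returns a plain list, the same result could be handed to either a sync or an async upsert call, which is the duplication the thread is trying to remove.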

I think it would be fine to define a helper method that, when the client is an instance of QdrantRemote, returns a different type depending on whether the client's _prefer_grpc value is true or false. What do you think?

c0sogi avatar Jun 01 '23 15:06 c0sogi

@c0sogi It's a good idea to switch to the list of points instead of the batch, to make things consistent.

I'm unsure if we should throw NotImplementedError for async in a local environment. The local mode is not intended for production; it's designed for quick experiments. How about displaying a warning but allowing the event loop to block?

kacperlukawski avatar Jun 05 '23 14:06 kacperlukawski

> @c0sogi It's a good idea to switch to the list of points instead of the batch, to make things consistent.
>
> I'm unsure if we should throw NotImplementedError for async in a local environment. The local mode is not intended for production; it's designed for quick experiments. How about displaying a warning but allowing the event loop to block?

The qdrant-client library itself throws a NotImplementedError when an asynchronous operation is called on a client of any type other than QdrantRemote. A QdrantLocal client, which operates in memory or on disk, does not support asynchronous operations. However, for QdrantLocal it is likely possible to run these operations without blocking the event loop, as follows:

  1. If the type of the QdrantClient is QdrantLocal, print a short warning to the user that gRPC asynchronous operations are not supported.
  2. Get the already running event loop with asyncio.get_running_loop().
  3. Run the sync method in a thread pool with something like loop.run_in_executor(None, self.some_sync_method, ...) so that it does not block the event loop.
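The three steps above can be sketched as follows. `LocalStore` and its methods are hypothetical stand-ins for a vector store backed by a local client, and `time.sleep` simulates the blocking work; only `asyncio.get_running_loop()` and `loop.run_in_executor()` are the actual standard library calls mentioned in the list:

```python
import asyncio
import time
import warnings
from typing import List


class LocalStore:
    """Hypothetical store backed by a local (in-memory or on-disk) client."""

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        time.sleep(0.05)  # simulate a blocking local search
        return [f"{query}-{i}" for i in range(k)]

    async def asimilarity_search(self, query: str, k: int = 4) -> List[str]:
        # Step 1: warn that native async (gRPC) is unavailable in local mode.
        warnings.warn(
            "Local mode has no native async support; "
            "running the sync method in a thread instead."
        )
        # Step 2: get the event loop that is already running.
        loop = asyncio.get_running_loop()
        # Step 3: run the sync method in the default thread pool executor
        # so the event loop stays free for other tasks.
        return await loop.run_in_executor(None, self.similarity_search, query, k)
```

Passing `None` as the first argument uses the loop's default executor, so no separate pool has to be created or shut down. Note that `run_in_executor` only accepts positional arguments for the callable; keyword arguments would need `functools.partial`.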

c0sogi avatar Jun 06 '23 14:06 c0sogi

Async API was introduced in #7704.

kacperlukawski avatar Jul 17 '23 07:07 kacperlukawski

> Async API was introduced in #7704.

Awesome. Thanks for finishing this out.

c0sogi avatar Jul 17 '23 11:07 c0sogi