[Vectorstore] Added async gRPC methods to Qdrant vectorstore
Async gRPC methods to Qdrant vectorstore
Changes:
- Async methods to add: aadd_texts, asimilarity_search, asimilarity_search_with_score, amax_marginal_relevance_search_with_score
- Helper methods to add: _document_from_scored_point_grpc, _build_condition_grpc, _qdrant_filter_from_dict_grpc
In addition to REST, Qdrant supports the gRPC protocol, which allows calling asynchronous methods on the client. gRPC is an open source remote procedure call (RPC) framework developed by Google. It uses Protocol Buffers for data serialization, which enables efficient communication between clients and servers.
By adding Qdrant's native asynchronous methods, we can solve the event loop blocking issue that occurs when using synchronous methods.
Since these asynchronous methods only work with remote clients, not local ones, I've made it so that attempting to use them in a local environment (memory, disk) raises a NotImplementedError.
I was going to write unit tests, but I haven't added them for now because testing these methods requires a real remote server. The methods are working fine in my project.
@c0sogi Thanks for handling that! Do you see a chance not to repeat the batching process but unify it for sync and async? Can we work it out together?
Sure. Reducing code duplication between the current sync and async methods seems like a good idea.
Since gRPC doesn't use the term Batch, and since passing a rest.Batch as the points parameter of the upsert method has the same effect on the client side as passing a list of points directly, it would be nice to stick with the approach I used (building a PointStruct from each id, vector, payload in a zip).
I think it would be fine to define a helper method that, when the client is an instance of QdrantRemote, returns a different type depending on whether the client's _prefer_grpc value is true or false. What do you think?
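A rough sketch of how one batching helper could serve both the sync and async paths. It uses only the standard library: Point is a hypothetical stand-in for the client's point type, and iter_point_batches, add_points, aadd_points, and the payload keys are illustrative names, not part of qdrant-client or this PR.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Iterator, List, Optional, Sequence


@dataclass
class Point:
    """Hypothetical stand-in for the client's point type (id, vector, payload)."""
    id: str
    vector: List[float]
    payload: Dict[str, Any]


def iter_point_batches(
    texts: Sequence[str],
    vectors: Sequence[List[float]],
    metadatas: Optional[Sequence[Dict[str, Any]]] = None,
    batch_size: int = 64,
) -> Iterator[List[Point]]:
    """Yield lists of points; the only place where batching logic lives."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    metas = metadatas if metadatas is not None else [{} for _ in texts]
    for start in range(0, len(texts), batch_size):
        stop = start + batch_size
        yield [
            Point(
                id=uuid.uuid4().hex,
                vector=vector,
                payload={"page_content": text, "metadata": meta},
            )
            for text, vector, meta in zip(
                texts[start:stop], vectors[start:stop], metas[start:stop]
            )
        ]


def add_points(
    upsert: Callable[[List[Point]], Any],
    texts: Sequence[str],
    vectors: Sequence[List[float]],
    batch_size: int = 64,
) -> List[str]:
    """Sync path: pass each batch to a blocking upsert callable."""
    ids: List[str] = []
    for batch in iter_point_batches(texts, vectors, batch_size=batch_size):
        upsert(batch)
        ids.extend(point.id for point in batch)
    return ids


async def aadd_points(
    async_upsert: Callable[[List[Point]], Awaitable[Any]],
    texts: Sequence[str],
    vectors: Sequence[List[float]],
    batch_size: int = 64,
) -> List[str]:
    """Async path: same batches, but each upsert is awaited."""
    ids: List[str] = []
    for batch in iter_point_batches(texts, vectors, batch_size=batch_size):
        await async_upsert(batch)
        ids.extend(point.id for point in batch)
    return ids


if __name__ == "__main__":
    received: List[List[Point]] = []

    async def fake_async_upsert(batch: List[Point]) -> None:
        # Placeholder for a real asynchronous upsert call.
        received.append(batch)

    new_ids = asyncio.run(
        aadd_points(fake_async_upsert, ["a", "b", "c"], [[0.0], [1.0], [2.0]], batch_size=2)
    )
    print(len(new_ids), [len(batch) for batch in received])
```

Both entry points consume the same generator, so only the upsert call differs between sync and async.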
@c0sogi It's a good idea to switch to the list of points instead of the batch, to make things consistent.
I'm unsure if we should throw NotImplementedError for async in the local environment. Local mode is not intended for production use; it is designed for quick experiments. How about displaying a warning but allowing the event loop to block?
The qdrant-client library itself throws a NotImplementedError if its underlying client is of any type other than QdrantRemote. With QdrantLocal, that is, when the operation takes place in memory or on disk, asynchronous operations are not supported.
However, in the case of QdrantLocal, it is likely possible to perform asynchronous operations without blocking the event loop as follows.
- If the type of the QdrantClient is QdrantLocal, show the user a short warning that gRPC asynchronous operations are not supported.
- Get the already running event loop with asyncio.get_running_loop().
- Run the sync method in a thread pool with something like loop.run_in_executor(None, self.some_sync_method, ...) so that the event loop is not blocked.
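A minimal sketch of that fallback, using only asyncio from the standard library. The names run_sync_in_executor and slow_search are hypothetical and stand in for a wrapper and a blocking vectorstore method; the warning text is only an example.

```python
import asyncio
import functools
import time
import warnings
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


async def run_sync_in_executor(
    sync_method: Callable[..., T], *args: Any, **kwargs: Any
) -> T:
    """Warn, then run a blocking method in the default thread pool."""
    warnings.warn(
        "Async gRPC operations are not supported in local mode; "
        "running the sync method in a thread pool instead.",
        RuntimeWarning,
        stacklevel=2,
    )
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments, so keyword
    # arguments are bound with functools.partial first.
    call = functools.partial(sync_method, *args, **kwargs)
    return await loop.run_in_executor(None, call)


def slow_search(query: str, k: int = 4) -> List[str]:
    """Hypothetical blocking call standing in for a sync search method."""
    time.sleep(0.05)
    return [f"{query}-{i}" for i in range(k)]


async def main() -> List[str]:
    return await run_sync_in_executor(slow_search, "q", k=2)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Passing None as the executor reuses the event loop's default thread pool rather than creating a new one for each call.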
Async API was introduced in #7704.
Awesome. Thanks for finishing this out.