Qdrant Semantic Caching
Title
Semantic Caching with Qdrant Vector database
Relevant issues
Fixes #4963
Type
🆕 New Feature 📖 Documentation ✅ Test
Changes
litellm/caching.py litellm/utils.py docs/my-website/docs/caching/all_caches.md
Testing - Screenshot of new tests passing locally
picking this up this week - hoping to have this / some version of this merged by end of week.
Sure! Let me know if there is anything I can assist with.
Thank you for the feedback, @ishaan-jaff. I'll go ahead and add the necessary tests to the PR.
Regarding the inclusion of qdrant_client as a dependency, we can interact with Qdrant via its REST API (please refer to the link below). We can utilize the requests library in Python for this purpose. Let me know if this approach works for you, and I'll update the PR accordingly.
https://api.qdrant.tech/api-reference
@haadirakhangi yes please use their REST API
- don't use requests
- use our httpx handler; litellm already ships one as part of the lib. Use:
client = (
_get_async_httpx_client()
) # Create a new client if none provided
example https://github.com/BerriAI/litellm/blob/b1bed459b4efc75f07a7d06b5260deb1a8ce187b/litellm/llms/sagemaker.py#L466
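For reference, a minimal sketch of what a Qdrant similarity search looks like over plain REST, with no `qdrant_client` dependency. It only builds the request as data and sends nothing; in the PR the actual call would go through litellm's httpx handler, which is not shown here. The helper name `build_search_request`, the collection name `litellm_cache`, and the local URL are assumptions made for illustration; the endpoint path and the `api-key` header follow Qdrant's public API reference.

```python
import json
from urllib.parse import quote


def build_search_request(api_base, collection, vector, api_key=None, limit=1):
    """Describe a Qdrant nearest-neighbour search as plain data (no network I/O).

    Hypothetical helper: the name and the return shape are illustrative only.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Qdrant Cloud authenticates with an `api-key` header.
        headers["api-key"] = api_key
    url = f"{api_base.rstrip('/')}/collections/{quote(collection, safe='')}/points/search"
    body = {"vector": list(vector), "limit": limit, "with_payload": True}
    return {"method": "POST", "url": url, "headers": headers, "json": body}


# Assumed values: a local Qdrant instance and a made-up collection name.
search_req = build_search_request("http://localhost:6333", "litellm_cache", [0.1, 0.2, 0.3])
print(search_req["method"], search_req["url"])
print(json.dumps(search_req["json"]))
```

The resulting dict maps directly onto an httpx-style call (`method`, `url`, `headers`, `json`), so any async client can send it.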
Thank you, @ishaan-jaff. I'll proceed with using the REST API and the httpx handler from litellm.
Hi @ishaan-jaff,
I've implemented Qdrant Semantic Caching using their REST API and added the 'PUT' method to http_handler.py, as it was required for certain actions with Qdrant. You can refer to their API documentation here: Qdrant API Reference.
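To illustrate why `PUT` is needed: Qdrant's REST API uses `PUT /collections/{collection_name}/points` to upsert points. The sketch below builds such a request as plain data without sending it. The helper name `build_upsert_request` and the payload field names (`text`, `response`) are assumptions for illustration and are not necessarily what this PR stores.

```python
import uuid
from urllib.parse import quote


def build_upsert_request(api_base, collection, vector, prompt, response, api_key=None):
    """Describe a Qdrant point upsert as plain data (no network I/O).

    Hypothetical helper: payload keys are illustrative, not taken from the PR.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key
    point = {
        "id": str(uuid.uuid4()),  # Qdrant accepts unsigned integers or UUIDs as point ids
        "vector": list(vector),   # embedding of the prompt
        "payload": {"text": prompt, "response": response},
    }
    url = f"{api_base.rstrip('/')}/collections/{quote(collection, safe='')}/points"
    return {"method": "PUT", "url": url, "headers": headers, "json": {"points": [point]}}


# Assumed values: a local Qdrant instance and a made-up collection name.
upsert_req = build_upsert_request(
    "http://localhost:6333", "litellm_cache", [0.1, 0.2, 0.3],
    prompt="What is the capital of France?", response="Paris",
)
print(upsert_req["method"], upsert_req["url"])
```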
I’ve tested the implementation on both Qdrant Cloud cluster instances and local cluster instances, and everything is functioning as expected. Could you please guide me on how to properly add tests for this PR?
hi @haadirakhangi looks awesome thanks for the work on this
> Could you please guide me on how to properly add tests for this PR?
what guide are you looking for? Check out test_caching.py -> that should be a good starting point
- just add testing for litellm.acompletion + litellm.acompletion with streaming for this PR @haadirakhangi
can you share a screenshot of both tests working for you locally ?
@ishaan-jaff Added the two mentioned tests in the test_caching.py file. Please go through it and let me know if there are any changes!
> can you share a screenshot of both tests working for you locally ?
Sure! Here are the screenshots you asked for:
Testing with acompletion:
Testing with acompletion + streaming:
Also, I will adjust the tests as suggested.
merging this, I will take care of:
- making sure testing passes on ci/cd
- the issues @krrishdholakia pointed out on message["content"]
- Adding docs on using with litellm proxy
- removing the need for host_type
cc @haadirakhangi thanks for the amazing work 👑
Thanks @ishaan-jaff, I'm glad to contribute! 😄