litellm Qdrant Semantic Caching

Title

Semantic Caching with Qdrant Vector database

Relevant issues

Fixes #4963

Type

🆕 New Feature 📖 Documentation ✅ Test

Changes

litellm/caching.py, litellm/utils.py, docs/my-website/docs/caching/all_caches.md

Testing - Screenshot of new tests passing locally

Aug 02 '24 16:08 haadirakhangi

Picking this up this week. Hoping to have this, or some version of it, merged by end of week.

Aug 12 '24 20:08 krrishdholakia

Sure! Let me know if there is anything I can assist with.

Aug 13 '24 07:08 haadirakhangi

Thank you for the feedback, @ishaan-jaff. I'll go ahead and add the necessary tests to the PR.

Regarding the inclusion of qdrant_client as a dependency, we can interact with Qdrant via its REST API (please refer to the link below). We can utilize the requests library in Python for this purpose. Let me know if this approach works for you, and I'll update the PR accordingly.

https://api.qdrant.tech/api-reference
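To make the idea concrete, here is a minimal sketch of the REST calls involved. It assumes a local Qdrant instance at http://localhost:6333, and the helper names are illustrative only (they are not part of litellm or Qdrant). The helpers just build the HTTP method, URL, and JSON body, so any HTTP client could send them.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumption: a local Qdrant instance on its default REST port.
QDRANT_API_BASE = "http://localhost:6333"

# (HTTP method, URL, JSON body)
Request = Tuple[str, str, Dict[str, Any]]


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Qdrant Cloud expects the key in an `api-key` header; a local instance may need none."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key
    return headers


def create_collection_request(collection: str, vector_size: int) -> Request:
    """PUT /collections/{name}: create a collection using cosine distance."""
    body = {"vectors": {"size": vector_size, "distance": "Cosine"}}
    return "PUT", f"{QDRANT_API_BASE}/collections/{collection}", body


def upsert_request(
    collection: str, point_id: str, vector: List[float], payload: Dict[str, Any]
) -> Request:
    """PUT /collections/{name}/points: store one embedding with its payload.

    Qdrant point ids must be unsigned integers or UUIDs.
    """
    body = {"points": [{"id": point_id, "vector": vector, "payload": payload}]}
    return "PUT", f"{QDRANT_API_BASE}/collections/{collection}/points", body


def search_request(collection: str, vector: List[float], limit: int = 1) -> Request:
    """POST /collections/{name}/points/search: nearest neighbours for a query vector."""
    body = {"vector": vector, "limit": limit, "with_payload": True}
    return "POST", f"{QDRANT_API_BASE}/collections/{collection}/points/search", body
```

Note that creating a collection and upserting points both use PUT, so whichever HTTP client we settle on needs to support that method.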

Aug 17 '24 08:08 haadirakhangi

@haadirakhangi yes, please use their REST API.

Don't use requests. Use our httpx handler instead; litellm ships one as part of the lib:

client = _get_async_httpx_client()  # Create a new client if none provided

Example: https://github.com/BerriAI/litellm/blob/b1bed459b4efc75f07a7d06b5260deb1a8ce187b/litellm/llms/sagemaker.py#L466

Aug 17 '24 15:08 ishaan-jaff

Thank you, @ishaan-jaff. I'll proceed with using the REST API and the httpx handler from litellm.

Aug 17 '24 15:08 haadirakhangi

Hi @ishaan-jaff,

I've implemented Qdrant Semantic Caching using their REST API and added the 'PUT' method to http_handler.py, as it was required for certain actions with Qdrant. You can refer to their API documentation here: Qdrant API Reference.
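For reviewers, the lookup and store steps over REST can be sketched roughly as below. This is not the code in this PR: `semantic_get`, `semantic_set`, the payload keys, and the threshold handling are assumptions for illustration, and `client` stands for any async client with httpx-style `post` and `put` methods.

```python
import uuid
from typing import Any, List, Optional


async def semantic_get(
    client: Any,
    base_url: str,
    collection: str,
    embedding: List[float],
    similarity_threshold: float,
) -> Optional[str]:
    """Return the cached response for the nearest stored prompt, or None on a miss."""
    resp = await client.post(
        f"{base_url}/collections/{collection}/points/search",
        json={"vector": embedding, "limit": 1, "with_payload": True},
    )
    hits = resp.json().get("result", [])
    if not hits:
        return None  # nothing stored yet
    best = hits[0]
    if best["score"] < similarity_threshold:
        return None  # nearest prompt is not similar enough to reuse
    return best["payload"]["response"]


async def semantic_set(
    client: Any,
    base_url: str,
    collection: str,
    embedding: List[float],
    prompt: str,
    response: str,
) -> None:
    """Store the prompt embedding and its response; this is the call that needs PUT."""
    body = {
        "points": [
            {
                "id": str(uuid.uuid4()),  # Qdrant accepts UUID strings as point ids
                "vector": embedding,
                "payload": {"text": prompt, "response": response},
            }
        ]
    }
    await client.put(f"{base_url}/collections/{collection}/points", json=body)
```

The threshold check is what makes the cache "semantic": a differently worded prompt still hits as long as its embedding scores at or above the threshold against a stored one.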

I’ve tested the implementation on both Qdrant Cloud cluster instances and local cluster instances, and everything is functioning as expected. Could you please guide me on how to properly add tests for this PR?

Aug 19 '24 15:08 haadirakhangi

hi @haadirakhangi looks awesome thanks for the work on this

Aug 19 '24 17:08 ishaan-jaff

> Could you please guide me on how to properly add tests for this PR?

what guide are you looking for? Check out test_caching.py -> that should be a good starting point

just add testing for litellm.acompletion + litellm.acompletion with streaming for this PR @haadirakhangi
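Roughly this shape, as a sketch only and not the actual tests in test_caching.py. The helper names are made up, and treating "same response id" (or "same streamed text") as the sign of a cache hit is an assumption you should check against the existing tests. The completion function is passed in so the helpers can be exercised with a stub.

```python
from typing import Any, Awaitable, Callable, Dict, List

Messages = List[Dict[str, str]]
Completion = Callable[..., Awaitable[Any]]


async def is_cache_hit(
    acompletion: Completion, first: Messages, similar: Messages, **kwargs: Any
) -> bool:
    """Call twice with semantically similar prompts and compare response ids."""
    r1 = await acompletion(messages=first, **kwargs)
    r2 = await acompletion(messages=similar, **kwargs)
    return r1.id == r2.id


async def collect_text(stream: Any) -> str:
    """Join the content deltas of a streamed response into one string."""
    parts = []
    async for chunk in stream:
        piece = chunk.choices[0].delta.content
        if piece:  # the final chunk usually carries no content
            parts.append(piece)
    return "".join(parts)


async def is_streaming_cache_hit(
    acompletion: Completion, first: Messages, similar: Messages, **kwargs: Any
) -> bool:
    """Same check for streaming: the second call should replay the first call's text."""
    text1 = await collect_text(await acompletion(messages=first, stream=True, **kwargs))
    text2 = await collect_text(await acompletion(messages=similar, stream=True, **kwargs))
    return text1 == text2
```

In the real tests you would pass litellm.acompletion and the same model and cache setup the other tests in that file use.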

Aug 19 '24 17:08 ishaan-jaff

can you share a screenshot of both tests working for you locally ?

Aug 19 '24 19:08 ishaan-jaff

@ishaan-jaff Added the two mentioned tests in the test_caching.py file. Please go through it and let me know if there are any changes!

Aug 19 '24 19:08 haadirakhangi

> can you share a screenshot of both tests working for you locally ?

Sure! Here are the screenshots you asked for:

Testing with acompletion: [screenshot]

Testing with acompletion + streaming: [screenshot]

Also, I will adjust the tests as suggested.

Aug 19 '24 19:08 haadirakhangi

Merging this. I will take care of:

- making sure testing passes on ci/cd
- the issues @krrishdholakia pointed out on message["content"]
- adding docs on using with litellm proxy
- removing the need for host_type

Aug 21 '24 15:08 ishaan-jaff

cc @haadirakhangi thanks for the amazing work 👑

Aug 21 '24 15:08 ishaan-jaff

Thanks @ishaan-jaff, I'm glad to contribute! 😄

Aug 21 '24 16:08 haadirakhangi