litellm icon indicating copy to clipboard operation
litellm copied to clipboard

Qdrant Semantic Caching

Open haadirakhangi opened this issue 1 year ago • 6 comments

Title

Semantic Caching with Qdrant Vector database

Relevant issues

Fixes #4963

Type

🆕 New Feature 📖 Documentation ✅ Test

Changes

litellm/caching.py litellm/utils.py docs/my-website/docs/caching/all_caches.md

Testing - Screenshot of new tests passing locally

image image

haadirakhangi avatar Aug 02 '24 16:08 haadirakhangi

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
litellm ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 19, 2024 7:01pm

vercel[bot] avatar Aug 02 '24 16:08 vercel[bot]

picking this up this week - hoping to have this / some version of this merged by end of week.

krrishdholakia avatar Aug 12 '24 20:08 krrishdholakia

Sure! Let me know if there is anything I can assist with.

haadirakhangi avatar Aug 13 '24 07:08 haadirakhangi

Thank you for the feedback, @ishaan-jaff. I'll go ahead and add the necessary tests to the PR.

Regarding the inclusion of qdrant_client as a dependency, we can interact with Qdrant via its REST API (please refer to the link below). We can utilize the requests library in Python for this purpose. Let me know if this approach works for you, and I'll update the PR accordingly.

https://api.qdrant.tech/api-reference

haadirakhangi avatar Aug 17 '24 08:08 haadirakhangi

@haadirakhangi yes please use their REST API

  • don't use requests
  • use our httpx handler litellm has a handler as part of the lib use
client = (
                    _get_async_httpx_client()
                )  # Create a new client if none provided

example https://github.com/BerriAI/litellm/blob/b1bed459b4efc75f07a7d06b5260deb1a8ce187b/litellm/llms/sagemaker.py#L466

ishaan-jaff avatar Aug 17 '24 15:08 ishaan-jaff

Thank you, @ishaan-jaff. I'll proceed with using the REST API and the httpx handler from litellm.

haadirakhangi avatar Aug 17 '24 15:08 haadirakhangi

Hi @ishaan-jaff,

I've implemented Qdrant Semantic Caching using their REST API and added the 'PUT' method to http_handler.py, as it was required for certain actions with Qdrant. You can refer to their API documentation here: Qdrant API Reference.

I’ve tested the implementation on both Qdrant Cloud cluster instances and local cluster instances, and everything is functioning as expected. Could you please guide me on how to properly add tests for this PR?

haadirakhangi avatar Aug 19 '24 15:08 haadirakhangi

hi @haadirakhangi looks awesome thanks for the work on this

ishaan-jaff avatar Aug 19 '24 17:08 ishaan-jaff

Could you please guide me on how to properly add tests for this PR?

what guide are you looking for? Check out test_caching.py -> that should be a good starting point

  • just add testing for litellm.acompletion + litellm.acompletion with streaming for this PR @haadirakhangi

ishaan-jaff avatar Aug 19 '24 17:08 ishaan-jaff

can you share a screenshot of both tests working for you locally ?

ishaan-jaff avatar Aug 19 '24 19:08 ishaan-jaff

@ishaan-jaff Added the two mentioned tests in the test_caching.py file. Please go through it and let me know if there are any changes!

haadirakhangi avatar Aug 19 '24 19:08 haadirakhangi

can you share a screenshot of both tests working for you locally ?

Sure! Here are the screenshots you asked for:

Testing with acompletion: image

Testing with acompletion + streaming: image

Also, I will adjust the testing as said.

haadirakhangi avatar Aug 19 '24 19:08 haadirakhangi

merging this, I will take care of:

  1. making sure testing passes on ci/cd
  2. the issues @krrishdholakia pointed out on message["content"]
  3. Adding docs on using with litellm proxy
  4. removing the need for host_type

ishaan-jaff avatar Aug 21 '24 15:08 ishaan-jaff

cc @haadirakhangi thanks for the amazing work 👑

ishaan-jaff avatar Aug 21 '24 15:08 ishaan-jaff

Thanks @ishaan-jaff, I'm glad to contribute! 😄

haadirakhangi avatar Aug 21 '24 16:08 haadirakhangi