
[Feature] vLLM Integration

Open jamescalam opened this issue 1 year ago • 4 comments

Is this your first time submitting a feature request?

  • [X] I have searched the existing issues, and I could not find an existing issue for this feature
  • [X] I am requesting a straightforward extension of existing functionality

Describe the feature

It would be incredible if we could run canopy locally with Mixtral 8x7B, which (afaik) would need GGUF-quantized Mixtral via vLLM. This would also open us up to integrations with things like formal grammars, which (again, afaik) need local models; I don't think any API solutions support them.

The holiday season is just around the corner, and I'm not sure if you guys got me anything, so I'm just putting this out there as an idea.
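
Roughly, the wiring could look like the sketch below. The vLLM launch command in the comment is real vLLM; whether vLLM can serve the GGUF quant directly is an assumption on my part (llama.cpp's server is the usual GGUF runtime), and the model name is just illustrative.

```python
# Sketch: talk to a local vLLM server through the standard OpenAI client.
# Assumes an OpenAI-compatible server was started first, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model mistralai/Mixtral-8x7B-Instruct-v0.1 --port 8000
# (Whether vLLM can load the GGUF quantization directly is an open question;
# llama.cpp's server is the usual runtime for GGUF files.)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # local servers typically accept any non-empty key
)
resp = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Say hello from local Mixtral."}],
)
print(resp.choices[0].message.content)
```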

Describe alternatives you've considered

No response

Who will this benefit?

The world, but primarily open LLM devs. It would probably see less production use, but I'm sure that being able to run for free (Pinecone free tier + local LLM) would push more devs to build with canopy, imo.

Are you interested in contributing this feature?

maybe yes

Anything else?

Requires around 30 GB of memory using GGUF-quantized Mixtral (https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF); this would fit on Mac M1/M2 chips.
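
Back-of-the-envelope check of that figure (the parameter count and bits-per-weight are assumed ballpark numbers):

```python
# Mixtral 8x7B has ~46.7B total parameters; a 4-bit GGUF quant such as
# Q4_K_M averages roughly 4.5 bits per weight.
params = 46.7e9
bits_per_weight = 4.5
print(params * bits_per_weight / 8 / 1e9)  # ~26 GB of weights, before KV-cache overhead
```

That lands in the same ballpark as the ~30 GB above once runtime overhead is included.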

jamescalam avatar Dec 14 '23 22:12 jamescalam

This would also get me to adopt canopy. I have set my base_url and api_key to point at a local LLM instance, but I am not able to continue, even though the LLM uses the OpenAI module.

I get a TypeError: 'NoneType' object is not subscriptable; however, when I use the real base URL and API key, it works just fine.

Would love to see this adopted.

rachfop avatar Dec 29 '23 15:12 rachfop

> I have set my base_url and api_key to point at a local LLM instance, but I am not able to continue, even though the LLM uses the OpenAI module.
>
> I get a TypeError: 'NoneType' object is not subscriptable; however, when I use the real base URL and API key, it works just fine.

@rachfop can you please elaborate or provide repro code?
For any model that follows the OpenAI APIs, this should actually work out of the box.
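
For reference, a minimal repro of that setup might look roughly like the sketch below. The class names, import paths, and method signature are assumed from memory of canopy's API and may not match exactly; the local endpoint is hypothetical.

```python
# Hypothetical repro sketch: point canopy's OpenAI LLM wrapper at a local
# OpenAI-compatible server. Import paths and signatures are assumptions.
from canopy.llm import OpenAILLM                     # assumed import path
from canopy.models.data_models import UserMessage    # assumed import path

llm = OpenAILLM(
    model_name="local-model",               # whatever the local server exposes
    base_url="http://localhost:8000/v1",    # local OpenAI-compatible endpoint
    api_key="EMPTY",                        # local servers typically ignore the key
)
# If this raises "'NoneType' object is not subscriptable", the local server is
# likely returning a response with a missing/None field that the code indexes into.
print(llm.chat_completion(chat_history=[UserMessage(content="hello")]))
```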

igiloh-pinecone avatar Jan 15 '24 13:01 igiloh-pinecone

I totally think this should be considered! For two reasons:

  1. One shouldn't have to rely on paid models when local, fine-tuned models can perform just as well or better.
  2. OpenAI embeddings are not great, which can affect the accuracy of your RAG setup (see the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard); a local-embedding sketch follows this list.
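
As a concrete example of the embedding point, here is a minimal local-embedding sketch using sentence-transformers (the model choice is illustrative, picked from the MTEB leaderboard; this is not wired into canopy):

```python
# Sketch: local embeddings as an alternative to OpenAI embeddings.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # small MTEB-ranked model
vecs = model.encode(
    ["canopy is a RAG framework by Pinecone"],
    normalize_embeddings=True,  # cosine similarity becomes a dot product
)
print(vecs.shape)  # (1, 384)
```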

Also, perhaps using Ollama would be good, as it creates a local server for the LLM.
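
Since Ollama exposes an OpenAI-compatible endpoint, the plain OpenAI client can already talk to it, something like the sketch below (assumes `ollama pull mixtral` has been run and the server is on its default port):

```python
# Sketch: use Ollama's OpenAI-compatible endpoint with the standard client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's default port
    api_key="ollama",                      # required by the client, ignored by Ollama
)
resp = client.chat.completions.create(
    model="mixtral",
    messages=[{"role": "user", "content": "What is RAG, in one sentence?"}],
)
print(resp.choices[0].message.content)
```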

Thoughts?

pedrocr83 avatar Mar 15 '24 11:03 pedrocr83

Ollama exposes an OpenAI-compatible API.

cognitivetech avatar Aug 25 '24 20:08 cognitivetech