
🎅 I WISH LITELLM HAD...

Open krrishdholakia opened this issue 1 year ago • 215 comments

This is a ticket to track a wishlist of items you wish LiteLLM had.

COMMENT BELOW 👇

With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs

Respond with ❤️ to any request you would also like to see

P.S.: Come say hi 👋 on the Discord

krrishdholakia avatar Sep 13 '23 19:09 krrishdholakia

[LiteLLM Client] Add new models via UI

Thinking aloud, it seems intuitive that you'd be able to add new models / remap completion calls to different models via the UI. Unsure about the real need here, though.

krrishdholakia avatar Sep 13 '23 19:09 krrishdholakia

User / API Access Management

Different users have access to different models. It'd be helpful if there were a way to leverage the BudgetManager to gate access. E.g., GPT-4 is expensive; I don't want to expose it to my free users, but I do want my paid users to be able to use it.
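
A rough sketch of what that gating could look like on top of today's BudgetManager (the paid-user check is a stand-in for a real subscription lookup; the wish is for LiteLLM to handle this itself):

from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="my_app")
PAID_USERS = {"user-123"}  # stand-in for a real subscription lookup

def gated_completion(user_id, messages, requested_model="gpt-4"):
    # Route free users away from expensive models
    model = requested_model if user_id in PAID_USERS else "gpt-3.5-turbo"
    response = completion(model=model, messages=messages)
    # Record what this call cost against the user's budget
    budget_manager.update_cost(completion_obj=response, user=user_id)
    return response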

krrishdholakia avatar Sep 13 '23 19:09 krrishdholakia

cc: @yujonglee @WilliamEspegren @zakhar-kogan @ishaan-jaff @PhucTranThanh feel free to add any requests / ideas here.

krrishdholakia avatar Sep 13 '23 19:09 krrishdholakia

[Spend Dashboard] View analytics for spend per llm and per user

  • This allows me to see what my most expensive LLMs are and which users are using LiteLLM heavily
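
In the meantime, a minimal sketch of per-model / per-user aggregation, assuming litellm.completion_cost for per-call cost estimates (the dashboard itself is the wishlist item):

from collections import defaultdict
from litellm import completion, completion_cost

spend_per_model = defaultdict(float)
spend_per_user = defaultdict(float)

def tracked_completion(user_id, model, messages):
    response = completion(model=model, messages=messages)
    cost = completion_cost(completion_response=response)  # estimated USD for this call
    spend_per_model[model] += cost
    spend_per_user[user_id] += cost
    return response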

ishaan-jaff avatar Sep 13 '23 19:09 ishaan-jaff

Auto select the best LLM for a given task

If it's a simple task, like responding to "hello", LiteLLM should auto-select a cheaper but faster LLM like j2-light.
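
A naive sketch of that routing (the length heuristic and model names are placeholders; real selection would need an actual task classifier):

from litellm import completion

CHEAP_FAST_MODEL = "j2-light"
STRONG_MODEL = "gpt-4"

def auto_completion(messages):
    # Crude heuristic: tiny prompts like "hello" go to the cheap, fast model
    prompt_chars = sum(len(m["content"]) for m in messages)
    model = CHEAP_FAST_MODEL if prompt_chars < 50 else STRONG_MODEL
    return completion(model=model, messages=messages)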

ishaan-jaff avatar Sep 13 '23 19:09 ishaan-jaff

Integration with NLP Cloud

Pipboyguy avatar Sep 13 '23 21:09 Pipboyguy

That's awesome @Pipboyguy - DM'ing you on LinkedIn to learn more!

krrishdholakia avatar Sep 13 '23 22:09 krrishdholakia

@ishaan-jaff check out this truncate param in the cohere api

This looks super interesting. Similar to your token trimmer: if the prompt exceeds the context window, trim it in a particular manner.

I would maybe only run trimming on user/assistant messages and not touch the system prompt (that works for RAG scenarios as well).
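
A minimal sketch of that policy, assuming litellm.token_counter for counting (drop the oldest user/assistant turns first, never the system prompt):

from litellm import token_counter

def trim_chat(messages, model="gpt-3.5-turbo", max_tokens=4000):
    # Preserve the system prompt; trim oldest user/assistant messages first
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and token_counter(model=model, messages=system + rest) > max_tokens:
        rest.pop(0)  # drop the oldest non-system message
    return system + rest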

krrishdholakia avatar Sep 14 '23 17:09 krrishdholakia

Option to use the Inference API so we can use any model from Hugging Face 🤗

haseeb-heaven avatar Sep 17 '23 00:09 haseeb-heaven

@haseeb-heaven you can already do this - https://github.com/BerriAI/litellm/blob/a63784d5b376c22d6203fed62f26c3ec5f92e5d1/litellm/llms/huggingface_restapi.py#L53

from litellm import completion 
response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response) 

krrishdholakia avatar Sep 17 '23 00:09 krrishdholakia

Wow, great, thanks! It's working. Nice feature.

haseeb-heaven avatar Sep 17 '23 00:09 haseeb-heaven

Support for inferencing using models hosted on Petals swarms (https://github.com/bigscience-workshop/petals), both public and private.

smig23 avatar Sep 18 '23 02:09 smig23

@smig23 what are you trying to use Petals for? We found it to be quite unstable, and it would not consistently pass our tests.

ishaan-jaff avatar Sep 18 '23 16:09 ishaan-jaff

Fine-tuning wrapper for OpenAI, Hugging Face, etc.

shauryr avatar Sep 18 '23 17:09 shauryr

@shauryr I created an issue to track this - feel free to add any missing details there.

krrishdholakia avatar Sep 18 '23 18:09 krrishdholakia

@smig23 what are you trying to use Petals for? We found it to be quite unstable, and it would not consistently pass our tests.

Specifically for my aims, I'm running a private swarm as an experiment, with a view to implementing it within a private organization that has idle but distributed GPU resources. The initial target would be inferencing, and if LiteLLM were able to be the abstraction layer, it would allow the flexibility to go in another direction with hosting in the future.

smig23 avatar Sep 18 '23 18:09 smig23

I wish LiteLLM had direct support for fine-tuning models. Based on the posts below, I understand that in order to fine-tune, one needs a specific understanding of the LLM provider and then has to follow their instructions or library for fine-tuning the model. Why not have LiteLLM do all the abstraction and handle the fine-tuning aspects as well?

https://docs.litellm.ai/docs/tutorials/finetuned_chat_gpt
https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
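
For concreteness, a purely hypothetical unified interface (nothing like this exists in LiteLLM today; the names are invented to illustrate the wish):

# Hypothetical API - illustrating the wish, not a real LiteLLM interface
from litellm import fine_tune  # does not exist today

job = fine_tune(
    model="gpt-3.5-turbo",            # provider inferred from the model name
    training_file="train.jsonl",      # upload/format handled per provider
    hyperparameters={"n_epochs": 3},
)
print(job.status)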

ranjancse26 avatar Sep 19 '23 05:09 ranjancse26

I wish LiteLLM had support for open-source embeddings like sentence-transformers, hkunlp/instructor-large, etc.

Sorry, based on the documentation below, it seems there's only support for the OpenAI embeddings.

https://docs.litellm.ai/docs/embedding/supported_embedding
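
For reference, the OpenAI-only call today looks like this; the wish is for the same interface to accept open-source models (the huggingface-style model string in the comment is speculative):

from litellm import embedding

# Works today (OpenAI)
response = embedding(model="text-embedding-ada-002", input=["good morning from litellm"])

# The wish: same call for open-source models, e.g. (speculative model string)
# response = embedding(model="huggingface/sentence-transformers/all-MiniLM-L6-v2", input=["good morning"])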

ranjancse26 avatar Sep 19 '23 07:09 ranjancse26

I wish LiteLLM had an integration with the Cerebrium platform. Please check the link below for the prebuilt models.

https://docs.cerebrium.ai/cerebrium/prebuilt-models

ranjancse26 avatar Sep 19 '23 09:09 ranjancse26

@ranjancse26 what models on Cerebrium do you want to use with LiteLLM?

ishaan-jaff avatar Sep 19 '23 16:09 ishaan-jaff

@ishaan-jaff Cerebrium has a lot of pre-built models. The focus should be on consuming the open-source models first, e.g. Llama 2, GPT4All, Falcon, FlanT5, etc. I'm mentioning this as a first step. However, it's a good idea to have LiteLLM take care of the internal communication with custom-built models too, in turn based on the API that Cerebrium exposes.


ranjancse26 avatar Sep 19 '23 16:09 ranjancse26

@smig23 We've added support for Petals to LiteLLM: https://docs.litellm.ai/docs/providers/petals
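
Per the linked docs, usage should look roughly like this (double-check the docs page for the exact model string):

from litellm import completion

response = completion(
    model="petals/petals-team/StableBeluga2",
    messages=[{"role": "user", "content": "Hello, how's it going?"}],
)
print(response)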

ishaan-jaff avatar Sep 19 '23 18:09 ishaan-jaff

I wish LiteLLM had built-in support for the majority of provider operations rather than targeting text generation alone. Consider the example of Cohere: the endpoint below allows users to have conversations with a Large Language Model (LLM) from Cohere.

https://docs.cohere.com/reference/post_chat
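
For reference, LiteLLM already routes Cohere generation through the unified completion call, along these lines; the wish is for chat and the other provider operations to get the same treatment:

from litellm import completion

# Works today: Cohere text generation via the unified interface
response = completion(
    model="command-nightly",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)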

ranjancse26 avatar Sep 21 '23 00:09 ranjancse26

I wish LiteLLM had plenty of support and examples for users developing apps with the RAG pattern. Following the standard best practices is more or less mandatory, and we would all like the same support here.

ranjancse26 avatar Sep 21 '23 00:09 ranjancse26

I wish LiteLLM had use-case-driven examples for beginners. Keeping the day-to-day use cases in mind, it's a good idea to come up with a great sample that covers the following aspects (a starter for one of them is sketched after the list).

  • Text classification
  • Text summarization
  • Text translation
  • Text generation
  • Code generation

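For instance, a text-summarization starter might look like this (the model choice is just an example):

from litellm import completion

article = "LiteLLM provides a unified interface for calling many different LLM providers."
messages = [{"role": "user", "content": f"Summarize this in one sentence:\n\n{article}"}]

response = completion(model="gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])
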
ranjancse26 avatar Sep 21 '23 00:09 ranjancse26

I wish LiteLLM supported the various well-known and popular vector DBs. Here are a couple of them to begin with.

  • Pinecone
  • Qdrant
  • Weaviate
  • Milvus
  • DuckDB
  • SQLite

ranjancse26 avatar Sep 21 '23 00:09 ranjancse26

I wish LiteLLM had built-in support for web scraping, or for getting real-time data using a known provider like SerpApi. It would be helpful for users building custom AI models or integrating with LLMs to perform retrieval-augmented generation.

https://serpapi.com/blog/llms-vs-serpapi/#serpapi-google-local-results-parser
https://colab.research.google.com/drive/1Q9VvVzjZJja7_y2Ls8qBkE_NApbLiqly?usp=sharing
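
A rough sketch of that flow, assuming the serpapi Python package (google-search-results) for retrieval; the glue code is illustrative, not an existing LiteLLM feature:

from serpapi import GoogleSearch  # pip install google-search-results
from litellm import completion

def search_augmented_answer(question, serpapi_key):
    # Fetch real-time snippets, then stuff them into the prompt (RAG-style)
    results = GoogleSearch({"q": question, "api_key": serpapi_key}).get_dict()
    snippets = [r.get("snippet", "") for r in results.get("organic_results", [])[:3]]
    context = "\n".join(snippets)
    prompt = f"Using these search results:\n{context}\n\nAnswer the question: {question}"
    return completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}])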

ranjancse26 avatar Sep 21 '23 00:09 ranjancse26

Hey @ranjancse26, we have support for both LlamaIndex and LangChain, which have great vector DB support. Any reason why those don't work for you?

krrishdholakia avatar Sep 21 '23 01:09 krrishdholakia

@krrishdholakia @ishaan-jaff Could you please provide detailed references for the vector DB usage, with code samples showing how one could leverage them with LiteLLM?

ranjancse26 avatar Sep 21 '23 03:09 ranjancse26

Here's some sample code, @ranjancse26

from litellm import completion

# Stuff the prompt with context retrieved from your vector DB
retrieved_context = "..."  # result of a vector DB similarity search
prompt = f"Context:\n{retrieved_context}\n\nQuestion: <user question>"

messages = [{"role": "user", "content": prompt}]

response = completion(model="gpt-3.5-turbo", messages=messages)
print(response)

Is there some nuance here I'm missing? Our vector DB implementations usually just involve stuffing the prompt with some additional context.

krrishdholakia avatar Sep 21 '23 04:09 krrishdholakia