OpenLLM
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
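Since the server exposes an OpenAI-compatible API, a client can talk to it with plain HTTP. The sketch below builds (but does not send) a request for the standard `/v1/chat/completions` route using only the standard library; the host and model name are placeholders, not values from any of the issues here.

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages):
    """Build an HTTP request for an OpenAI-compatible /v1/chat/completions endpoint."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Hypothetical host and model; urllib.request.urlopen(req) would actually send it.
req = build_chat_request(
    "http://localhost:3000",
    "meta-llama/Llama-3.1-8B-Instruct",
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)
```

Because the wire format follows the OpenAI spec, the official `openai` client can also be pointed at such a server by overriding its base URL.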
### Describe the bug It seems that some error under stream mode breaks the generation result ### To reproduce 1. start openllm and use some chat model (in the...
Ideally, when the bento running on BentoCloud is serverless, the client should be able to retry the connection until the pod is alive.
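Such a client-side wait can be sketched as a backoff loop around any readiness probe; `wait_until_ready` and its defaults are illustrative, not part of an existing OpenLLM or BentoML client API.

```python
import time

def wait_until_ready(probe, max_attempts=5, base_delay=0.5):
    """Retry `probe` with exponential backoff until it returns True.

    `probe` is any zero-argument callable, e.g. one that hits the
    server's health endpoint and reports whether the pod answered.
    Returns False if the pod never came alive within max_attempts.
    """
    for attempt in range(max_attempts):
        if probe():
            return True
        time.sleep(base_delay * (2 ** attempt))
    return False

# Demo with a fake probe that only succeeds on the third call:
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until_ready(fake_probe, base_delay=0.01)
```

A real probe would catch connection errors from the HTTP library in use and return False, so a scaled-to-zero pod gets time to cold-start before the first generation request is sent.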
### Feature request The new implementation provides options to be more explicit with respect to exceptions on the client side. What we want to do is to also add both exception handling...
We have a fine-tuned Llama 7B model that we tried to build into a Docker container using openllm. We ran into issues with how big the final Docker image ended...
### Describe the bug Hi, when trying to query `meta-llama/Llama-2-7b-chat-hf` with a simple query (i.e. 'Hello'), the query failed with a timeout. Just to be clear, I launched in...
I tried to use OpenLLM with my merged Llama model, and it prints two lines of "Special tokens have been added in the vocabulary, make sure the...
### Feature request Hey there :) I have deployed OpenLLM on a managed service that provides HTTPS out of the box: ```bash curl -X 'POST' \ 'https://themodel.url/v1/generate' \ -H...
### Feature request Add a parameter to load a LoRA adapter on request ### Motivation _No response_ ### Other _No response_
### Feature request support InternLM https://github.com/InternLM/InternLM https://huggingface.co/internlm ### Motivation _No response_ ### Other _No response_
### Describe the bug I tried to run one of TheBloke's quantized models on an A100 40GB. It is not one of the most recent models ### To reproduce ```...