OpenLLM
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
### Feature request It would be great if OpenLLM supported pre-Ampere-architecture CUDA devices. In my case, I'm looking at the Volta architecture. The [README currently indicates](https://github.com/bentoml/OpenLLM?tab=readme-ov-file#-runtime-implementations) that an Ampere-architecture...
### Describe the bug I executed "TRUST_REMOTE_CODE=True openllm start /usr1/models/chatglm-6b"; the model loaded successfully and Swagger was available, but I got an error when using v1/chat/completions: ``` Traceback (most recent call last): File...
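For reference, a minimal sketch of the kind of request that hits that endpoint, assuming the server is listening on localhost:3000 and accepts the OpenAI chat-completions payload shape; the host, port, and model name here are illustrative, not from the issue:

```python
# Hypothetical reproduction: POST to the OpenAI-compatible chat endpoint that
# `openllm start` exposes. Host/port and model name are assumptions.
import requests

resp = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "chatglm-6b",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```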
When I execute openllm start for the first time, the model is downloaded locally and then started, but it seems that it also sends a network...
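For comparison, Hugging Face's documented offline switches (one of which, TRANSFORMERS_OFFLINE, also appears in another issue below) can be set before anything is loaded; whether OpenLLM fully honors them on first start is exactly what this issue asks. A sketch, not OpenLLM's own API:

```python
# Sketch: Hugging Face's documented offline mode, set before transformers or
# openllm is imported. Whether this suppresses the network call described
# above is the open question in this issue.
import os

os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"
```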
### Describe the bug When using qwen-7b-chat with the OpenAI completion API, I have provided stop tokens like ["",""], but generation always stops when it reaches the max length limit. I have checked the generation...
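A minimal sketch of the call described here, using the OpenAI Python client against the local server; the base URL, model name, and placeholder stop strings are assumptions for illustration (the issue's actual stop tokens are not shown above):

```python
# Sketch of passing stop sequences through the OpenAI-compatible completions
# API. base_url, model name, and the stop strings are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
completion = client.completions.create(
    model="qwen-7b-chat",
    prompt="Write a short greeting.",
    stop=["<stop-token-1>", "<stop-token-2>"],  # hypothetical stop strings
    max_tokens=256,
)
print(completion.choices[0].text)
```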
### Describe the bug When attempting to add the CohereForAI/aya-101 model, an error occurred during the loading process: bentoml.exceptions.BentoMLException: Failed to load bento model because it...
I tested with Apache Benchmark to see how many API calls OpenLLM can handle at the same time. I set 4 concurrent users and 40 requests (10 requests per user). For...
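For context, a rough plain-Python equivalent of that benchmark (4 concurrent workers, 40 requests); since the ab command line is not shown, the endpoint and payload here are assumptions:

```python
# Rough Python equivalent of the Apache Benchmark run described above:
# 4 concurrent clients, 40 requests total. Endpoint and payload are assumptions.
from concurrent.futures import ThreadPoolExecutor

import requests

def call(i: int) -> int:
    resp = requests.post(
        "http://localhost:3000/v1/generate",
        json={"prompt": f"request {i}"},
        timeout=120,
    )
    return resp.status_code

with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(call, range(40))))
```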
### Describe the bug After building OpenLLM to generate the service and runner, I ran the Docker image as follows: **Server:** ``` $ docker run --rm --gpus all -p 3000:3000 -it...
### Describe the bug I am using this LLM config in the JSON request: "llm_config": { "num_beams": 5, "use_beam_search": true } and I am getting an unclear exception: chatx-gdch-openllm-86d68dd84f-r8png RuntimeError:...
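A sketch of a full request carrying that llm_config: the config values come from the issue, while the /v1/generate route, host, and prompt are assumptions for illustration:

```python
# Hypothetical full request around the llm_config from this issue; the
# /v1/generate route, host, and prompt are assumptions, not from the report.
import requests

payload = {
    "prompt": "Tell me about beam search.",
    "llm_config": {"num_beams": 5, "use_beam_search": True},
}
resp = requests.post("http://localhost:3000/v1/generate", json=payload, timeout=60)
print(resp.status_code, resp.text)
```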
### Describe the bug This happens when I run openllm. I am using a fresh Ubuntu install with a fresh venv. ### To reproduce 1.) Create a venv 2.) pip install...
### Describe the bug I have a machine with two GPUs. I ran the model with the openllm start command and everything went well. `CUDA_VISIBLE_DEVICES=0,1 TRANSFORMERS_OFFLINE=1 openllm start mistral --model-id mymodel...