OpenLLM
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
### Feature request It would be great if OpenLLM supported pre-Ampere-architecture CUDA devices. In my case, I'm looking at the Volta architecture. The [README currently indicates](https://github.com/bentoml/OpenLLM?tab=readme-ov-file#-runtime-implementations) that an Ampere-architecture...
### Describe the bug I executed "TRUST_REMOTE_CODE=True openllm start /usr1/models/chatglm-6b"; the model loaded successfully and Swagger was available, but I got an error when using v1/chat/completions: ``` Traceback (most recent call last): File...
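For reference, a minimal sketch of the kind of request that hits that endpoint, assuming the server is listening on localhost:3000 and accepts the OpenAI chat-completions payload shape; the host, port, and model name here are illustrative, not from the issue:

```python
# Hypothetical reproduction: POST to the OpenAI-compatible chat endpoint that
# `openllm start` exposes. Host/port and model name are assumptions.
import requests

resp = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "chatglm-6b",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```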
When I execute openllm start for the first time, the model is downloaded locally and then started, but it seems that it also sends a network...
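For comparison, Hugging Face's documented offline switches (one of which, TRANSFORMERS_OFFLINE, also appears in another issue below) can be set before anything is loaded; whether OpenLLM fully honors them on first start is exactly what this issue asks. A sketch, not OpenLLM's own API:

```python
# Sketch: Hugging Face's documented offline mode, set before transformers or
# openllm is imported. Whether this suppresses the network call described
# above is the open question in this issue.
import os

os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"
```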
### Describe the bug When using qwen-7b-chat with the OpenAI completion API, I have provided stop tokens like ["",""], but generation always stops when it reaches the max length limit. I have checked the generation...
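A minimal sketch of the call described here, using the OpenAI Python client against the local server; the base URL, model name, and placeholder stop strings are assumptions for illustration (the issue's actual stop tokens are not shown above):

```python
# Sketch of passing stop sequences through the OpenAI-compatible completions
# API. base_url, model name, and the stop strings are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
completion = client.completions.create(
    model="qwen-7b-chat",
    prompt="Write a short greeting.",
    stop=["<stop-token-1>", "<stop-token-2>"],  # hypothetical stop strings
    max_tokens=256,
)
print(completion.choices[0].text)
```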
### Describe the bug When attempting to add the CohereForAI/aya-101 model, an error occurred during the loading process: bentoml.exceptions.BentoMLException: Failed to load bento model because it...
I tested with Apache Benchmark to see how many API calls OpenLLM can handle at the same time. I set 4 concurrent users and 40 requests (10 requests per user). For...
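For context, a rough plain-Python equivalent of that benchmark (4 concurrent workers, 40 requests); since the ab command line is not shown, the endpoint and payload here are assumptions:

```python
# Rough Python equivalent of the Apache Benchmark run described above:
# 4 concurrent clients, 40 requests total. Endpoint and payload are assumptions.
from concurrent.futures import ThreadPoolExecutor

import requests

def call(i: int) -> int:
    resp = requests.post(
        "http://localhost:3000/v1/generate",
        json={"prompt": f"request {i}"},
        timeout=120,
    )
    return resp.status_code

with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(call, range(40))))
```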
### Describe the bug After building OpenLLM to generate the service and runner, I ran the Docker image as follows: **Server:** ``` $ docker run --rm --gpus all -p 3000:3000 -it...
### Describe the bug I am using this LLM config in the JSON request: "llm_config": { "num_beams": 5, "use_beam_search": true } and I am getting an unclear exception: chatx-gdch-openllm-86d68dd84f-r8png RuntimeError:...
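A sketch of a full request carrying that llm_config: the config values come from the issue, while the /v1/generate route, host, and prompt are assumptions for illustration:

```python
# Hypothetical full request around the llm_config from this issue; the
# /v1/generate route, host, and prompt are assumptions, not from the report.
import requests

payload = {
    "prompt": "Tell me about beam search.",
    "llm_config": {"num_beams": 5, "use_beam_search": True},
}
resp = requests.post("http://localhost:3000/v1/generate", json=payload, timeout=60)
print(resp.status_code, resp.text)
```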
### Describe the bug This happens when I run openllm. I am using a fresh Ubuntu install with a fresh venv. ### To reproduce 1.) Create a venv 2.) pip install...
### Describe the bug I have a machine with two GPUs. I ran the model with the openllm start command and everything went well. `CUDA_VISIBLE_DEVICES=0,1 TRANSFORMERS_OFFLINE=1 openllm start mistral --model-id mymodel...