OpenLLM
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
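Since the server exposes an OpenAI-compatible API, a client can talk to it with plain HTTP. The sketch below builds (but does not send) a request for the standard `/v1/chat/completions` route using only the standard library; the host and model name are placeholders, not values from any of the issues here.

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages):
    """Build an HTTP request for an OpenAI-compatible /v1/chat/completions endpoint."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Hypothetical host and model; urllib.request.urlopen(req) would actually send it.
req = build_chat_request(
    "http://localhost:3000",
    "meta-llama/Llama-3.1-8B-Instruct",
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)
```

Because the wire format follows the OpenAI spec, the official `openai` client can also be pointed at such a server by overriding its base URL.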
### Describe the bug It seems that some error under stream mode breaks the generation result ### To reproduce 1. start openllm and use some chat model (in the...
Ideally, when the bento running on BentoCloud is serverless, the client should be able to retry the connection until the pod is alive.
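Such a client-side wait can be sketched as a backoff loop around any readiness probe; `wait_until_ready` and its defaults are illustrative, not part of an existing OpenLLM or BentoML client API.

```python
import time

def wait_until_ready(probe, max_attempts=5, base_delay=0.5):
    """Retry `probe` with exponential backoff until it returns True.

    `probe` is any zero-argument callable, e.g. one that hits the
    server's health endpoint and reports whether the pod answered.
    Returns False if the pod never came alive within max_attempts.
    """
    for attempt in range(max_attempts):
        if probe():
            return True
        time.sleep(base_delay * (2 ** attempt))
    return False

# Demo with a fake probe that only succeeds on the third call:
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until_ready(fake_probe, base_delay=0.01)
```

A real probe would catch connection errors from the HTTP library in use and return False, so a scaled-to-zero pod gets time to cold-start before the first generation request is sent.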
### Feature request The new implementation provides options to be more explicit with respect to exceptions on the client side. What we want to do is to also add both exception handling...
We have a fine-tuned Llama 7B model that we tried to build into a Docker container using openllm. We ran into issues with how big the final Docker image ended...
### Describe the bug Hi, when trying to query `meta-llama/Llama-2-7b-chat-hf` with a simple query (i.e. 'Hello'), the query failed with a timeout. Just to be clear, I launched in...
I tried to use OpenLLM with my merged Llama model, and it prints two lines of "Special tokens have been added in the vocabulary, make sure the...
### Feature request Hey there :) I have deployed OpenLLM on a managed service that provides HTTPS out of the box: ```bash curl -X 'POST' \ 'https://themodel.url/v1/generate' \ -H...
### Feature request Add a parameter to load a LoRA adapter on request ### Motivation _No response_ ### Other _No response_
### Feature request support InternLM https://github.com/InternLM/InternLM https://huggingface.co/internlm ### Motivation _No response_ ### Other _No response_
### Describe the bug I tried to run one of TheBloke's quantized models on an A100 40GB. It is not one of the most recent models ### To reproduce ```...