vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
I tried to use vllm on my fine-tuned StarCoder model, but it seems not to be supported by the official package (?), even though the README.md says it is supported. ```...
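A minimal sketch of loading a fine-tuned checkpoint, assuming it is saved in Hugging Face format; the local path below is a hypothetical example, not taken from the report:

```python
# Minimal sketch: loading a fine-tuned StarCoder-style checkpoint with vLLM.
# "/path/to/finetuned-starcoder" is a hypothetical directory containing a
# Hugging Face-format checkpoint (config.json, tokenizer files, weights).
from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/finetuned-starcoder", trust_remote_code=True)
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["def fibonacci(n):"], params)
for out in outputs:
    print(out.outputs[0].text)
```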
The langchain implementation sends the prompt as an array of strings to the /v1/completions endpoint. With this change, it is possible to use a simple string or an array of...
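A sketch of the prompt handling the change describes, accepting either a single string or a list of strings; this is an illustrative pydantic-style snippet, not the actual vLLM server code:

```python
# Illustrative request model and normalization for a /v1/completions-style
# endpoint that accepts prompt as either a string or an array of strings.
from typing import List, Union

from pydantic import BaseModel


class CompletionRequest(BaseModel):
    model: str
    prompt: Union[str, List[str]]
    max_tokens: int = 16


def normalize_prompts(prompt: Union[str, List[str]]) -> List[str]:
    """Return a list of prompts regardless of the input shape."""
    if isinstance(prompt, str):
        return [prompt]
    return prompt
```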
- Allow user to specify multiple models to download when loading server
- Allow user to switch between models
- Allow user to load multiple models on the cluster (nice...
I'm trying to run this project with the following Dockerfile:
```Dockerfile
FROM nvcr.io/nvidia/pytorch:22.12-py3
RUN pip uninstall torch -y
WORKDIR /workspace
COPY /inference/vllm /workspace/inference/vllm
WORKDIR /workspace/inference/vllm
RUN pip install -e .
```
...
Look at this: the output here continues for half an hour and never stops, but nothing is generated. The new request is pending.
Are there any prompt size limits? It seems that using more than 120 words makes the model unresponsive. Check the following case. In the first try I used 112 words...
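The limit is measured in tokens rather than words, so a rough sanity check is to count the prompt's tokens against the model's context window before sending the request. A sketch, with the model id chosen only as an illustration:

```python
# Rough sketch: count prompt tokens and compare against the model's context
# window. "facebook/opt-125m" is only an illustrative model id, not the one
# from the report above.
from transformers import AutoConfig, AutoTokenizer

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
context_len = AutoConfig.from_pretrained(model_id).max_position_embeddings

prompt = "..."  # placeholder for the long prompt from the report
n_prompt_tokens = len(tokenizer(prompt).input_ids)

max_new_tokens = 256  # room reserved for the generated continuation
print("fits in context:", n_prompt_tokens + max_new_tokens <= context_len)
```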
Hi guys, we found that inference with vllm can greatly improve performance! But we need to use LoRA (`peft`) at inference time. We also found that the community has a strong demand...
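One common workaround while LoRA is not supported natively is to merge the adapter into the base weights with `peft` and serve the merged checkpoint. A sketch under that assumption; the model id and paths are hypothetical:

```python
# Sketch of the merge-then-serve workaround: bake a LoRA adapter into the
# base model with peft, save the merged weights, then point vLLM at them.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "huggyllama/llama-7b"          # hypothetical base model
adapter_path = "/path/to/lora-adapter"   # hypothetical adapter directory
merged_path = "/path/to/merged-model"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_path)
model = model.merge_and_unload()         # fold LoRA deltas into the base weights

model.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_id).save_pretrained(merged_path)

# The merged directory can then be served with vLLM:
#   from vllm import LLM
#   llm = LLM(model=merged_path)
```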
In my case, I can deploy the vllm service on a single GPU, but when I use multiple GPUs I hit a Ray OOM error. Could you please help solve...
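Not a verified fix for this report, but a sketch of the multi-GPU knobs that usually matter here: tensor parallelism shards the model across Ray workers, and lowering the memory utilization cap leaves headroom on each GPU. Model id and values are illustrative:

```python
# Illustrative multi-GPU setup: shard across 2 GPUs and leave memory headroom
# for Ray and other processes on the node.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-13b",      # example model
    tensor_parallel_size=2,        # number of GPUs to shard across
    gpu_memory_utilization=0.85,   # fraction of each GPU vLLM may pre-allocate
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```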
Only works for Falcon-7B for now. The Falcon-40B model generates garbage outputs. Needs debugging.
I want to load a local model whose files are the same as those downloaded from Hugging Face. However, right now this repository seems to only support downloading from the Hub.
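Since the `model` argument is resolved through Hugging Face's loaders, a local directory in Hub format generally works as well. A minimal sketch; the path is a hypothetical example:

```python
# Sketch: pass a local directory (config.json, tokenizer files, weight shards)
# instead of a Hub model id. "/models/my-local-llm" is a hypothetical path.
from vllm import LLM

llm = LLM(model="/models/my-local-llm")
print(llm.generate(["Hello"])[0].outputs[0].text)
```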