vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
It would be great if you could add support for Falcon models as well! Does it support ONNX models today?
When will the /v1/embeddings API be available? Thank you.
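For context, a minimal sketch of the kind of OpenAI-compatible call the question presumably refers to; the endpoint does not exist in vLLM yet, and the port and model name below are assumptions:

```python
import requests

# Hypothetical request shape, mirroring the OpenAI /v1/embeddings API,
# against a vLLM server assumed to be running at localhost:8000.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"model": "intfloat/e5-large-v2", "input": "Hello, world!"},
)
print(resp.json())
```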
As mentioned in the title, [this simple example](https://python.langchain.com/docs/get_started/quickstart#llms) passes a list instead of a str. Raw request:  Error message: `INFO: 127.0.0.1:44226 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error...
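A minimal sketch of the kind of request that triggers the 500, assuming the OpenAI-compatible server is running locally on port 8000 and serving a model named `facebook/opt-125m` (the port and model name are assumptions, not from the report):

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        # LangChain's OpenAI wrapper sends the prompt as a list of strings,
        # not a single str, which is what appears to trigger the 500 here.
        "prompt": ["Tell me a joke."],
        "max_tokens": 32,
    },
)
print(resp.status_code, resp.text)
```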
Thanks for the repo! I can build the repo successfully on an H100 machine. But when I run the benchmarks, it shows the error below: ``` FATAL: kernel `fmha_cutlassF_f16_aligned_64x128_rf_sm80` is for...
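The kernel name suggests a build targeting sm80, while the H100 reports compute capability sm90. A quick diagnostic sketch, assuming PyTorch is installed, to compare the GPU's capability with the architectures the installed PyTorch build targets:

```python
import torch

# H100 should report compute capability 9.0 (sm90).
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: sm{major}{minor}")

# Architectures the installed PyTorch build was compiled for.
print(f"Architectures in this PyTorch build: {torch.cuda.get_arch_list()}")
```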
Currently, pip installing our package takes 5-10 minutes because our CUDA kernels are compiled on the user machine. For better UX, we should include pre-built CUDA binaries in our PyPI...
Will support be added for encoder-decoder models like T5 or BART? All of the currently supported models are decoder-only.
Is support for Whisper on the roadmap? Something like https://github.com/ggerganov/whisper.cpp would be great.
Based on the examples, vLLM can launch a server with a single model instance. Can vLLM serve clients using multiple model instances? With multiple model instances, the server will...
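A rough workaround sketch, not a vLLM feature: assume two API server processes have been started independently on ports 8000 and 8001, and a thin client round-robins between them. The `/generate` endpoint and its parameters are assumptions about the demo API server:

```python
import itertools
import requests

# Two independently launched vLLM API servers (ports are assumptions).
backends = itertools.cycle(["http://localhost:8000", "http://localhost:8001"])

def generate(prompt: str) -> dict:
    # Send each request to the next backend in turn.
    url = f"{next(backends)}/generate"
    return requests.post(url, json={"prompt": prompt, "max_tokens": 32}).json()

print(generate("Hello, my name is"))
```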
How easy or difficult would it be to support LoRA fine-tuned models? Would it need big changes to the vLLM engine, or is it something that can be done at...