
A high-throughput and memory-efficient inference and serving engine for LLMs

Results: 2816 vllm issues

Would it be possible to add support for GPT-J? Are there any plans for this?

new model

Do you know if support for this is planned? I may be interested in writing the custom Metal kernels.

On Ubuntu 20.04 with Python 3.10 and pip 23.1.2. The issue persists with Python 3.8 and pip 21.2.4.

```
Collecting vllm
  Using cached vllm-0.1.0.tar.gz (83 kB)
  Running command pip subprocess to install build...
```

Installation

https://arxiv.org/abs/1911.02150 For example, StarCoder uses MQA to speed up inference. How does PagedAttention compare to Multi-Query Attention? Are they compatible?

new model
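On the MQA question above: MQA and PagedAttention address different things, so in principle they compose. MQA shrinks the KV cache by sharing a single key/value head across all query heads, while PagedAttention changes how that cache is laid out in memory (in fixed-size blocks rather than one contiguous buffer). A minimal sketch of the cache-size arithmetic, using hypothetical model dimensions (not StarCoder's actual config):

```python
# Illustrative arithmetic only (not vLLM internals): KV-cache size per
# token under multi-head vs. multi-query attention.
def kv_cache_bytes_per_token(num_kv_heads, head_dim, num_layers, dtype_bytes=2):
    # 2 tensors (key and value) are cached per layer, in fp16 by default.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Hypothetical dimensions for a mid-sized model
num_layers, num_heads, head_dim = 40, 48, 128

mha = kv_cache_bytes_per_token(num_heads, head_dim, num_layers)
mqa = kv_cache_bytes_per_token(1, head_dim, num_layers)  # MQA: one shared KV head

print(mha // mqa)  # → 48: MQA shrinks the KV cache by a factor of num_heads
```

Either way, the (smaller or larger) cache can still be paged, which is why the two techniques are compatible rather than competing.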

BLOOM is an open-source LLM developed by BigScience. The BLOOM models have achieved high rankings in HuggingFace downloads. It'd be great to have these models in our catalog.

new model

I successfully installed vLLM in WSL2, but when I tried to run the sample code, I got an error like this:

```
from vllm import LLM, SamplingParams
prompts = [...
```

bug

Maybe not too urgent, but it would be nice to have `echo` in the OpenAI interface; this could facilitate scoring (e.g., on QA datasets).

good first issue
feature request
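On the `echo` request above: echoing the prompt's token logprobs lets a client score a candidate answer without generating any new tokens, since a sequence score is just the sum of its per-token logprobs. A toy sketch with made-up numbers (not the actual API response shape):

```python
import math

# Hypothetical per-token logprobs for an echoed answer span
token_logprobs = [-0.1, -2.3, -0.5]

# Sequence log-likelihood: sum of the token logprobs
score = sum(token_logprobs)

# Corresponding perplexity over the span
perplexity = math.exp(-score / len(token_logprobs))

print(round(score, 1))  # → -2.9
```

Ranking candidate answers by `score` (or normalizing by length, as `perplexity` does) is the QA-scoring use case the request refers to.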

I get the following error when I try to install vllm on WSL Ubuntu. The main takeaway is:

```
Building wheel for vllm (pyproject.toml) did not run successfully.
```

Full...

Installation

I saw that these are all wrappers around vLLM. How do I integrate a Hugging Face model and get the out-of-the-box speedup for my existing model?