vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Would it be possible to add support for GPT-J? Are there any plans for this?
Do you know if support for this is planned? I may be interested in writing the custom Metal kernels.
On Ubuntu 20.04 with Python 3.10 and pip 23.1.2; the issue persists with Python 3.8 and pip 21.2.4.
```
Collecting vllm
  Using cached vllm-0.1.0.tar.gz (83 kB)
  Running command pip subprocess to install build...
```
How does PagedAttention compare to Multi-Query Attention (https://arxiv.org/abs/1911.02150)? Are they compatible? For example, StarCoder uses MQA to speed up inference.
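For context, a minimal sketch (not vLLM internals) of the shape difference MQA introduces: a single K/V head is shared across all query heads, which shrinks the KV cache that PagedAttention manages. All tensor sizes here are illustrative.
```python
import torch

batch, seq, n_heads, head_dim = 1, 16, 8, 64

q = torch.randn(batch, n_heads, seq, head_dim)

# MHA: one K/V head per query head.
k_mha = torch.randn(batch, n_heads, seq, head_dim)
# MQA: a single K/V head shared by all query heads -> much smaller KV cache.
k_mqa = torch.randn(batch, 1, seq, head_dim)

scores_mha = q @ k_mha.transpose(-1, -2)   # (1, 8, 16, 16)
scores_mqa = q @ k_mqa.transpose(-1, -2)   # broadcasts over the head dim
print(scores_mha.shape, scores_mqa.shape)  # same attention shape, 8x less K/V
```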
BLOOM is an open-source LLM developed by BigScience. The BLOOM models rank highly in Hugging Face downloads. It'd be great to have these models in our catalog.
I successfully installed vLLM in WSL2, but when I tried to run the sample code, I got an error like this:
```
from vllm import LLM, SamplingParams

prompts = [...
```
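For reference, a runnable version of the quickstart being attempted, following the vLLM example code; the small facebook/opt-125m model here is just a convenient smoke test.
```python
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Any supported Hugging Face model works; opt-125m keeps the test cheap.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```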
Maybe not too urgent, but it would be nice to have `echo` in the OpenAI interface; this would facilitate scoring (e.g., on QA datasets).
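As an illustration, a hedged sketch of how `echo` enables prompt scoring in the OpenAI-style completions API that vLLM's server mirrors; the endpoint URL and model name are placeholders, and vLLM would still need to implement the flag.
```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # placeholder vLLM server URL
    json={
        "model": "facebook/opt-125m",  # placeholder model name
        "prompt": "Question: What is the capital of France? Answer: Paris",
        "max_tokens": 0,   # generate nothing; we only want prompt scores
        "echo": True,      # return the prompt tokens back in the response
        "logprobs": 1,     # with per-token log-probabilities
    },
)
# Per-token logprobs over the prompt can be summed to score a QA candidate.
print(resp.json()["choices"][0]["logprobs"]["token_logprobs"])
```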
I get the following error when I try to install vLLM on WSL Ubuntu. The main takeaway is: `Building wheel for vllm (pyproject.toml) did not run successfully.` Full...
I see these are all wrappers around vLLM. How do I integrate my existing Hugging Face model and see an out-of-the-box boost?
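A minimal sketch, assuming the existing model is a standard Hugging Face checkpoint of an architecture vLLM supports: vLLM loads HF weights directly from a model ID or a local directory, so no extra wrapper is needed. The path and prompt below are placeholders.
```python
from vllm import LLM, SamplingParams

# Pass either a Hugging Face model ID or a local HF checkpoint directory.
llm = LLM(model="/path/to/your/hf-checkpoint")  # placeholder path
params = SamplingParams(temperature=0.0, max_tokens=64)
print(llm.generate(["Hello"], params)[0].outputs[0].text)
```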