worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
I'm trying to build `Llama 3.1` and `Llama 3.1 Instruct`, but the build always fails (on latest main or v1.2.0). Are these models not supported yet? `Llama 3` and `Llama 3...
### Description 1. 🌟 **Upgrade vLLM**: We need to rocket [vLLM to version 0.5.0.post1](https://github.com/vllm-project/vllm/releases/tag/v0.5.0.post1) or beyond! 🚀 2. 🤖 **Tensorize Awesomeness**: The `tensorize` feature is like giving vLLM a turbo...
Gemma-2 no longer requires `flashinfer` - in fact, the newest version of vLLM has a bug in its `flashinfer` usage, which makes the LLM return wrong tokens. This pull request makes it...
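For anyone hitting the wrong-token issue in the meantime, here is a minimal sketch of steering vLLM away from flashinfer, assuming `VLLM_ATTENTION_BACKEND` is the environment variable vLLM reads when choosing its attention backend:

```python
import os

# Assumption: VLLM_ATTENTION_BACKEND is the variable vLLM consults when picking
# its attention implementation. Set it before the engine is constructed to avoid
# the flashinfer path described as buggy above.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"  # rather than "FLASHINFER"
```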
The purpose of this issue is to provide full support for `tools` and `tool_choice="auto"` in worker-vllm. ## ToDo - [ ] vLLM only supports `tool_choice="some_tool_name"` and `tool_choice="none"`, but hopefully soon...
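As a rough illustration of what `tool_choice="auto"` support would enable, here is a sketch of a client request against the worker's OpenAI-compatible route. The endpoint ID, API key, model name, and weather tool below are placeholders, not part of this issue:

```python
from openai import OpenAI

# Placeholders: substitute your own RunPod endpoint ID, API key, and model.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",
    api_key="<runpod_api_key>",
)

# A single illustrative tool definition in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",  # the mode this issue asks worker-vllm to support
)
print(resp.choices[0].message.tool_calls)
```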
```
Traceback (most recent call last):
2024-08-01T21:29:17.880522621Z   File "/src/handler.py", line 6, in <module>
2024-08-01T21:29:17.880527641Z     vllm_engine = vLLMEngine()
2024-08-01T21:29:17.880533331Z   File "/src/engine.py", line 25, in __init__
2024-08-01T21:29:17.880543011Z     self.llm = self._initialize_llm() if engine is...
```
## Description This PR introduces UV (https://github.com/astral-sh/uv) as a replacement for pip in the Dockerfile. > An extremely fast Python package installer and resolver, written in Rust. Designed as a...
Hello all, I keep scratching my head over why I can sometimes deploy everything on the list, but other things I try run into issues. Anyway, these are my logs from just trying to use...
The memory usage of vLLM's KV cache is directly proportional to the batch size of the model. vLLM's default is 256 but many users don't need nearly that many. For...
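By way of illustration, here is a minimal sketch of capping that batch size through vLLM's `max_num_seqs` engine argument when constructing the engine yourself; the model name and values are placeholders, and worker-vllm would presumably surface this through its own configuration rather than code like this:

```python
from vllm import LLM, SamplingParams

# Sketch only: cap how many sequences the scheduler batches at once.
# vLLM's default is 256; the issue above argues this drives KV-cache memory use,
# and most deployments never need that much concurrency.
llm = LLM(
    model="facebook/opt-125m",     # placeholder model for illustration
    max_num_seqs=64,               # far fewer concurrent sequences than the default
    gpu_memory_utilization=0.90,   # fraction of VRAM vLLM is allowed to claim
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```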
A new version (0.5.1) of vLLM has been released; could you please update worker-vllm to use it with RunPod serverless? https://github.com/vllm-project/vllm/releases
Please update for Gemma-2.