worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Results: 41 worker-vllm issues, sorted by recently updated.

vLLM 0.4.1 introduced the `model_loader` module, but a function that was previously importable is no longer exposed. During the Docker build, the model downloader module fails to import this function.
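One way to tolerate a symbol moving between library versions is a guarded import that tries several candidate module paths. This is only a sketch of the pattern, not the worker's actual code; the specific vLLM module paths involved in the breakage are not named in the report, so the helper below stays generic.

```python
import importlib

def import_from_candidates(func_name, module_paths):
    """Return the first attribute named func_name found in any of the
    candidate modules, trying them in order. Useful when a library
    (here, vLLM across the 0.4.1 refactor) moves a function between
    releases and the importing code must work with both layouts."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, func_name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{func_name!r} not found in any of {module_paths}")
```

The download step could then resolve the function once at startup and fail with a clear message listing every location it tried, instead of crashing mid-build on a bare `ImportError`.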

2024-05-23T09:58:01.432712734Z CUDA Version 12.1.0
2024-05-23T09:58:01.433425080Z
2024-05-23T09:58:01.433427258Z Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2024-05-23T09:58:01.434084437Z
2024-05-23T09:58:01.434087212Z This container image and its contents are governed by the...

Hi there, the current version of the `download_model.py` script does not work due to the empty `TENSORIZE_MODEL` env check on line 50. Once that is fixed, the `weight_utils` file in...
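A check like `if os.environ["TENSORIZE_MODEL"]` breaks when the variable is unset, and an empty string set by the template should normally mean "disabled" rather than trip the check. A minimal sketch of a more robust reader, assuming the worker wants unset/empty to fall back to a default (the helper name is mine, not the script's):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean env var; unset or empty values yield the default
    instead of raising KeyError or being misread as enabled."""
    value = os.environ.get(name, "")
    if value == "":
        return default
    return value.lower() in ("1", "true", "yes")

# An empty or missing TENSORIZE_MODEL no longer derails the download script:
tensorize = env_flag("TENSORIZE_MODEL")
```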

I'm running a RunPod serverless vLLM template with Llama 3 70B on a 40 GB GPU. One of the requests failed and I'm not completely sure what happened, but the message asked...

Greetings! I just wanted to make a quick note that neither the worker-vllm documentation nor the RunPod docs seem to mention that vLLM supports guided generation via JSON schemas...

Hey! Following the closing of #91, I noticed that the documentation still says that boolean environment variables can be substituted with 0 or 1. Looking at [the environment variable parsing](https://github.com/runpod-workers/worker-vllm/blob/main/src/engine_args.py#L15-L92),...
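The likely pitfall behind this mismatch: in Python, `bool()` on any non-empty string is `True`, so a naive cast reads `"0"` as enabled. A sketch of a parser that would make the documented 0/1 substitution actually work (this is an illustration, not the code in `engine_args.py`):

```python
def parse_bool(value: str) -> bool:
    """Parse a boolean env var, accepting 0/1 as well as true/false,
    case-insensitively, and rejecting everything else."""
    truthy = {"1", "true", "yes"}
    falsy = {"0", "false", "no"}
    v = value.strip().lower()
    if v in truthy:
        return True
    if v in falsy:
        return False
    raise ValueError(f"not a boolean: {value!r}")

# The naive cast gets "0" wrong, which is why docs and code can disagree:
assert bool("0") is True
assert parse_bool("0") is False
```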

When someone wants to use a different revision of a model, they need a way to specify that revision, but looking at the README it is not clear how to do so. My...
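`huggingface_hub.snapshot_download` does accept a `revision` argument (a branch, tag, or commit SHA), so one plausible shape for this is an env var that the download step forwards when present. The `MODEL_REVISION` variable name below is a hypothetical suggestion, not something the worker currently reads:

```python
import os

def download_kwargs(model_name: str) -> dict:
    """Assemble kwargs for huggingface_hub.snapshot_download, forwarding an
    optional revision (branch, tag, or commit SHA) from the environment.
    MODEL_REVISION is a hypothetical env var name used for illustration."""
    kwargs = {"repo_id": model_name}
    revision = os.environ.get("MODEL_REVISION")
    if revision:
        kwargs["revision"] = revision
    return kwargs

# snapshot_download(**download_kwargs("openai-community/gpt2")) would then
# pull the requested revision instead of the default branch.
```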

Hi there! vLLM supports [bitsandbytes quantization](https://docs.vllm.ai/en/latest/quantization/bnb.html), but there is no bitsandbytes dependency in [requirements.txt](https://github.com/runpod-workers/worker-vllm/blob/main/builder/requirements.txt). Are there any plans to fix that?
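If the dependency were added, the endpoint could in principle be configured through the worker's environment variables. A sketch under stated assumptions: `QUANTIZATION` is documented by the worker, while `LOAD_FORMAT` mapping to vLLM's `load_format` engine arg (which bitsandbytes loading has required in some vLLM versions) is an assumption, and the model name is only an example of a bnb-prequantized checkpoint:

```shell
# Sketch: endpoint env vars to request bitsandbytes quantization.
# Requires the bitsandbytes package at runtime — the missing dependency.
QUANTIZATION=bitsandbytes
LOAD_FORMAT=bitsandbytes   # assumed to map to vLLM's load_format engine arg
MODEL_NAME=unsloth/llama-3-8b-bnb-4bit   # example model, not prescriptive
```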

See https://github.com/vllm-project/vllm/issues/1002 and https://github.com/vllm-project/vllm/pull/5191. We should be able to set `gguf` as the `QUANTIZATION` env var, but we also need to provide the exact quant file. I'm thinking of some `MODEL_FILENAME` env var containing the exact filename...
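Since a GGUF repo typically ships many quant files, the engine needs a single concrete file, not just a repo id. A minimal sketch of how the proposed `MODEL_FILENAME` variable could combine with `QUANTIZATION` (both the helper and the validation rules are illustrative, not worker code):

```python
import os

def resolve_gguf(model_repo: str) -> dict:
    """Combine QUANTIZATION and the proposed MODEL_FILENAME env var into
    engine settings. MODEL_FILENAME is the hypothetical variable from the
    suggestion above; GGUF needs one exact file within the repo."""
    settings = {"model": model_repo}
    if os.environ.get("QUANTIZATION", "").lower() == "gguf":
        filename = os.environ.get("MODEL_FILENAME")
        if not filename:
            raise ValueError(
                "GGUF quantization requires MODEL_FILENAME "
                "(e.g. a *.Q4_K_M.gguf file in the repo)"
            )
        settings["quantization"] = "gguf"
        settings["model"] = f"{model_repo}/{filename}"
    return settings
```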