worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Results: 41 worker-vllm issues, sorted by recently updated.

vLLM 0.4.1 introduced the `model_loader` module, but a function that was previously importable is no longer exposed. During the Docker build, the model downloader module fails to import this function.
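One way to tolerate a symbol moving between library versions is a guarded import that tries several candidate module paths. This is only a sketch of the pattern, not the worker's actual code; the specific vLLM module paths involved in the breakage are not named in the report, so the helper below stays generic.

```python
import importlib

def import_from_candidates(func_name, module_paths):
    """Return the first attribute named func_name found in any of the
    candidate modules, trying them in order. Useful when a library
    (here, vLLM across the 0.4.1 refactor) moves a function between
    releases and the importing code must work with both layouts."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, func_name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{func_name!r} not found in any of {module_paths}")
```

The download step could then resolve the function once at startup and fail with a clear message listing every location it tried, instead of crashing mid-build on a bare `ImportError`.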

2024-05-23T09:58:01.432712734Z CUDA Version 12.1.0
2024-05-23T09:58:01.433425080Z
2024-05-23T09:58:01.433427258Z Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2024-05-23T09:58:01.434084437Z
2024-05-23T09:58:01.434087212Z This container image and its contents are governed by the...

Hi there, the current version of the `download_model.py` script does not work due to the empty `TENSORIZE_MODEL` env check on line 50. Once that is fixed, the `weight_utils` file in...
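A check like `if os.environ["TENSORIZE_MODEL"]` breaks when the variable is unset, and an empty string set by the template should normally mean "disabled" rather than trip the check. A minimal sketch of a more robust reader, assuming the worker wants unset/empty to fall back to a default (the helper name is mine, not the script's):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean env var; unset or empty values yield the default
    instead of raising KeyError or being misread as enabled."""
    value = os.environ.get(name, "")
    if value == "":
        return default
    return value.lower() in ("1", "true", "yes")

# An empty or missing TENSORIZE_MODEL no longer derails the download script:
tensorize = env_flag("TENSORIZE_MODEL")
```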

I'm running a RunPod serverless vLLM template with Llama 3 70B on a 40 GB GPU. One of the requests failed and I'm not completely sure what happened, but the message asked...

Greetings! I just wanted to make a quick note that neither the worker-vllm documentation nor the RunPod docs seem to mention that vLLM supports guided generation via JSON schemas...

Hey! Following the closing of #91, I noticed that the documentation still says that boolean environment variables can be substituted with 0 or 1. Looking at [the environment variable parsing](https://github.com/runpod-workers/worker-vllm/blob/main/src/engine_args.py#L15-L92),...
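The likely pitfall behind this mismatch: in Python, `bool()` on any non-empty string is `True`, so a naive cast reads `"0"` as enabled. A sketch of a parser that would make the documented 0/1 substitution actually work (this is an illustration, not the code in `engine_args.py`):

```python
def parse_bool(value: str) -> bool:
    """Parse a boolean env var, accepting 0/1 as well as true/false,
    case-insensitively, and rejecting everything else."""
    truthy = {"1", "true", "yes"}
    falsy = {"0", "false", "no"}
    v = value.strip().lower()
    if v in truthy:
        return True
    if v in falsy:
        return False
    raise ValueError(f"not a boolean: {value!r}")

# The naive cast gets "0" wrong, which is why docs and code can disagree:
assert bool("0") is True
assert parse_bool("0") is False
```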

When someone wants to use a different revision of a model, they need a way to specify that revision, but looking at the README it is not clear how to do so. My...
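`huggingface_hub.snapshot_download` does accept a `revision` argument (a branch, tag, or commit SHA), so one plausible shape for this is an env var that the download step forwards when present. The `MODEL_REVISION` variable name below is a hypothetical suggestion, not something the worker currently reads:

```python
import os

def download_kwargs(model_name: str) -> dict:
    """Assemble kwargs for huggingface_hub.snapshot_download, forwarding an
    optional revision (branch, tag, or commit SHA) from the environment.
    MODEL_REVISION is a hypothetical env var name used for illustration."""
    kwargs = {"repo_id": model_name}
    revision = os.environ.get("MODEL_REVISION")
    if revision:
        kwargs["revision"] = revision
    return kwargs

# snapshot_download(**download_kwargs("openai-community/gpt2")) would then
# pull the requested revision instead of the default branch.
```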

Hi there! vLLM supports [bitsandbytes quantization](https://docs.vllm.ai/en/latest/quantization/bnb.html), but there is no bitsandbytes dependency in [requirements.txt](https://github.com/runpod-workers/worker-vllm/blob/main/builder/requirements.txt). Are there any plans to fix that?
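If the dependency were added, the endpoint could in principle be configured through the worker's environment variables. A sketch under stated assumptions: `QUANTIZATION` is documented by the worker, while `LOAD_FORMAT` mapping to vLLM's `load_format` engine arg (which bitsandbytes loading has required in some vLLM versions) is an assumption, and the model name is only an example of a bnb-prequantized checkpoint:

```shell
# Sketch: endpoint env vars to request bitsandbytes quantization.
# Requires the bitsandbytes package at runtime — the missing dependency.
QUANTIZATION=bitsandbytes
LOAD_FORMAT=bitsandbytes   # assumed to map to vLLM's load_format engine arg
MODEL_NAME=unsloth/llama-3-8b-bnb-4bit   # example model, not prescriptive
```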

See https://github.com/vllm-project/vllm/issues/1002 and https://github.com/vllm-project/vllm/pull/5191. We should be able to set `gguf` as the `QUANTIZATION` env var, but we also need to provide the exact quant file. I'm thinking of some `MODEL_FILENAME` env var containing the exact filename...
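Since a GGUF repo typically ships many quant files, the engine needs a single concrete file, not just a repo id. A minimal sketch of how the proposed `MODEL_FILENAME` variable could combine with `QUANTIZATION` (both the helper and the validation rules are illustrative, not worker code):

```python
import os

def resolve_gguf(model_repo: str) -> dict:
    """Combine QUANTIZATION and the proposed MODEL_FILENAME env var into
    engine settings. MODEL_FILENAME is the hypothetical variable from the
    suggestion above; GGUF needs one exact file within the repo."""
    settings = {"model": model_repo}
    if os.environ.get("QUANTIZATION", "").lower() == "gguf":
        filename = os.environ.get("MODEL_FILENAME")
        if not filename:
            raise ValueError(
                "GGUF quantization requires MODEL_FILENAME "
                "(e.g. a *.Q4_K_M.gguf file in the repo)"
            )
        settings["quantization"] = "gguf"
        settings["model"] = f"{model_repo}/{filename}"
    return settings
```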