worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

16 worker-vllm issues, sorted by recently updated

vLLM 0.4.1 introduced a `model_loader` module but no longer provides a function the worker imports. As a result, the model downloader module fails to import that function during the Docker build.
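
For illustration only, a hedged sketch of a compatibility import for the download step: try the post-0.4.1 `model_loader` location first and fall back to the older path. The imported name below is a placeholder, not a confirmed worker-vllm or vLLM symbol.

```python
# Hedged compatibility shim; the imported name is a placeholder, not a
# confirmed vLLM symbol. Try the vLLM >= 0.4.1 module layout first, then
# fall back to the layout used before the model_loader refactor.
try:
    from vllm.model_executor.model_loader.weight_utils import download_weights_from_hf
except ImportError:
    from vllm.model_executor.weight_utils import download_weights_from_hf
```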

2024-05-23T09:58:01.432712734Z CUDA Version 12.1.0
2024-05-23T09:58:01.433425080Z
2024-05-23T09:58:01.433427258Z Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2024-05-23T09:58:01.434084437Z
2024-05-23T09:58:01.434087212Z This container image and its contents are governed by the...

Hi there, the current version of the `download_model.py` script does not work due to the empty `TENSORIZE_MODEL` env check on line 50. Once that is fixed, the `weight_utils` file in...
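
A minimal sketch of a more defensive version of such a check, assuming the variable is meant to act as a boolean flag; the surrounding download logic is omitted and is not taken from the actual script.

```python
import os

# Treat an unset or empty TENSORIZE_MODEL as "disabled" rather than letting
# an empty string slip through the check. Illustrative only; not the actual
# download_model.py logic.
tensorize_model = os.getenv("TENSORIZE_MODEL", "").strip().lower() in {"1", "true", "yes"}

if tensorize_model:
    print("Tensorizer path enabled")
else:
    print("Falling back to the standard weight download")
```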

I'm running a RunPod serverless vLLM template with Llama 3 70B on a 40 GB GPU. One of the requests failed and I'm not completely sure what happened, but the message asked...
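
For context, an unquantized 70B model needs on the order of 140 GB for the weights alone, so memory pressure is a plausible cause of failures on a 40 GB card. Below is a hedged sketch of the vLLM engine arguments that most directly control the memory footprint; the model name is a placeholder for a quantized checkpoint, and this is not the worker's actual startup code.

```python
from vllm import LLM

# Illustrative engine configuration, not the worker's actual startup code.
llm = LLM(
    model="casperhansen/llama-3-70b-instruct-awq",  # placeholder AWQ-quantized checkpoint
    quantization="awq",            # 4-bit weights instead of ~140 GB of fp16 weights
    max_model_len=4096,            # a smaller context window shrinks the KV cache
    gpu_memory_utilization=0.95,   # leave headroom for activations
)
```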

Greetings! I just wanted to make a quick note that neither the worker-vllm documentation nor the RunPod documentation seems to mention that vLLM supports guided generation via JSON schemas...
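
For reference, vLLM's OpenAI-compatible server accepts a `guided_json` extra parameter for schema-constrained output; whether worker-vllm forwards it unchanged is an assumption here, and the endpoint URL, API key, and model name below are placeholders.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # placeholder endpoint
    api_key="<RUNPOD_API_KEY>",                                   # placeholder key
)

# JSON schema the generated output must conform to.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
}

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Name a city and its population as JSON."}],
    extra_body={"guided_json": schema},  # vLLM guided-generation parameter
)
print(resp.choices[0].message.content)
```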