worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
vLLM 0.4.1 introduced a `model_loader` module and removed a function that the model downloader depends on, so the model downloader module fails to import that function during the Docker build.
2024-05-23T09:58:01.432712734Z CUDA Version 12.1.0
2024-05-23T09:58:01.433425080Z
2024-05-23T09:58:01.433427258Z Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2024-05-23T09:58:01.434084437Z
2024-05-23T09:58:01.434087212Z This container image and its contents are governed by the...
Hi there, the current version of the `download_model.py` script does not work due to the empty `TENSORIZE_MODEL` environment-variable check on line 50. Once that is fixed, the `weight_utils` file in...
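The report above does not show the failing check itself, but a common pitfall with flag-style environment variables is treating an empty string as set. A minimal sketch of a more robust check (the `env_flag` helper name is an illustration, not the script's actual code):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean-style env var, treating unset or empty as the default."""
    value = os.environ.get(name, "")
    if value.strip() == "":
        return default
    return value.strip().lower() in ("1", "true", "yes")

# An unset or empty TENSORIZE_MODEL falls back to False instead of
# being mistaken for an enabled flag.
tensorize_model = env_flag("TENSORIZE_MODEL")
```

With this pattern, `TENSORIZE_MODEL=""` and an unset variable behave the same way, which avoids the empty-string case described in the issue.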
I'm running a RunPod serverless vLLM template with Llama 3 70B on a 40 GB GPU. One of the requests failed, and I'm not completely sure what happened, but the message asked...
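The error message is truncated above, so the cause is unknown, but a 70B model is a tight fit for a single 40 GB card. A back-of-envelope, weights-only estimate (assuming 2 bytes per parameter for fp16/bf16; activations and KV cache add more on top):

```python
# Weights-only VRAM estimate for a 70B-parameter model in fp16/bf16.
params = 70e9            # 70 billion parameters
bytes_per_param = 2      # fp16/bf16
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB of weights")  # well above a single 40 GB GPU
```

Unquantized, the weights alone are roughly 130 GiB, so serving Llama 3 70B on 40 GB generally requires quantization (e.g. 4-bit) or sharding across multiple GPUs.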
Greetings! I just wanted to make a quick note that the documentation for worker-vllm and RunPod both don't seem to mention anything about vLLM supporting guided generation via JSON schemas...
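For context, recent vLLM versions expose guided decoding through extra request parameters such as `guided_json` on the OpenAI-compatible server. A sketch of what such a request body might look like (the model name and field values are placeholders, and parameter availability depends on the vLLM version the worker bundles):

```python
import json

# JSON schema the output should conform to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Hypothetical request body for vLLM's OpenAI-compatible completions endpoint,
# assuming guided_json is supported by the deployed vLLM version.
payload = {
    "model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "prompt": "Return a JSON object describing a person.",
    "max_tokens": 128,
    "guided_json": schema,  # constrain generation to match the schema
}

body = json.dumps(payload)
```

If the bundled vLLM supports it, the server constrains token sampling so the completion parses as JSON matching the schema, rather than relying on prompt instructions alone.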