worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

16 worker-vllm issues, sorted by recently updated

It seems there was a typo in the Dockerfile, preventing the model revision from ever being set, since the build-arg was called `$MODEL_REVISON` but the ENV instruction tried to read a non-existing...

Hello everyone, I would like to update the vLLM version to v0.4.1 in order to get access to LLAMA3, but I don't know how to modify the fork runpod/vllm-fork-for-sls-worker. Could you...

Any errors caused by the payload cause the instance to hang in an error state indefinitely. You have to manually terminate the instance or you'll rack up a hefty bill...
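
This kind of hang can usually be avoided at the handler level. Below is a minimal sketch, not the repo's actual handler: it assumes the standard RunPod serverless handler shape, uses a hypothetical `generate_with_vllm` stand-in for the real generation call, and assumes that returning a dict with an `error` field ends the job in a failed state instead of leaving the instance stuck.

```python
import runpod


def generate_with_vllm(job_input):
    # Hypothetical stand-in for the worker's real vLLM call.
    if "prompt" not in job_input:
        raise ValueError("payload is missing 'prompt'")
    return {"text": f"echo: {job_input['prompt']}"}


def handler(job):
    try:
        return generate_with_vllm(job["input"])
    except Exception as exc:
        # Report the payload error instead of letting the job hang;
        # an "error" field is assumed to mark the job as failed.
        return {"error": f"{type(exc).__name__}: {exc}"}


runpod.serverless.start({"handler": handler})
```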

I'm getting a `BadRequestError` when I try to test the vllm worker locally. I'm running my handler locally for testing, using `MODEL_NAME=/models/stablelm-3b-4e1t python3 -u /src/handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0`,...
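
For reference, one way to exercise that local test server is a plain HTTP request. This is a hedged sketch: it assumes the local API exposes a `/runsync` route on the chosen port and that the worker accepts an `{"input": {"prompt": ..., "sampling_params": ...}}` payload; adjust to whatever routes your runpod SDK version actually serves.

```python
import requests

resp = requests.post(
    "http://localhost:8000/runsync",
    json={
        "input": {
            "prompt": "Hello, world",
            "sampling_params": {"max_tokens": 32},
        }
    },
    timeout=120,
)
print(resp.status_code)
print(resp.json())
```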

I am looking to record the input and output of the vLLM worker. I could put an HTTP proxy in front and capture the traffic, or modify your handler. Rather than make...
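
If modifying the handler is acceptable, a thin wrapper is usually enough. The sketch below assumes a synchronous `handler(job) -> dict` shape with a hypothetical handler body; if the real handler streams or is async, the wrapper would need to iterate over the results instead.

```python
import functools
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("vllm-io")


def record_io(fn):
    @functools.wraps(fn)
    def wrapper(job):
        # Log the incoming payload, run the real handler, log the result.
        log.info("input: %s", json.dumps(job.get("input", {}), default=str))
        result = fn(job)
        log.info("output: %s", json.dumps(result, default=str))
        return result

    return wrapper


@record_io
def handler(job):
    # Placeholder for the real vLLM generation call.
    return {"text": "..."}
```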

I am getting an error saying it cannot load the tokenizer for some models, such as the Yarn Mistral/Llama-2 models. Is there any reason why?
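
One hedged guess: some of these repos ship custom tokenizer or model code, which only loads when remote code is trusted. Reproducing the load outside the worker can confirm whether that is the failure; the model id below is just an example.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "NousResearch/Yarn-Mistral-7b-128k",
    trust_remote_code=True,  # needed when the repo ships custom code
)
print(tok("hello")["input_ids"])
```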

Thank you for this awesome repo 💯. While building a custom revision, I noticed this typo: the wrong revision (`main`) of the model gets downloaded in `download_model.py`.
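
For context, a pinned revision normally has to be threaded all the way through to the download call. This is an illustrative sketch using `huggingface_hub`, not the repo's actual `download_model.py`, and `MODEL_REVISION` is an assumed environment-variable name.

```python
import os

from huggingface_hub import snapshot_download

# Pass the requested revision through; otherwise the default "main" is used.
snapshot_download(
    repo_id=os.environ["MODEL_NAME"],
    revision=os.environ.get("MODEL_REVISION", "main"),
)
```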

Using a model that does not exist returns HTTP status 200, but the error message is only reported in the JSON body.
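
Until the status code reflects the failure, callers have to inspect the body themselves. A hedged client-side sketch, with an illustrative endpoint and payload:

```python
import requests

resp = requests.post(
    "http://localhost:8000/runsync",
    json={"input": {"prompt": "hi"}},
    timeout=120,
)
body = resp.json()
# Treat an "error" field in the JSON as a failure even if the status is 200.
if resp.status_code != 200 or "error" in body:
    raise RuntimeError(f"request failed: {body.get('error', resp.status_code)}")
```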

Any update on when this feature will be available? Thanks