Compatibility with deployment on a Pod?
Hi there, I am using this worker for deployment on RunPod serverless, but it's getting quite expensive because I have a lot of requests, so I'm looking into switching over to a dedicated RunPod pod running vLLM. However, this repo makes some nice improvements on top of vLLM that I'd like to keep using.
So, a few questions:
- Is this compatible with a Pod?
- Are there any tutorials or examples showing how to use it with a Pod?
Thank you!
Thanks for sharing the feedback. This worker currently supports serverless only. If you want to deploy on Pods, it should be a straightforward vLLM deployment; let me know if I can help anywhere in the process.
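For reference, consuming such a plain deployment looks roughly like the sketch below, assuming a stock vLLM OpenAI-compatible server is running on the Pod (e.g. started with `python -m vllm.entrypoints.openai.api_server --model <model>`). The Pod URL and model name are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<pod-id>-8000.proxy.runpod.net/v1",  # placeholder Pod proxy URL
    api_key="EMPTY",  # vLLM's server does not require a real key by default
)

stream = client.chat.completions.create(
    model="<model>",  # same model name the server was launched with
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # tokens arrive one chunk at a time, unbatched
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```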
Thanks, yes, I can do a vLLM deployment on a Pod easily enough. But then I don't get the benefits that this repo provides, i.e., token batching in streamed responses (DEFAULT_BATCH_SIZE and DEFAULT_BATCH_SIZE_GROWTH_FACTOR) and help finding the tokenizer. Is there a way we could get those benefits when using a Pod instead of Serverless?
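In case it helps, here's roughly what I mean by token batching: a hypothetical client-side sketch that re-groups a per-token stream into geometrically growing batches, the way I understand DEFAULT_BATCH_SIZE / DEFAULT_BATCH_SIZE_GROWTH_FACTOR to behave. The env var defaults and helper names below are my assumptions, not taken from the repo:

```python
import os

# Assumed defaults; the worker's actual defaults may differ.
DEFAULT_BATCH_SIZE = int(os.getenv("DEFAULT_BATCH_SIZE", "50"))
GROWTH_FACTOR = float(os.getenv("DEFAULT_BATCH_SIZE_GROWTH_FACTOR", "3"))
MIN_BATCH_SIZE = int(os.getenv("DEFAULT_MIN_BATCH_SIZE", "1"))

def batch_sizes():
    """Yield batch sizes that grow geometrically up to a cap."""
    size = float(MIN_BATCH_SIZE)
    while True:
        yield max(1, int(size))
        size = min(size * GROWTH_FACTOR, float(DEFAULT_BATCH_SIZE))

def rebatch(token_stream):
    """Group a per-token stream into growing batches of text."""
    sizes = batch_sizes()
    batch, target = [], next(sizes)
    for token in token_stream:
        batch.append(token)
        if len(batch) >= target:
            yield "".join(batch)
            batch, target = [], next(sizes)
    if batch:
        yield "".join(batch)  # flush whatever is left at end of stream

if __name__ == "__main__":
    # Toy demo with a fake token stream; in practice `token_stream`
    # would wrap the SSE chunks from vLLM's OpenAI-compatible server.
    fake_tokens = [f"tok{i} " for i in range(200)]
    for chunk in rebatch(fake_tokens):
        print(len(chunk.split()), "tokens in this batch")
```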