Compatibility with deployment on a Pod?
Hi there, I am using this worker for deployment on RunPod serverless, but it's getting quite expensive because I have a lot of requests, so I'm looking into switching over to a dedicated RunPod pod running vLLM. However, this repo makes some nice improvements on top of vLLM that I'd like to keep using.
So, a few questions:
- Is this compatible with a Pod?
- Are there any tutorials or examples showing how to use it with a Pod?
Thank you!
Thanks for sharing the feedback. This worker currently supports serverless only. If you want to deploy on Pods, it should be a straightforward vLLM deployment; let me know if I can help anywhere in the process.
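For reference, consuming such a plain deployment looks roughly like the sketch below, assuming a stock vLLM OpenAI-compatible server is running on the Pod (e.g. started with `python -m vllm.entrypoints.openai.api_server --model <model>`). The Pod URL and model name are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<pod-id>-8000.proxy.runpod.net/v1",  # placeholder Pod proxy URL
    api_key="EMPTY",  # vLLM's server does not require a real key by default
)

stream = client.chat.completions.create(
    model="<model>",  # same model name the server was launched with
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # tokens arrive one chunk at a time, unbatched
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```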
Thanks, yes, I can do a vLLM deployment on a Pod easily enough. But then I don't get the benefits that this repo provides, i.e., token batching in streamed responses (DEFAULT_BATCH_SIZE and DEFAULT_BATCH_SIZE_GROWTH_FACTOR) and help finding the tokenizer. Is there a way we could get those benefits when using a Pod instead of Serverless?
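In case it helps, here's roughly what I mean by token batching: a hypothetical client-side sketch that re-groups a per-token stream into geometrically growing batches, the way I understand DEFAULT_BATCH_SIZE / DEFAULT_BATCH_SIZE_GROWTH_FACTOR to behave. The env var defaults and helper names below are my assumptions, not taken from the repo:

```python
import os

# Assumed defaults; the worker's actual defaults may differ.
DEFAULT_BATCH_SIZE = int(os.getenv("DEFAULT_BATCH_SIZE", "50"))
GROWTH_FACTOR = float(os.getenv("DEFAULT_BATCH_SIZE_GROWTH_FACTOR", "3"))
MIN_BATCH_SIZE = int(os.getenv("DEFAULT_MIN_BATCH_SIZE", "1"))

def batch_sizes():
    """Yield batch sizes that grow geometrically up to a cap."""
    size = float(MIN_BATCH_SIZE)
    while True:
        yield max(1, int(size))
        size = min(size * GROWTH_FACTOR, float(DEFAULT_BATCH_SIZE))

def rebatch(token_stream):
    """Group a per-token stream into growing batches of text."""
    sizes = batch_sizes()
    batch, target = [], next(sizes)
    for token in token_stream:
        batch.append(token)
        if len(batch) >= target:
            yield "".join(batch)
            batch, target = [], next(sizes)
    if batch:
        yield "".join(batch)  # flush whatever is left at end of stream

if __name__ == "__main__":
    # Toy demo with a fake token stream; in practice `token_stream`
    # would wrap the SSE chunks from vLLM's OpenAI-compatible server.
    fake_tokens = [f"tok{i} " for i in range(200)]
    for chunk in rebatch(fake_tokens):
        print(len(chunk.split()), "tokens in this batch")
```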