worker-vllm
feat: align version with vllm
Gemma-2 no longer requires flashinfer
- In fact, the newest version of vLLM has a bug in its use of flashinfer, which makes the LLM return wrong tokens.
This pull request makes it possible to use the newest vLLM build with Gemma-2 models in serverless mode.
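As a rough illustration of the idea (not the actual change in this PR), the sketch below drops a forced flashinfer backend from the environment so that vLLM can pick its default attention backend. It assumes the backend was previously forced through the `VLLM_ATTENTION_BACKEND` environment variable; the helper name `drop_flashinfer_override` is hypothetical.

```python
import os


def drop_flashinfer_override(env):
    """Return a copy of `env` without a forced FLASHINFER attention backend.

    Assumption: the worker used VLLM_ATTENTION_BACKEND=FLASHINFER for Gemma-2.
    Any other backend value is left untouched.
    """
    cleaned = dict(env)
    if cleaned.get("VLLM_ATTENTION_BACKEND", "").upper() == "FLASHINFER":
        # Remove the override so vLLM falls back to its default backend choice.
        del cleaned["VLLM_ATTENTION_BACKEND"]
    return cleaned


if __name__ == "__main__":
    # Show which variables would remain after removing the override.
    remaining = drop_flashinfer_override(os.environ)
    print("VLLM_ATTENTION_BACKEND" in remaining)
```

If used, the cleaned mapping would need to be applied to the process environment before vLLM is imported, since the backend is typically chosen at engine start-up.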