worker-vllm feat: align version with vllm

Open wwydmanski opened this issue 6 months ago • 3 comments

Gemma-2 no longer requires FlashInfer. In fact, the newest version of vLLM has a bug in how it uses FlashInfer, which makes the LLM return wrong tokens.

This pull request makes it possible to use the newest vLLM build with Gemma-2 models in serverless mode.
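For anyone who previously forced FlashInfer for Gemma-2, here is a minimal sketch of undoing that override so vLLM can pick its default attention backend. It assumes the override was set through vLLM's `VLLM_ATTENTION_BACKEND` environment variable. The helper `drop_flashinfer_override` and the model-name check are hypothetical illustrations, not code from this pull request or from vLLM.

```python
import os
from typing import MutableMapping

# Environment variable that vLLM reads to override attention backend selection.
BACKEND_VAR = "VLLM_ATTENTION_BACKEND"


def drop_flashinfer_override(
    model_name: str,
    env: MutableMapping[str, str] = os.environ,
) -> bool:
    """Remove a forced FLASHINFER backend for Gemma-2 models.

    Hypothetical helper for illustration only. Returns True if an
    override was removed, False if nothing changed.
    """
    # Assumption: Gemma-2 checkpoints carry "gemma-2" in their name.
    is_gemma2 = "gemma-2" in model_name.lower()
    if is_gemma2 and env.get(BACKEND_VAR, "").upper() == "FLASHINFER":
        del env[BACKEND_VAR]
        return True
    return False
```

Call it before the engine is created, for example `drop_flashinfer_override("google/gemma-2-9b-it")`. Other models and other backend settings are left untouched.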

Aug 06 '24 12:08 wwydmanski