worker-vllm

feat: align version with vllm

Open · wwydmanski opened this issue

Gemma-2 no longer requires flashinfer. In fact, the newest version of vLLM has a bug in how it uses flashinfer, which makes the LLM return wrong tokens.

This pull request makes it possible to use the newest vLLM build with Gemma-2 models in serverless mode.
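The actual diff is not shown here. The sketch below only illustrates the idea, under assumptions. It assumes the worker used to pin vLLM's attention backend to FlashInfer for Gemma-2 through the `VLLM_ATTENTION_BACKEND` environment variable, and that the fix stops doing so. Both helper names (`legacy_backend_override`, `backend_override`) are hypothetical.

```python
import os
from typing import Mapping, MutableMapping, Optional

ENV_KEY = "VLLM_ATTENTION_BACKEND"  # environment variable vLLM reads to pick a backend


def legacy_backend_override(model_name: str, env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical old behaviour: force FlashInfer for any Gemma-2 model."""
    explicit = env.get(ENV_KEY)
    if explicit:
        return explicit  # an explicit user choice always wins
    if "gemma-2" in model_name.lower():
        return "FLASHINFER"  # assumed workaround that is no longer needed
    return None


def backend_override(env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical new behaviour: only honour an explicit user choice.

    Returning None leaves the variable unset, so vLLM chooses its own
    default backend, Gemma-2 included.
    """
    explicit = env.get(ENV_KEY)
    return explicit or None


def apply_override(env: MutableMapping[str, str], backend: Optional[str]) -> None:
    """Write the chosen backend into the environment, if there is one."""
    if backend is not None:
        env[ENV_KEY] = backend


if __name__ == "__main__":
    # Must run before the vLLM engine is created for the setting to take effect.
    apply_override(os.environ, backend_override(os.environ))
    print(os.environ.get(ENV_KEY, "<vLLM default>"))
```

In this sketch, the only change from the legacy path is that the model name no longer influences the backend. A user who still wants FlashInfer can set `VLLM_ATTENTION_BACKEND=FLASHINFER` explicitly.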

wwydmanski, Aug 06 '24 12:08