Ray Wan
I'm trying to implement control vectors in the vLLM codebase for the Mixtral model, but I'm not sure where in the layer the control vector should be added. Should it be added...
Added LoRA adapters to GPTBigCode. Resolves #3011
As written in the title.
This draft PR is a work in progress that aims to add [cascade inference](https://flashinfer.ai/2024/02/02/cascade-inference.html) to vLLM. This is supposed to speed up inference when there are multiple requests that share the same...
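The core idea behind cascade inference is that attention over a shared prefix and attention over each request's own suffix can be computed as separate passes and then combined exactly, using the log-sum-exp (LSE) of each partial softmax. Because the prefix pass is the same for every request that shares it, its KV cache can be read once for the whole batch. The snippet below is a minimal sketch of only that merge step, assuming single-head, single-query attention in pure Python; `attention_state` and `merge_states` are hypothetical names for illustration and are not vLLM or FlashInfer APIs.

```python
import math


def attention_state(q, keys, values):
    """Single-query scaled dot-product attention over one KV block.

    Returns (output, lse), where lse is the log of the softmax denominator
    for this block; it is what lets partial results be merged later.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, m + math.log(total)


def merge_states(out_a, lse_a, out_b, lse_b):
    """Combine attention results computed over two disjoint sets of keys.

    Each partial output is reweighted by its share of the full softmax
    denominator, exp(lse_a) vs exp(lse_b), computed stably.
    """
    m = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - m), math.exp(lse_b - m)
    out = [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]
    return out, m + math.log(wa + wb)


if __name__ == "__main__":
    q = [0.5, -1.0]
    # Shared prefix KV (could be reused across requests) and one request's suffix KV.
    prefix_k, prefix_v = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]
    suffix_k, suffix_v = [[1.0, 1.0]], [[5.0, 6.0]]

    full = attention_state(q, prefix_k + suffix_k, prefix_v + suffix_v)
    merged = merge_states(*attention_state(q, prefix_k, prefix_v),
                          *attention_state(q, suffix_k, suffix_v))
    print(full[0], merged[0])  # the two outputs agree up to float rounding
```

The merge is exact rather than approximate, which is why splitting the prefix out into its own pass does not change model outputs.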