Ray Wan

Results 4 issues of Ray Wan

I'm trying to implement control vectors in the vLLM codebase for the Mixtral model, but I was wondering where in the layer I should add the control vector. Should it be added...

Added LoRA adapters to GPTBigCode. Resolves #3011

This draft PR is a work in progress aiming to add [cascade inference](https://flashinfer.ai/2024/02/02/cascade-inference.html) to vLLM. This is supposed to speed up inference when there are multiple requests that share the same...