Cade Daniel

Results: 121 comments by Cade Daniel

Thanks, and sorry this slipped. I might have time tomorrow to finish the review. cc @LiuXiaoxuanPKU and @comaniac, who might have bandwidth.

Thanks everyone for the help! We hit a 45% latency reduction. Big thanks to @sroy745 @alexm-neuralmagic @comaniac @wooyeonlee0 @zifeitong @LiuXiaoxuanPKU @rkooo567 @ruisearch42 and everyone else who has helped reduce vLLM...

Yep, my policy is to review PRs in the order they're initially ready for review. Go ahead, @wooyeonlee0.

The repro script will help determine definitively what the issue is. @MahdiNazemi, could you say how you launched the job? E.g., was it from the head node directly, or submitted...

> also can you check when the GPU leaks, whether the ray worker process exited? One way to check this is to run `nvidia-smi` on the host using the 8...
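As a sketch of the check described above (the exact node setup is not shown in the comment, so treat this as an assumed workflow): list which processes still hold GPU memory, then cross-check whether the Ray worker processes are still alive. Leaked GPU memory with no corresponding live process suggests a worker exited without releasing the device.

```shell
# Assumed debugging workflow, not the exact commands from the thread.
# Step 1: show processes currently holding GPU memory on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
else
  echo "nvidia-smi not available on this host"
fi

# Step 2: check whether Ray worker / vLLM processes are still running.
ps aux | grep -E "ray::|vllm" | grep -v grep || echo "no matching processes"
```

If step 1 shows memory in use but step 2 shows no matching process, the worker likely died without freeing the GPU.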

Unfortunately vLLM speculative decoding does not yet support LoRA inference.

Can you share your use case? We'd love to see this supported but no bandwidth from me to take it on.

No work is planned on my side. I created an issue with more details if you want to work on it, @kevmo314: https://github.com/vllm-project/vllm/issues/6912