Cade Daniel

Results: 121 comments by Cade Daniel

Thanks, and sorry this slipped. I might have time tomorrow to finish the review. cc @LiuXiaoxuanPKU and @comaniac, who might have bandwidth.

Thanks everyone for the help! We hit a 45% latency reduction. Big thanks to @sroy745 @alexm-neuralmagic @comaniac @wooyeonlee0 @zifeitong @LiuXiaoxuanPKU @rkooo567 @ruisearch42 and everyone else who has helped reduce vLLM...

Yep, my policy is to review PRs in the order they're initially ready for review. Go ahead, @wooyeonlee0.

The repro script will help determine definitively what the issue is. @MahdiNazemi, could you say how you launched the job? E.g., was it from the head node directly, or submitted...

> also can you check when the GPU leaks, whether the ray worker process exited? One way to check this is to run `nvidia-smi` on the host using the 8...
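As a sketch of the check described above (the exact node setup is not shown in the comment, so treat this as an assumed workflow): list which processes still hold GPU memory, then cross-check whether the Ray worker processes are still alive. Leaked GPU memory with no corresponding live process suggests a worker exited without releasing the device.

```shell
# Assumed debugging workflow, not the exact commands from the thread.
# Step 1: show processes currently holding GPU memory on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
else
  echo "nvidia-smi not available on this host"
fi

# Step 2: check whether Ray worker / vLLM processes are still running.
ps aux | grep -E "ray::|vllm" | grep -v grep || echo "no matching processes"
```

If step 1 shows memory in use but step 2 shows no matching process, the worker likely died without freeing the GPU.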

Unfortunately vLLM speculative decoding does not yet support LoRA inference.

Can you share your use case? We'd love to see this supported but no bandwidth from me to take it on.

No work is planned on my side. I created an issue with more details if you want to work on it, @kevmo314: https://github.com/vllm-project/vllm/issues/6912