Lu Fang

Shall we merge https://github.com/vllm-project/vllm/pull/12393 first? cc: @youkaichao

https://github.com/vllm-project/vllm/pull/16072 will add full CUDA graph support.
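
For context, here is a minimal sketch of what full CUDA graph capture and replay look like with PyTorch's `torch.cuda.CUDAGraph` API. The linear layer and shapes are placeholders for illustration, not vLLM's actual capture path:

```python
import torch

# Placeholder model; vLLM captures its own model forward pass.
model = torch.nn.Linear(1024, 1024).cuda()
static_input = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream before capture, as the CUDA graphs docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture the whole forward pass into one graph, then replay it with fresh
# data copied into the static input buffer; replay avoids per-kernel
# launch overhead on the CPU side.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

static_input.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()
print(static_output.shape)
```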

I feel this is more of an Inductor issue. Shall we just turn off Inductor by default when we detect a non-GPU device?
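
A minimal sketch of the check I have in mind; `should_use_inductor` is a hypothetical helper, and `"eager"` here is just PyTorch's debug backend, not necessarily what vLLM would fall back to:

```python
import torch

def should_use_inductor() -> bool:
    # Hypothetical helper: Inductor's codegen is tuned for GPU targets,
    # so skip it when no CUDA device is visible.
    return torch.cuda.is_available()

# Pick the torch.compile backend based on the detected device.
backend = "inductor" if should_use_inductor() else "eager"
compiled = torch.compile(torch.nn.Linear(16, 16), backend=backend)
```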

Btw, have we compared the perf with other attention backends, like FA3?
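
One rough way to A/B the backends is to time the same workload with `VLLM_ATTENTION_BACKEND` set differently per run; the model, prompts, and backend value below are placeholders, not a benchmark we ran:

```python
import os
import time

# The attention backend must be selected before vLLM is imported; accepted
# values depend on the build and hardware (assumption: FLASH_ATTN here).
os.environ.setdefault("VLLM_ATTENTION_BACKEND", "FLASH_ATTN")

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(max_tokens=128)

start = time.perf_counter()
llm.generate(["Hello, my name is"] * 32, params)
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```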

Speculative decoding seems broken on trunk. Rerunning the LoRA tests now.