Lu Fang
Shall we merge https://github.com/vllm-project/vllm/pull/12393 first? cc: @youkaichao
https://github.com/vllm-project/vllm/pull/16072 will support full CUDA graph.
I feel this is more of an Inductor issue; should we just turn off Inductor by default when we detect a non-GPU device?
cc: @zou3519 thoughts?
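Not vLLM's actual config path, just a minimal sketch of the kind of guard I mean, assuming we gate on `torch.cuda.is_available()` and fall back to eager compilation (helper names are placeholders):

```python
# Hypothetical sketch: skip Inductor when no GPU is present.
# The helper names and the "eager" fallback choice are assumptions,
# not vLLM's real compilation config.
import torch


def pick_compile_backend() -> str:
    # Inductor codegen is tuned for GPUs; on CPU-only or other non-GPU
    # hosts, fall back to eager execution instead.
    if torch.cuda.is_available():
        return "inductor"
    return "eager"


def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    return torch.compile(model, backend=pick_compile_backend())
```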
Can we add a test plan?
Rebase to trigger the tests?
Btw, have we compared the perf with other attention backends, like FA3?
cc: @chenyang78 @drisspg
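Not a claim about this PR's numbers; just a minimal sketch of how I'd probe it, assuming the VLLM_ATTENTION_BACKEND env var is honored by the build under test and using a placeholder model and request count:

```python
# Hypothetical per-backend latency probe; run once per backend in a fresh
# process, e.g.  VLLM_ATTENTION_BACKEND=FLASH_ATTN python bench_attn.py
# Model name, request count, and token budget are placeholders.
import time

from vllm import LLM, SamplingParams


def main() -> None:
    llm = LLM(model="facebook/opt-125m")  # small placeholder model
    params = SamplingParams(max_tokens=128, temperature=0.0)
    prompts = ["Hello, my name is"] * 32

    start = time.perf_counter()
    llm.generate(prompts, params)
    elapsed = time.perf_counter() - start
    print(f"32 requests x 128 tokens: {elapsed:.2f}s")


if __name__ == "__main__":
    main()
```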
Speculative decoding seems broken on trunk. Rerunning the LoRA tests now.