Luka Govedič
This has been fully implemented; read more in this [blog post](https://blog.vllm.ai/2025/08/20/torch-compile.html)!
Yeah, that's true. I was considering the use case where someone is experimenting with attention out of source for existing models, but you're right that new model definitions are the...
Yes, we're planning to overhaul cudagraph capture, dispatching, and replay in #20059. See my latest comment for the design and Lucas's comment for spec-decoding support. But yeah, the infra in that...
> Will max_query_len > 1 full cudagraph capture support on another PR?

Yes, @fhl2000 is working on it.

> Or mla attention(TritonMLA or FlashMLA) will use piecewise cudagraph or no...
@gmagogsfm could you elaborate on what the repro for this issue is?
@zou3519 want to retry? CI seems more stable rn
> this is what I did for experiments. do you have any ideas on how to expose the control to users?

What about one environment variable that serves as a...
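For illustration only, a minimal sketch of what a single env-var toggle could look like; the variable name `VLLM_EXPERIMENT_CONTROL` and its semantics are hypothetical, not an existing vLLM setting:

```python
import os

# Hypothetical toggle: the name VLLM_EXPERIMENT_CONTROL and its accepted
# values are illustrative only, not an actual vLLM environment variable.
_RAW = os.environ.get("VLLM_EXPERIMENT_CONTROL", "off")

def experiment_enabled() -> bool:
    # Accept a few common truthy spellings; anything else means
    # "keep the default behavior".
    return _RAW.strip().lower() in {"1", "true", "on"}

if __name__ == "__main__":
    print("experimental path enabled:", experiment_enabled())
```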
What is still missing here? @WoosukKwon @mgoin @gshtras
@gshtras can you re-merge main? I think that should resolve the CI issue
@WoosukKwon @robertgshaw2-redhat could we automerge?