Luka Govedič
This has been fully implemented; read more in this [blog post](https://blog.vllm.ai/2025/08/20/torch-compile.html)!
Yeah, that's true. I was considering the use case where someone is experimenting with attention out of source for existing models, but you're right that new model definitions are the...
Yes, we're planning to overhaul cudagraph capture, dispatching, and replay in #20059. See my latest comment for the design and Lucas's comment for spec-decoding support. But yeah, the infra in that...
> Will max_query_len > 1 full cudagraph capture support on another PR?

Yes, @fhl2000 is working on it.

> Or mla attention(TritonMLA or FlashMLA) will use piecewise cudagraph or no...
@gmagogsfm could you elaborate on what the repro for this issue is?
@zou3519 want to retry? CI seems more stable rn
> this is what I did for experiments. do you have any ideas on how to expose the control to users?

What about one environment variable that serves as a...
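For illustration only, a minimal sketch of what a single env-var toggle could look like; the variable name `VLLM_EXPERIMENT_CONTROL` and its semantics are hypothetical, not an existing vLLM setting:

```python
import os

# Hypothetical toggle: the name VLLM_EXPERIMENT_CONTROL and its accepted
# values are illustrative only, not an actual vLLM environment variable.
_RAW = os.environ.get("VLLM_EXPERIMENT_CONTROL", "off")

def experiment_enabled() -> bool:
    # Accept a few common truthy spellings; anything else means
    # "keep the default behavior".
    return _RAW.strip().lower() in {"1", "true", "on"}

if __name__ == "__main__":
    print("experimental path enabled:", experiment_enabled())
```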
What is still missing here? @WoosukKwon @mgoin @gshtras
@gshtras can you re-merge main? I think that should resolve the CI issue
@WoosukKwon @robertgshaw2-redhat could we automerge?