Lu Fang
Shall we merge https://github.com/vllm-project/vllm/pull/12393 first? cc: @youkaichao
https://github.com/vllm-project/vllm/pull/16072 will support full CUDA graph.
I feel this is more of an Inductor issue; should we just turn off Inductor by default when we detect a non-GPU device?
cc: @zou3519 thoughts?
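Not vLLM's actual config path, just a minimal sketch of the kind of guard I mean, assuming we gate on `torch.cuda.is_available()` and fall back to eager compilation (helper names are placeholders):

```python
# Hypothetical sketch: skip Inductor when no GPU is present.
# The helper names and the "eager" fallback choice are assumptions,
# not vLLM's real compilation config.
import torch


def pick_compile_backend() -> str:
    # Inductor codegen is tuned for GPUs; on CPU-only or other non-GPU
    # hosts, fall back to eager execution instead.
    if torch.cuda.is_available():
        return "inductor"
    return "eager"


def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    return torch.compile(model, backend=pick_compile_backend())
```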
Can we add a test plan?
Rebase to trigger the tests?
Btw, have we compared the perf with other attention backends, like FA3?
cc: @chenyang78 @drisspg
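Not a claim about this PR's numbers; just a minimal sketch of how I'd probe it, assuming the VLLM_ATTENTION_BACKEND env var is honored by the build under test and using a placeholder model and request count:

```python
# Hypothetical per-backend latency probe; run once per backend in a fresh
# process, e.g.  VLLM_ATTENTION_BACKEND=FLASH_ATTN python bench_attn.py
# Model name, request count, and token budget are placeholders.
import time

from vllm import LLM, SamplingParams


def main() -> None:
    llm = LLM(model="facebook/opt-125m")  # small placeholder model
    params = SamplingParams(max_tokens=128, temperature=0.0)
    prompts = ["Hello, my name is"] * 32

    start = time.perf_counter()
    llm.generate(prompts, params)
    elapsed = time.perf_counter() - start
    print(f"32 requests x 128 tokens: {elapsed:.2f}s")


if __name__ == "__main__":
    main()
```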
Speculative decoding seems broken on trunk. Rerunning the LoRA tests now.