Cody Yu
> @comaniac, would you mind finding someone to review this PR? This is an important model. If the reviewer feels that reviewing the kernel takes too long, maybe request more numerical tests and...
I understand the approach but am a bit confused about the code changes. Specifically, how is "disabling multi-step when n>1" handled in this PR? It seems like this PR only raises...
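For context, a minimal sketch of what such a guard might look like. The attribute names follow vLLM conventions (`num_scheduler_steps`, `SamplingParams.n`), but this exact check is illustrative and is not necessarily the code in this PR:

```python
# Hypothetical sketch: reject (or fall back from) multi-step scheduling
# when a request samples more than one sequence (n > 1).
def validate_multi_step(scheduler_config, sampling_params) -> None:
    # num_scheduler_steps > 1 means multi-step scheduling is enabled.
    if scheduler_config.num_scheduler_steps > 1 and sampling_params.n > 1:
        # Raising is one option; an alternative would be to silently
        # fall back to single-step scheduling for this request.
        raise ValueError(
            "Multi-step scheduling does not support sampling with n > 1; "
            "disable multi-step or set n=1."
        )
```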
Just to confirm: is this ready for review?
sg. Please let me know when it's ready for review and I'll prioritize it.
Sounds good to me. It would be even better if we could allow `flashinfer` to install `flashinfer_aot`; otherwise most users would probably suffer from long compile times (and require...
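As a rough illustration of the idea, assuming a standard setuptools layout (the package mapping below is a sketch, not vLLM's actual packaging):

```python
# Hypothetical setup.py sketch: map the optional "flashinfer" extra to the
# ahead-of-time compiled package, so users get prebuilt kernels instead of
# a long JIT compile at install time.
from setuptools import setup

setup(
    name="example-project",
    extras_require={
        # `pip install example-project[flashinfer]` would then pull the
        # AOT build rather than a source build that compiles kernels.
        "flashinfer": ["flashinfer_aot"],  # illustrative requirement
    },
)
```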
Thanks for pointing this out. I just changed all the required places. Meanwhile, yeah, we do need xFormers and vllm-flash-attn...
Pending an xformers release built against torch 2.3.1. Tracking issue: https://github.com/facebookresearch/xformers/issues/1052 Closing #4509
CI passed, but we need to manually double-check whether FlashInfer supports torch 2.3.1.
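One quick way to do such a manual check (a minimal sketch; it only verifies that the import succeeds against the installed torch build, while a thorough check would also exercise an actual kernel):

```python
# Minimal manual compatibility check: verify FlashInfer imports cleanly
# against the installed PyTorch. An ABI mismatch typically surfaces as an
# ImportError here.
import torch
import flashinfer

print(f"torch version: {torch.__version__}")   # expect 2.3.1
print(f"flashinfer imported OK: {flashinfer.__name__}")
```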
- Checked that FlashInfer works with PyTorch 2.3.1.
- Performance benchmark (https://buildkite.com/vllm/performance-benchmark/builds/4569#) shows no regression compared to https://simon-mo-workspace.observablehq.cloud/vllm-dashboard-v0/perf

This PR should be good to go. cc @Yard1 @robertgshaw2-neuralmagic @WoosukKwon @simon-mo...
cc @ruisearch42 @richardliaw