Micah Williamson
Thanks for looking into this. This does appear to improve perf, but does not give us the full throughput from before https://github.com/vllm-project/vllm/issues/26320:

```
VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 llm bench throughput --model /models/Llama-4-Maverick-17B-128E-Instruct-FP8/ -tp...
```
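For context, a minimal sketch of how such a benchmark run might look, assuming the intended command is vLLM's `vllm bench throughput` and picking an illustrative tensor-parallel size; the flags truncated after `-tp` in the original report are unknown:

```
# Sketch only: `vllm bench throughput` and `-tp 8` are assumptions for
# illustration, not the reporter's actual command or settings.
VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 \
  vllm bench throughput \
    --model /models/Llama-4-Maverick-17B-128E-Instruct-FP8/ \
    -tp 8
```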
Hi @Ubospica, I see all of the checks have passed. Could this get merged now? Thanks!