Micah Williamson
Thanks for looking into this. This does appear to improve perf, but does not give us the full throughput from before https://github.com/vllm-project/vllm/issues/26320:

```
VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 llm bench throughput --model /models/Llama-4-Maverick-17B-128E-Instruct-FP8/ -tp...
```
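For context, a minimal sketch of how such a benchmark run might look, assuming the intended command is vLLM's `vllm bench throughput` and picking an illustrative tensor-parallel size; the flags truncated after `-tp` in the original report are unknown:

```
# Sketch only: `vllm bench throughput` and `-tp 8` are assumptions for
# illustration, not the reporter's actual command or settings.
VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 \
  vllm bench throughput \
    --model /models/Llama-4-Maverick-17B-128E-Instruct-FP8/ \
    -tp 8
```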
Hi @Ubospica, I see all of the checks have passed. Could this get merged now? Thanks!