lkchen comments

Results: 12 comments by lkchen

[ V0 ][ sample ] improve sample performance when using guide decoding

https://github.com/vllm-project/vllm/pull/17084 removed the sampler from the model, so this PR needs a rebase. Let me see if I can help.

[ V0 ][ sample ] improve sample performance when using guide decoding

Hi @cjsdurj, may I ask how to reproduce the before throughput of 2 tk/s and the after throughput of 136 tk/s? I'm using https://github.com/lk-chen/vllm/pull/2 on L40S, forcing **vLLM v0, model=Qwen/Qwen2.5-1.5B-Instruct, async...