vllm
[V1] Optimize rejection sampler
🚀 The feature, motivation and pitch
The current V1 rejection sampler is not well optimized and incurs unnecessary overhead. In my benchmarks, it accounts for 10-25% of the overall running time. We should profile and optimize it.
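As a starting point for the profiling, here is a minimal sketch of how a per-stage share of running time could be measured. It is not vLLM code: `StageTimer`, `fake_model_forward`, and `fake_rejection_sampler` are hypothetical names invented for illustration, and the workloads are CPU-only stand-ins. For real GPU code, the device would need to be synchronized before each clock read, since kernels launch asynchronously.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # Time the body of the `with` block and add it to the stage total.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def share(self, name):
        # Fraction of all recorded time that was spent in `name`.
        total = sum(self.totals.values())
        return self.totals[name] / total if total > 0 else 0.0


def fake_model_forward(n):
    # Stand-in for the model forward pass (assumption, not the real workload).
    return sum(i * i for i in range(n))


def fake_rejection_sampler(n):
    # Stand-in for the rejection sampler step (assumption, not the real workload).
    return sum(i % 7 for i in range(n))


timer = StageTimer()
for _ in range(20):
    with timer.stage("model_forward"):
        fake_model_forward(20_000)
    with timer.stage("rejection_sampler"):
        fake_rejection_sampler(5_000)

sampler_share = timer.share("rejection_sampler")
print(f"rejection_sampler share: {sampler_share:.1%}")
```

Wrapping the real sampler call in a stage like this would give a number comparable to the 10-25% figure above, and the same number could be tracked before and after each optimization.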
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.