vidur
A large-scale simulation framework for LLM inference
Does Vidur currently support speculative decoding?
Hi, thanks for open-sourcing this project! I have a couple of questions: 1) Regarding CPU overheads (e.g., scheduling, tokenization, etc.) - while they're mentioned in the documentation, from reading the...
We're attempting to reproduce the simulation results, and when comparing against **vLLM 0.9.1** benchmarks we observed that the P50 latency differs by **700%**. May I ask if vLLM v1 is supported...
Hi Vidur Team, we are researching auto-scaling solutions for large models and have found your simulator to be highly valuable for our work! However, the simulator currently only supports static...