vidur
A large-scale simulation framework for LLM inference
Does Vidur currently support speculative decoding?
Hi, thanks for open-sourcing this project! I have a couple of questions: 1) Regarding CPU overheads (e.g., scheduling, tokenization, etc.) - while they're mentioned in the documentation, from reading the...
We're attempting to reproduce the simulation results, and when comparing against **vLLM 0.9.1** benchmarks we observed that the P50 latency differs by **700%**. May I ask if vLLM v1 is supported...
Hi Vidur Team, we are researching auto-scaling solutions for large models and have found your simulator to be highly valuable for our work! However, the simulator currently only supports static...