Questions on CPU Overheads & KV Cache Aware Routing

Open ItamarGefen opened this issue 5 months ago • 0 comments

Hi, thanks for open-sourcing this project!

I have a couple of questions:

1. Regarding CPU overheads (e.g., scheduling, tokenization): the documentation mentions them, but from reading the code, it seems that a request can enter a batch immediately upon arrival. Does the simulator currently model these CPU-related delays? Also, the link in the docs appears to be broken.
2. Are there any plans to continue developing new features in the simulator, such as KV cache-aware routing (like NVIDIA Dynamo's KV cache routing)?

Thanks!

Jul 24 '25 13:07 ItamarGefen