Support Dynamic Replica Adjustment at Runtime Without Service Interruption

Open · mayuqing111 opened this issue 6 months ago · 1 comment

Hi Vidur Team,

We are researching auto-scaling solutions for large models and have found your simulator to be highly valuable for our work!

However, the simulator currently only supports static configuration upon startup. We are interested in knowing: Is it feasible to modify it to support dynamic replica count adjustment during runtime? Does your team have plans to implement such a feature? Alternatively, could you provide some guidance? This would allow us to observe the impact of different replica counts on various metrics without interrupting the service.

mayuqing111 · Jun 20 '25 07:06

Hi @mayuqing111, it is wonderful to know that Vidur is providing great value in your work.

It is indeed feasible to support dynamic replica count adjustment. We don't have plans to implement this as of writing, but yesterday we released a big PR, #56, and one of its features is that metrics from different replicas are captured separately.

To implement this feature, some rough thoughts: add a ReplicaStartEvent and a ReplicaEndEvent. At replica start, create a new replica object and add it to the list of replicas; the GlobalScheduler should consider this replica from then on. At replica end, you'll want to flush all requests from the replica, so either implement support for aborted requests or (harder) migrate requests back to the global scheduler. You'll also need to decide where to place the logic that generates these events, which factors it will consider, etc.

I highly recommend working on the latest master after #56 for this feature.

nitinkedia7 · Jun 26 '25 12:06
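
For readers following the thread, here is a minimal, self-contained sketch of the start/end event idea described in the reply above. It is a toy discrete-event loop, not Vidur's code: only the names ReplicaStartEvent, ReplicaEndEvent and GlobalScheduler come from the comment, while `Replica`, `RequestArrivalEvent`, `add_replica`, `remove_replica`, `submit`, `dispatch` and `run` are hypothetical and do not reflect Vidur's actual classes or signatures. It takes the "migrate requests back to the global scheduler" route at replica end, and requests never complete in this toy, so queues only grow.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical stand-in for a replica: just an id and a request queue.
    replica_id: int
    queue: list = field(default_factory=list)


class GlobalScheduler:
    """Toy scheduler: sends each request to the least-loaded live replica."""

    def __init__(self):
        self.replicas = {}  # replica_id -> Replica
        self.pending = []   # requests not yet assigned to any replica

    def add_replica(self, replica):
        self.replicas[replica.replica_id] = replica
        self.dispatch()

    def remove_replica(self, replica_id):
        # Flush the departing replica by migrating its requests back here.
        replica = self.replicas.pop(replica_id)
        self.pending.extend(replica.queue)
        replica.queue.clear()
        self.dispatch()

    def submit(self, request_id):
        self.pending.append(request_id)
        self.dispatch()

    def dispatch(self):
        # Assign pending requests one by one while any replica is alive.
        while self.pending and self.replicas:
            target = min(self.replicas.values(), key=lambda r: len(r.queue))
            target.queue.append(self.pending.pop(0))


@dataclass
class ReplicaStartEvent:
    time: float
    replica_id: int

    def handle(self, scheduler):
        scheduler.add_replica(Replica(self.replica_id))


@dataclass
class ReplicaEndEvent:
    time: float
    replica_id: int

    def handle(self, scheduler):
        scheduler.remove_replica(self.replica_id)


@dataclass
class RequestArrivalEvent:
    time: float
    request_id: int

    def handle(self, scheduler):
        scheduler.submit(self.request_id)


def run(events):
    """Process events in time order and return the final scheduler state."""
    scheduler = GlobalScheduler()
    seq = itertools.count()  # tie-breaker so events themselves are never compared
    heap = [(e.time, next(seq), e) for e in events]
    heapq.heapify(heap)
    while heap:
        _, _, event = heapq.heappop(heap)
        event.handle(scheduler)
    return scheduler


# Example timeline: scale up at t=5, scale down at t=8.
events = [ReplicaStartEvent(0, 0)]
events += [RequestArrivalEvent(t, t) for t in (1, 2, 3, 4)]
events += [ReplicaStartEvent(5, 1), RequestArrivalEvent(6, 5), RequestArrivalEvent(7, 6)]
events += [ReplicaEndEvent(8, 0)]
scheduler = run(events)
```

In this sketch the timeline of start and end events is supplied up front. Where those events should be generated, and from which signals (queue depth, latency, a fixed schedule), is the open design decision the reply points out.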