[Serve] Allow customizing the Ray Serve autoscaler's scale-down logic

Open • manhld0206 opened this issue 1 month ago • 3 comments

Description

Currently, the Ray Serve autoscaler's scale-down logic is fixed. Allow users to customize it based on the worker node, deployment, and replica, so that it can fit different kinds of business logic.

Use case

In our use case, we want to use both on-demand and spot instance nodes, while always keeping a minimum number of replicas on on-demand workers. With customizable scale-down logic, we could guarantee this by scaling down replicas on spot instance nodes first.
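The requested behavior boils down to a victim-selection step that runs after the autoscaler has decided how many replicas to remove. Below is a minimal sketch, assuming a hypothetical hook that receives the running replicas along with their node type and returns the replicas to stop. `ReplicaInfo`, `select_replicas_to_stop`, and the `"spot"`/`"on-demand"` labels are illustrative names only, not part of Ray Serve's API.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record describing one running replica; not a Ray Serve type.
@dataclass(frozen=True)
class ReplicaInfo:
    replica_id: str
    node_id: str
    node_type: str  # assumed to be either "spot" or "on-demand"


def select_replicas_to_stop(
    replicas: List[ReplicaInfo], num_to_stop: int, min_on_demand: int
) -> List[ReplicaInfo]:
    """Pick replicas to stop: spot replicas first, then on-demand ones,
    never leaving fewer than `min_on_demand` replicas on on-demand nodes.
    May return fewer than `num_to_stop` replicas if the floor would be violated."""
    if num_to_stop <= 0:
        return []
    spot = [r for r in replicas if r.node_type == "spot"]
    on_demand = [r for r in replicas if r.node_type == "on-demand"]

    # Spot replicas are always the first candidates for removal.
    victims = spot[:num_to_stop]
    remaining = num_to_stop - len(victims)

    # Only remove on-demand replicas that sit above the guaranteed floor.
    removable_on_demand = max(0, len(on_demand) - min_on_demand)
    victims += on_demand[: min(remaining, removable_on_demand)]
    return victims


replicas = [
    ReplicaInfo("r1", "node-a", "on-demand"),
    ReplicaInfo("r2", "node-a", "on-demand"),
    ReplicaInfo("r3", "node-b", "spot"),
    ReplicaInfo("r4", "node-c", "spot"),
]
stopped = select_replicas_to_stop(replicas, num_to_stop=3, min_on_demand=1)
print([r.replica_id for r in stopped])  # ['r3', 'r4', 'r1']
```

The point of the sketch is that a count-only policy cannot express this: the ordering (spot before on-demand) and the floor both depend on per-replica placement information.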

manhld0206 • Nov 25 '25

Have you taken a stab at https://docs.ray.io/en/latest/serve/advanced-guides/advanced-autoscaling.html#custom-autoscaling-policies yet?

ok-scale • Nov 25 '25

@ok-scale Yes, but it only supports returning the desired number of replicas, not which replicas to scale down or terminate.

manhld0206 • Nov 26 '25

There is currently an effort to support label selectors in Serve: https://github.com/ray-project/ray/pull/57694. Once that lands, this feature could perhaps be achieved as follows:

  1. On your deployment, set label_selector = on-demand and fallback = spot.
  2. In your Kubernetes cluster, set a maximum limit on the number of on-demand instances.
  3. This means that beyond your acceptable threshold, new replicas will be scheduled on spot instances.
  4. Downscaling would also evict replicas from spot instances first.
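The scheme above can be simulated to check that it yields the desired behavior. The sketch below is a toy model, not Ray's scheduler and not the API from the linked PR: it assumes the on-demand pool has a hard cap (the Kubernetes limit from step 2), places each new replica on on-demand capacity first and falls back to spot, and removes spot replicas first when scaling down. `ToyCluster`, `on_demand_cap`, and `scale_to` are hypothetical names.

```python
from typing import List


class ToyCluster:
    """Toy model of replica placement with a preferred label and a fallback.

    Hypothetical: this does not call Ray; it only models the behavior
    described in the steps above.
    """

    def __init__(self, on_demand_cap: int) -> None:
        self.on_demand_cap = on_demand_cap  # step 2: max on-demand capacity
        self.on_demand: List[str] = []
        self.spot: List[str] = []
        self._next_id = 0

    def scale_to(self, target: int) -> None:
        target = max(0, target)
        # Scale up: prefer on-demand (label_selector), fall back to spot.
        while len(self.on_demand) + len(self.spot) < target:
            replica = f"replica-{self._next_id}"
            self._next_id += 1
            if len(self.on_demand) < self.on_demand_cap:
                self.on_demand.append(replica)
            else:
                self.spot.append(replica)  # step 3: beyond the cap, use spot
        # Scale down: evict spot replicas first (step 4).
        while len(self.on_demand) + len(self.spot) > target:
            if self.spot:
                self.spot.pop()
            else:
                self.on_demand.pop()


cluster = ToyCluster(on_demand_cap=2)
cluster.scale_to(5)
print(len(cluster.on_demand), len(cluster.spot))  # 2 3
cluster.scale_to(2)
print(len(cluster.on_demand), len(cluster.spot))  # 2 0
```

In this model the on-demand pool always fills before any spot replica is created, so keeping the deployment's minimum replica count at or above the on-demand cap keeps that many replicas on on-demand nodes. The model ignores node failures and spot preemption, which a real setup would also need to handle.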

It's not a clean API, though. What do you think?

abrarsheikh • Dec 10 '25

@abrarsheikh That's perfect!

manhld0206 • Dec 11 '25