kuberay

[Feature] Terminate idle cluster

Open kevin85421 opened this issue 10 months ago • 4 comments

Search before asking

  • [x] I had searched in the issues and found no similar feature requirement.

Description

Terminate a cluster when it is idle, i.e. when it doesn't have any running Ray jobs. Two possible implementations:

  • The Ray dashboard provides an endpoint that KubeRay can use to check whether the cluster is idle. If the cluster has been idle for longer than a threshold, KubeRay terminates it.
    • Pros: the feature doesn't rely on the Autoscaler.
    • Cons: KubeRay needs to communicate with the Ray dashboard of each RayCluster.
  • The Ray Autoscaler checks the idle status.
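
To make the first option more concrete, here is a minimal sketch of the threshold logic, not a proposed implementation. It assumes the job records come from the Ray dashboard's Jobs REST API (something like `GET /api/jobs/`), that each record carries a `status` field, and that `PENDING` and `RUNNING` are the statuses that count as "not idle". The dedicated idle-check endpoint described above does not exist yet, and `is_idle` and `IdleTracker` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional

# Assumption: jobs in these states keep the cluster busy.
ACTIVE_STATUSES = frozenset({"PENDING", "RUNNING"})


def is_idle(jobs: Iterable[Mapping[str, object]]) -> bool:
    """Return True when no job record is pending or running."""
    return not any(job.get("status") in ACTIVE_STATUSES for job in jobs)


@dataclass
class IdleTracker:
    """Hypothetical helper: tracks how long a cluster has been idle."""

    threshold_seconds: float
    idle_since: Optional[float] = None  # timestamp of the first idle poll

    def observe(self, jobs: Iterable[Mapping[str, object]], now: float) -> bool:
        """Record one poll; return True once idle time exceeds the threshold."""
        if not is_idle(jobs):
            # Any active job resets the idle timer.
            self.idle_since = None
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since > self.threshold_seconds
```

A controller would call `observe` on each reconcile with the latest job list and the current time, and delete the RayCluster once it returns `True`. Note that this sketch only sees jobs known to the dashboard; work started by other means would need a different idleness signal.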

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

kevin85421 avatar Feb 10 '25 21:02 kevin85421

As I understand it, if bringing up instances/nodes is the responsibility of the Autoscaler, then bringing them down should also be owned by the Autoscaler, right? Also, looking at the architecture, the Ray stopper commands are executed by the Autoscaler itself.

It would be great to have documentation on which component is responsible for which feature, and probably to avoid intermingling those responsibilities.

bhks avatar Apr 11 '25 18:04 bhks

Hi, is this feature planned? It would be very nice. Dask-kubernetes has an idleTimeout setting: https://github.com/dask/dask-kubernetes/pull/672

vladidobro avatar Aug 04 '25 20:08 vladidobro

@kevin85421 Could you assign this issue to me? Thanks!

owenowenisme avatar Sep 03 '25 03:09 owenowenisme

Very excited for this to finally be implemented!

danielgafni avatar Oct 10 '25 08:10 danielgafni