[Feature] Terminate idle cluster
Search before asking
- [x] I have searched the issues and found no similar feature request.
Description
Terminate an idle cluster, i.e. a cluster that doesn't have any running Ray jobs. Two possible implementations:
- Ray dashboard provides an endpoint that KubeRay can call to check whether the cluster is idle. If the cluster has been idle for longer than a threshold, KubeRay terminates it.
  - Pros: the feature doesn't rely on the Autoscaler.
  - Cons: KubeRay needs to communicate with the Ray dashboard for each RayCluster.
- Ray Autoscaler checks the idle status.
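Whichever component ends up owning the check, the threshold logic could look roughly like the sketch below. This is only an illustration: `IdleTracker`, `observe`, and `idle_threshold_s` are hypothetical names, not existing KubeRay or Ray APIs, and the running-job count is assumed to come from whatever idle signal is eventually exposed (dashboard endpoint or Autoscaler).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleTracker:
    """Tracks how long a cluster has had no running jobs (hypothetical helper)."""

    idle_threshold_s: float
    idle_since: Optional[float] = None  # timestamp at which the cluster became idle

    def observe(self, running_jobs: int, now: float) -> bool:
        """Record one poll result; return True if the cluster should be terminated."""
        if running_jobs > 0:
            # Any running job resets the idle timer.
            self.idle_since = None
            return False
        if self.idle_since is None:
            # First poll that sees the cluster idle: start the timer.
            self.idle_since = now
        return (now - self.idle_since) >= self.idle_threshold_s
```

The caller would poll periodically, pass in the current number of running jobs and a timestamp, and delete the cluster once `observe` returns `True`. Resetting the timer on any running job keeps a cluster that is used intermittently from being terminated between jobs.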
Use case
No response
Related issues
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
As I understand it, if bringing up instances/nodes is the responsibility of the Autoscaler, then bringing them down should also be owned by the Autoscaler, right? Also, when we look at the architecture, the Ray stopper commands are executed by the Autoscaler itself.
It would be great to have documentation on which component is responsible for which feature, and to avoid intermingling those responsibilities.
Hi, is this feature planned? It would be very nice to have. Dask-kubernetes has an idleTimeout setting: https://github.com/dask/dask-kubernetes/pull/672
@kevin85421 Could you assign this issue to me? Thanks!
Very excited for this to finally be implemented!