kuberay

[Feature] Add/remove instances from an active Ray Cluster

Open HuangLED opened this issue 3 years ago • 2 comments

Search before asking

  • [X] I have searched the issues and found no similar feature request.

Description

With a Ray cluster up and running, it would be nice if we could add more instances to the cluster (or remove inactive instances).

An enhancement to this feature would be to provide a min/max in the spec, so that the Ray cluster automatically allocates/deallocates instances based on the active workload.

@Jeffwan

Use case

From time to time, we may run into a situation where we didn't allocate enough instances for a Ray cluster and would therefore like to add more instances without starting everything over.

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

HuangLED, Oct 16 '21 00:10

Technically, this is already supported: you can add or remove a worker group by modifying the RayCluster custom resource. However, removing workers manually is somewhat dangerous, because the operator is not aware of the actors running on the nodes nominated for deletion.

This feature is reasonable, and I think people use it to add a GPU machine group, groups with different labels, and so on. Let's check whether we have enough documentation for this feature.
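As a rough illustration of adding a worker group by editing the custom resource, here is a minimal sketch that works on an in-memory manifest only. It assumes the `workerGroupSpecs`, `groupName`, `replicas`, `rayStartParams`, and `template` fields of the RayCluster spec; the helper name `add_worker_group`, the group names, the label, and the container image are placeholders chosen for the example, and applying the edited manifest to a live cluster is left out.

```python
import copy


def add_worker_group(cluster, group_name, replicas, labels=None):
    """Return a copy of a RayCluster manifest (a dict) with one more worker group.

    Hypothetical helper: it edits the manifest in memory and does not talk
    to Kubernetes. Field names are assumed from the RayCluster spec.
    """
    updated = copy.deepcopy(cluster)  # leave the caller's manifest untouched
    groups = updated["spec"].setdefault("workerGroupSpecs", [])
    if any(g["groupName"] == group_name for g in groups):
        raise ValueError(f"worker group {group_name!r} already exists")
    groups.append({
        "groupName": group_name,
        "replicas": replicas,
        "rayStartParams": {},
        "template": {
            # Placeholder pod template; a real one needs resources, etc.
            "metadata": {"labels": dict(labels or {})},
            "spec": {
                "containers": [
                    {"name": "ray-worker", "image": "rayproject/ray:latest"}
                ]
            },
        },
    })
    return updated


# Example manifest with a single CPU group (names are made up).
cluster = {
    "spec": {
        "workerGroupSpecs": [{"groupName": "cpu-group", "replicas": 2}]
    }
}
with_gpu = add_worker_group(cluster, "gpu-group", 1, {"accelerator": "gpu"})
print([g["groupName"] for g in with_gpu["spec"]["workerGroupSpecs"]])
```

The edited manifest would then be applied to the cluster in whatever way you normally update the custom resource.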

/cc @chenk008 @akanso I think you have similar usage?

Jeffwan, Oct 19 '21 18:10

Yes, today we can add/remove workers from the worker groups. We can remove random workers by changing the replicas, or remove specific ones by specifying the pod name in the scaleStrategy and decrementing the replicas attribute.
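The two removal paths described above could be sketched roughly as follows. This assumes the `replicas` and `scaleStrategy.workersToDelete` fields of a worker group in the RayCluster spec; the helper names, the group name, and the pod name are hypothetical, and the code only edits a manifest dict rather than calling the Kubernetes API.

```python
import copy


def _find_group(cluster, group_name):
    """Look up a worker group by name in a RayCluster manifest dict."""
    for group in cluster["spec"]["workerGroupSpecs"]:
        if group["groupName"] == group_name:
            return group
    raise KeyError(f"no worker group named {group_name!r}")


def set_replicas(cluster, group_name, replicas):
    """Scale a group up or down; which workers go away is not specified."""
    updated = copy.deepcopy(cluster)
    _find_group(updated, group_name)["replicas"] = replicas
    return updated


def remove_specific_workers(cluster, group_name, pod_names):
    """Name the pods to delete and decrement replicas by the same amount."""
    updated = copy.deepcopy(cluster)
    group = _find_group(updated, group_name)
    new_replicas = group["replicas"] - len(pod_names)
    if new_replicas < 0:
        raise ValueError("cannot remove more workers than the group has")
    strategy = group.setdefault("scaleStrategy", {})
    strategy.setdefault("workersToDelete", []).extend(pod_names)
    group["replicas"] = new_replicas
    return updated


# Example manifest; the group and pod names are made up.
cluster = {
    "spec": {
        "workerGroupSpecs": [{"groupName": "small-group", "replicas": 3}]
    }
}
fewer = set_replicas(cluster, "small-group", 2)
targeted = remove_specific_workers(
    cluster, "small-group", ["raycluster-sample-worker-small-group-abcde"]
)
print(targeted["spec"]["workerGroupSpecs"][0])
```

Note that neither path addresses the concern raised earlier in the thread: nothing here checks whether actors are still running on the workers being removed.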

akanso, Oct 19 '21 18:10