
[Core] Dynamic Node Labeling and Resource Registration

Open Irvingwangjr opened this issue 1 year ago • 3 comments

Description

Right now, Ray provides node labeling and resource registration only at startup, when calling `ray start --label/--custom-resource` or using the RAY_OVERRIDE_RESOURCES/RAY_OVERRIDE_LABELS env vars. As Ray clusters' lifecycles become longer, we hope we can label nodes dynamically and apply nodeLabelSelector to dispatch actors or tasks.

Use case

No response

Irvingwangjr avatar May 16 '24 02:05 Irvingwangjr

What's your use case for dynamic labels? I think today you can create a new node with new labels, and shut down the old node.

rynewang avatar May 20 '24 22:05 rynewang

Yeah, if the workload is submitted using RayJob with an ephemeral Ray Cluster, we can add the custom label at startup. But if we have a long-running Ray Cluster, that won't work.

Irvingwangjr avatar May 21 '24 02:05 Irvingwangjr

If we want to implement this, could it be achieved using RaySyncer? The syncer would trigger NodeManager, which does ConsumeSyncMessage -> UpdateResourceUsage -> UpdateNode -> ClusterResourceManager.AddOrUpdateNode, eventually updating the NodeResources data structure (which contains labels).

Irvingwangjr avatar May 25 '24 09:05 Irvingwangjr
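
To make the proposed flow concrete, here is a minimal toy sketch in plain Python of the idea in the last comment: a sync message reaches a handler, which merges new labels into a per-node record held by a cluster-level manager, and a label selector is then evaluated against the updated records. This is not Ray's code. The class and function names only mirror the ones mentioned in the thread, and the message shape (a dict with `resources` and `labels` keys) and the `select_nodes` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeResources:
    """Toy stand-in for a per-node record holding resources and labels."""
    total: Dict[str, float] = field(default_factory=dict)
    labels: Dict[str, str] = field(default_factory=dict)


class ClusterResourceManager:
    """Toy cluster view: maps node id -> NodeResources."""

    def __init__(self) -> None:
        self.nodes: Dict[str, NodeResources] = {}

    def add_or_update_node(self, node_id: str, resources: NodeResources) -> None:
        # Replace the stored record for this node with the new one.
        self.nodes[node_id] = resources

    def select_nodes(self, selector: Dict[str, str]) -> List[str]:
        # Hypothetical helper: a node matches when every selector
        # key/value pair is present in its labels.
        return sorted(
            node_id
            for node_id, res in self.nodes.items()
            if all(res.labels.get(k) == v for k, v in selector.items())
        )


def consume_sync_message(
    manager: ClusterResourceManager, node_id: str, message: dict
) -> None:
    """Merge resources/labels from an (assumed) sync message into the node record."""
    current = manager.nodes.get(node_id, NodeResources())
    updated = NodeResources(
        total={**current.total, **message.get("resources", {})},
        labels={**current.labels, **message.get("labels", {})},
    )
    manager.add_or_update_node(node_id, updated)


manager = ClusterResourceManager()
# Labels known at startup for node-a; node-b starts without labels.
consume_sync_message(manager, "node-a", {"resources": {"CPU": 8}, "labels": {"zone": "us-1"}})
consume_sync_message(manager, "node-b", {"resources": {"CPU": 4}})
before = manager.select_nodes({"zone": "us-1"})

# Later, node-b is labeled dynamically through another sync message.
consume_sync_message(manager, "node-b", {"labels": {"zone": "us-1", "tier": "gpu"}})
after = manager.select_nodes({"zone": "us-1"})
print(before, after)
```

In this sketch a label update is an ordinary merge into the existing node record, so a selector evaluated after the update sees the new label without the node being restarted. Whether the real syncer message could carry labels this way is exactly the open question raised above.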