Expose device UUIDs to node label

Open xiongzubiao opened this issue 1 year ago • 3 comments

Closes #1015

xiongzubiao avatar Jan 09 '25 18:01 xiongzubiao

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Jan 09 '25 18:01 copy-pr-bot[bot]

@xiongzubiao could you please provide information on how these labels will be used?

elezar avatar Jan 10 '25 12:01 elezar

> @xiongzubiao could you please provide information on how these labels will be used?

@elezar, we want to provide a visualization in which users can click each GPU to check its properties, status, and metrics. The device UUID is the natural key for indexing GPUs. There are other ways to get the UUID, but reading it from node labels is the most straightforward, because the labels are already part of the node's properties.
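To illustrate the lookup, here is a minimal sketch of how a dashboard might index GPUs by UUID from a node's labels. The label prefix `nvidia.com/gpu.uuid.` and the per-index key layout are assumptions made for illustration; the actual key format used by this PR is not shown in this thread.

```python
# Hypothetical label prefix; the real key used by the PR is not shown in this thread.
UUID_LABEL_PREFIX = "nvidia.com/gpu.uuid."


def gpu_uuids_from_labels(labels):
    """Return {gpu_index: uuid} for every label under the assumed prefix."""
    uuids = {}
    for key, value in labels.items():
        if not key.startswith(UUID_LABEL_PREFIX):
            continue
        suffix = key[len(UUID_LABEL_PREFIX):]
        # Only accept keys whose suffix is a plain GPU index, e.g. "...uuid.0".
        if suffix.isdigit():
            uuids[int(suffix)] = value
    return uuids


# Example node labels (made-up values), as they might appear in node metadata.
node_labels = {
    "kubernetes.io/hostname": "node-a",
    "nvidia.com/gpu.uuid.0": "GPU-00000000-0000-0000-0000-000000000000",
    "nvidia.com/gpu.uuid.1": "GPU-11111111-1111-1111-1111-111111111111",
}

print(gpu_uuids_from_labels(node_labels))
```

The resulting mapping can then serve as the index for per-GPU properties, status, and metrics.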

There is another use case mentioned in #1015: scheduling a pod onto a specific GPU by matching node labels.
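As a rough sketch of that second use case, the snippet below builds a pod manifest whose `nodeSelector` matches a UUID label. The label key, pod name, and image are hypothetical, since the real key format is not shown in this thread. Note that a `nodeSelector` only constrains which node the pod lands on; it does not by itself choose which GPU on that node is allocated.

```python
import json


def pod_for_gpu(name, image, label_key, gpu_uuid):
    """Build a pod manifest pinned to the node carrying label_key=gpu_uuid."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Matches only nodes that expose this UUID as a label value.
            "nodeSelector": {label_key: gpu_uuid},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


# Hypothetical label key, image, and UUID, used for illustration only.
manifest = pod_for_gpu(
    name="gpu-job",
    image="example.com/cuda-app:latest",
    label_key="nvidia.com/gpu.uuid.0",
    gpu_uuid="GPU-00000000-0000-0000-0000-000000000000",
)

print(json.dumps(manifest, indent=2))
```

The printed JSON could be applied with `kubectl apply -f -`, which accepts JSON as well as YAML.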

xiongzubiao avatar Jan 10 '25 19:01 xiongzubiao