upload gpu topology info to node annotation

Open lengrongfu opened this issue 2 years ago • 12 comments

1. Issue or feature description

We found that the current GPU selection algorithm is besteffort_policy. We would like to upload each node's GPU topology info to a node annotation so that, in clusters with multi-GPU nodes, kube-scheduler can pick the globally best placement.

lengrongfu avatar Dec 06 '23 06:12 lengrongfu

/assign

lengrongfu avatar Dec 06 '23 06:12 lengrongfu

@kerthcet

lengrongfu avatar Dec 08 '23 07:12 lengrongfu

Should this issue be put under https://github.com/NVIDIA/gpu-feature-discovery?

kerthcet avatar Dec 26 '23 07:12 kerthcet

This is how we expose the GPU topology matrix now: [[-1, 20, 20, 20], [20, -1, 20, 20], [20, 20, -1, 20], [20, 20, 20, -1]]. It generally follows the definitions at https://github.com/NVIDIA/go-gpuallocator/blob/b0577847cf04c3e928488dfe90830a2c5a01706b/internal/links/device.go#L31-L57
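A minimal sketch of how such a matrix could be packed into a node annotation, not the actual implementation discussed here. The annotation key `example.com/gpu-topology` is hypothetical (the thread does not name one), and the checks assume the matrix is square and symmetric, which is an assumption about the data rather than something stated above. The returned dictionary has the shape of a Kubernetes merge patch on node metadata.

```python
import json

# Hypothetical annotation key; the thread does not specify one.
ANNOTATION_KEY = "example.com/gpu-topology"


def encode_topology(matrix):
    """Validate the matrix and return it as compact JSON."""
    n = len(matrix)
    # Every row must have one entry per GPU.
    for row in matrix:
        if len(row) != n:
            raise ValueError("topology matrix must be square")
    # Assumption: the link score between GPU i and GPU j is the same in both directions.
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] != matrix[j][i]:
                raise ValueError("topology matrix must be symmetric")
    return json.dumps(matrix, separators=(",", ":"))


def annotation_patch(matrix):
    """Build a merge-patch body that sets the topology annotation on a node."""
    return {"metadata": {"annotations": {ANNOTATION_KEY: encode_topology(matrix)}}}


topo = [[-1, 20, 20, 20], [20, -1, 20, 20], [20, 20, -1, 20], [20, 20, 20, -1]]
patch = annotation_patch(topo)
```

Annotations are used here rather than labels because label values are short and restricted in character set, while an annotation can hold an arbitrary JSON string.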

cc @ArangoGutierrez @elezar @klueska Although we hope to move further toward DRA in the future, a lot of users are still in the old world of device plugins. I can help with this if needed. Thanks.

kerthcet avatar Feb 27 '24 07:02 kerthcet

Furthermore, we hope to expose GPU usage for smarter scheduling as well, but NFD/GFD seems to report at intervals (60s by default), which does not quite fit here. Any suggestions? What we do today is report it via the device plugin itself.

kerthcet avatar Feb 27 '24 07:02 kerthcet

Should this issue be put under https://github.com/NVIDIA/gpu-feature-discovery?

We are in the process of migrating GPU Feature discovery to this repository to streamline our releases.

elezar avatar Feb 27 '24 13:02 elezar

Furthermore, we hope to expose GPU usage for smarter scheduling as well, but NFD/GFD seems to report at intervals (60s by default), which does not quite fit here. Any suggestions? What we do today is report it via the device plugin itself.

I don't know whether labels are the right place to expose usage information. This sounds more like something that should be made available by DCGM or another component.

I would expect labels to be relatively static due to the impact they have on decisions such as placement and scheduling.

@kerthcet when you mention exposing the topology, how do you translate this to a label? Are labels intended to encode data this way?

elezar avatar Feb 27 '24 13:02 elezar

I don't know whether labels are the right place to expose usage information. This sounds more like something that should be made available by DCGM or another component.

Thanks for the advice; we're exploring reading this from Prometheus.

kerthcet avatar Feb 28 '24 02:02 kerthcet

when you mention exposing the topology, how do you translate this to a label? Are labels intended to encode data this way?

This is what it looks like in our system right now: [[-1, 20, 20, 20], [20, -1, 20, 20], [20, 20, -1, 20], [20, 20, 20, -1]]. Because we use the topology for scheduling, plain numbers are enough for scoring. For display purposes I guess it's a different matter; it could look like a truncated nvidia-smi topo output, which is familiar to users. We can have a translation function internally.
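A rough sketch of how a scheduler-side component might score candidate GPU sets from such a matrix. It assumes that a higher number means a better link between two GPUs and that the diagonal -1 only marks "same device"; neither is spelled out in this thread. The function names and the second sample matrix are hypothetical.

```python
import json
from itertools import combinations


def set_score(matrix, gpus):
    """Sum the pairwise link scores of a candidate GPU set."""
    return sum(matrix[a][b] for a, b in combinations(gpus, 2))


def best_gpu_set(annotation_value, free_gpus, count):
    """Return the best-connected tuple of `count` free GPU indices, or None."""
    matrix = json.loads(annotation_value)
    candidates = combinations(sorted(free_gpus), count)
    # On ties, max() keeps the first candidate, i.e. the lowest indices.
    return max(candidates, key=lambda gpus: set_score(matrix, gpus), default=None)


# The uniform matrix from this thread: every pair scores the same.
uniform = "[[-1,20,20,20],[20,-1,20,20],[20,20,-1,20],[20,20,20,-1]]"
# Hypothetical matrix where GPUs 0-1 and 2-3 are better connected.
paired = "[[-1,100,20,20],[100,-1,20,20],[20,20,-1,100],[20,20,100,-1]]"

print(best_gpu_set(uniform, [0, 1, 2, 3], 2))
print(best_gpu_set(paired, [1, 2, 3], 2))
```

Enumerating every combination is only practical for the small GPU counts of a single node; comparing the best score across nodes is what would let the scheduler choose globally.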

kerthcet avatar Feb 28 '24 02:02 kerthcet