`nvidia.com/gpu.memory` capacity
Hey,
I have a customer using the NVIDIA GPU Operator alongside a custom controller (built with fabric8) that reads the `nvidia.com/gpu.memory` label added by GPU Feature Discovery and then patches Node objects, adding an `nvidia.com/gpu.memory` entry to the node's capacity/allocatable resources.
I was surprised to see this isn't handled by the NVIDIA operator out of the box.
With this in place, our clusters' end users can schedule pods without explicitly requesting whole GPUs (`nvidia.com/gpu`); thus, a single GPU may be used by more than one container.
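For illustration, a minimal sketch of what such a pod spec could look like once the extended resource is advertised. The pod name, image, and quantity below are made up for the example; the unit is whatever the controller writes into capacity, presumably the memory size taken from the GFD label value:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-mem-consumer          # hypothetical example name
spec:
  containers:
  - name: app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu.memory: "4096"   # assumed quantity; no nvidia.com/gpu requested
```

Since extended resources require requests to equal limits, specifying only limits is enough. Note the scheduler just does per-node arithmetic on this counter; it has no idea which physical GPU the memory belongs to, so on multi-GPU nodes this is purely advisory.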
Any plan to implement something similar?
I don't think I can share my customer's code, and I'm not sure Java code would help here anyway. For the record: while the code below adds a label to nodes ( https://github.com/NVIDIA/gpu-feature-discovery/blob/main/internal/lm/resource.go#L36-L73 ), one could also patch the corresponding Node's `status.capacity`, adding or updating an entry for a resource named `nvidia.com/gpu.memory`.
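To make the idea concrete without sharing the Java controller, here is a minimal Go sketch of the same approach using client-go. The node name is hypothetical, and a real controller would watch all nodes rather than handle a single one; note that extended resources must be added through the node's `status` subresource, e.g. with a JSON patch:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config; such a controller would typically run as a Deployment.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	nodeName := "my-gpu-node" // hypothetical; a real controller would watch nodes

	// Read the label that GPU Feature Discovery already sets.
	node, err := clientset.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	gpuMem, ok := node.Labels["nvidia.com/gpu.memory"]
	if !ok {
		fmt.Println("label nvidia.com/gpu.memory not present, skipping")
		return
	}

	// Extended resources must be added via the status subresource.
	// "~1" escapes "/" in the JSON-patch path (RFC 6901).
	patch := []byte(fmt.Sprintf(
		`[{"op":"add","path":"/status/capacity/nvidia.com~1gpu.memory","value":"%s"}]`,
		gpuMem))
	_, err = clientset.CoreV1().Nodes().Patch(
		context.TODO(), nodeName, types.JSONPatchType, patch,
		metav1.PatchOptions{}, "status")
	if err != nil {
		panic(err)
	}
	fmt.Printf("advertised nvidia.com/gpu.memory=%s on node %s\n", gpuMem, nodeName)
}
```

Once capacity is patched, the node should report the new resource under `status.allocatable` as well, and the scheduler then treats `nvidia.com/gpu.memory` like any other countable extended resource.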
Thanks!
/cc @klueska
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
This issue was automatically closed due to inactivity.