
Running dcgm exporter without root privileges

Open thekuffs opened this issue 1 year ago • 2 comments

Hi,

I'm working on deploying dcgm-exporter to several clusters that I operate. I noticed that the DaemonSet requests root privileges, and I would rather it didn't (https://github.com/NVIDIA/dcgm-exporter/blob/main/dcgm-exporter.yaml#L47).

I realize that this tool needs access to the NVIDIA device nodes and drivers to function. But that access is already provided by the NVIDIA container toolkit (https://github.com/NVIDIA/nvidia-docker).

When a pod requests GPU resources, it gets scheduled onto a GPU node. When that pod starts up on the node, the NVIDIA container runtime copies in the appropriate device nodes and binaries. This way, pods can be built and scheduled onto appropriate nodes without requiring runtime driver builds or strict driver version mapping.

I do not know enough about DCGM to be certain, but I think it could use those same tools to avoid building modules at runtime and thus drop the requirement to run as root.

The only thing holding this back at the moment is a limitation of the NVIDIA container runtime. As far as I know, there is no way to request access to the GPUs on a node running the NVIDIA container toolkit without 'consuming' that resource. If there were a way to tell the container toolkit that a pod needs access to all the GPUs without taking them away from other users, one could use it to ensure that dcgm-exporter lands on those nodes, with the device nodes created and the libraries available at runtime rather than built on startup.
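
For illustration, the setup described above might look roughly like the pod template fragment below. This is an untested sketch, not the project's manifest. The node label, the UID, and the image tag are placeholders. One candidate mechanism is the runtime's `NVIDIA_VISIBLE_DEVICES` environment variable, but whether it is honored for a pod that does not request `nvidia.com/gpu` depends on how the runtime is configured on the node, and whether DCGM itself works as an unprivileged user is exactly the open question here.

```yaml
# Untested sketch of a pod template fragment; not taken from dcgm-exporter.yaml.
spec:
  # Hypothetical node label; use whatever label marks GPU nodes in your cluster.
  nodeSelector:
    example.com/gpu-node: "true"
  containers:
    - name: dcgm-exporter
      # Placeholder tag; substitute a real release.
      image: "nvcr.io/nvidia/k8s/dcgm-exporter:<tag>"
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000              # hypothetical unprivileged UID
        allowPrivilegeEscalation: false
      env:
        # Assumption: the NVIDIA container runtime handles this container and
        # honors this variable even though no nvidia.com/gpu limit is set.
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
      # Deliberately no resources.limits for nvidia.com/gpu, so the pod
      # does not consume a GPU from the scheduler's point of view.
```

If this works on a given cluster, the exporter would see every GPU on the node while leaving all of them schedulable for other workloads.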

thekuffs, Mar 10 '23 18:03

I'm also interested in being able to run as non-root and would greatly appreciate any progress towards a resolution.

alexglenn-ddl, Mar 04 '24 11:03

I agree. It would be great to have this capability.

sh-TU, Mar 20 '24 07:03