dcgm-exporter
Could not enable kubernetes metric collection: nvml: Unknown Error
What is the version?
3.1.8-3.1.5
What happened?
When DCGM_REMOTE_HOSTENGINE_INFO is configured in dcgm-exporter, an error occasionally occurs after the dcgm-exporter Pod is recreated. Restarting the container with docker restart resolves the issue. The error message is as follows:
What did you expect to happen?
As the title says.
What is the GPU model?
No response
What is the environment?
No response
How did you deploy the dcgm-exporter and what is the configuration?
No response
How to reproduce the issue?
No response
Anything else we need to know?
No response
@287400117, try using the latest version of dcgm-exporter.
The latest version fails with a different error; see #330.
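Since the problem only shows up when DCGM_REMOTE_HOSTENGINE_INFO is set, it may help to rule out plain connectivity to the remote host engine from inside the Pod before digging into NVML. The script below is a minimal sketch, not part of dcgm-exporter. It assumes the variable holds a `host:port` value and that the host engine listens on port 5555 when no port is given. The helper names `parse_hostengine_info` and `hostengine_reachable` are made up for this example. It only checks that a TCP connection can be opened and does not reproduce or fix the `nvml: Unknown Error` itself.

```python
import os
import socket

# Assumption: default nv-hostengine port when the value has no ":port" part.
DEFAULT_PORT = 5555


def parse_hostengine_info(value):
    """Split an assumed 'host:port' string into (host, port); port is optional."""
    value = value.strip()
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon present: treat the whole value as the host name.
        return value, DEFAULT_PORT
    return host, int(port)


def hostengine_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    info = os.environ.get("DCGM_REMOTE_HOSTENGINE_INFO", "localhost:5555")
    host, port = parse_hostengine_info(info)
    print(f"{host}:{port} reachable: {hostengine_reachable(host, port)}")
```

If the host engine is reachable both before and after the container restart, the failure is more likely on the NVML side of the exporter container than in the remote connection.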