Failed to watch metrics: Error watching fields: The third-party Profiling module returned an u

Open 287400117 opened this issue 1 year ago • 2 comments

What is the version?

3.3.5-3.4.1

What happened?

dcgm-exporter may fail on its first startup. The error goes away once the service is restarted automatically. The error message is as follows: [screenshot]

What did you expect to happen?

rt

What is the GPU model?

No response

What is the environment?

No response

How did you deploy the dcgm-exporter and what is the configuration?

No response

How to reproduce the issue?

No response

Anything else we need to know?

No response

287400117 · May 23 '24 09:05

There might be a race condition causing this behavior. Please attach the /var/log/nv-hostengine.log file from the container or enable DCGM logging with the following parameters:

--enable-dcgm-log --dcgm-log-level DEBUG
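Below is a minimal sketch of how the two suggestions in this thread could be combined. It assumes dcgm-exporter is on the PATH and accepts the flags quoted above. It builds the debug-logging command line and reruns the process while it exits with a non-zero status, which imitates the automatic restart the reporter relies on. The helper names `exporter_command` and `run_with_restarts` are hypothetical and not part of dcgm-exporter.

```python
import subprocess
import time
from typing import List

# Flags taken from the maintainer's comment; the rest of this script is illustrative.
DEBUG_FLAGS = ["--enable-dcgm-log", "--dcgm-log-level", "DEBUG"]


def exporter_command(binary: str = "dcgm-exporter", debug: bool = True) -> List[str]:
    """Build the dcgm-exporter command line, optionally with DCGM debug logging."""
    cmd = [binary]
    if debug:
        cmd.extend(DEBUG_FLAGS)
    return cmd


def run_with_restarts(cmd: List[str], max_attempts: int = 3, delay_s: float = 2.0) -> int:
    """Hypothetical helper: rerun cmd while it exits non-zero, like a
    restart-on-failure policy. Returns the exit code of the last attempt."""
    code = 1
    for attempt in range(1, max_attempts + 1):
        code = subprocess.run(cmd).returncode
        if code == 0:
            break
        if attempt < max_attempts:
            time.sleep(delay_s)  # brief pause before the next start
    return code


if __name__ == "__main__":
    # Print the command instead of running it, so the script is safe to try.
    print(" ".join(exporter_command()))
```

In practice the restart is usually delegated to the process supervisor (for example a container restart policy) rather than a wrapper script. With debug logging enabled, the first failed start should leave a log that can be attached to the issue.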

glowkey · May 24 '24 14:05

Okay, I'll give it a try.

287400117 · May 30 '24 02:05