starclump
Hi nvvfedorov! Thanks for the suggestion! We have been running 470.103.01 in the production environment for a long time, so it's a bit difficult for us to update the GPU driver. We...
Really appreciate your feedback! Here's the information I collected; please help review it. DCGM-exporter image version: nvidia/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04. DCGM version: nvidia/dcgm:3.3.5-1-ubuntu22.04. GPU Server: A30 * 4. GPU Driver version: 470.103.01 (For cluster...
The complete YAML required for reproduction is as follows:

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: DaemonSet
metadata:
  name: "dcgm-exporter"
  namespace: "kube-system"
  labels:
    app.kubernetes.io/name: "dcgm-exporter"
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 15%
  selector:...
```
After we updated the driver version to 535.129.03, we could get data for `DCGM_FI_DEV_CLOCK_REASONS`, but we still fail to get `DCGM_EXP_CLOCK_EVENTS_COUNT`.
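For anyone trying to reproduce this, here is a rough sketch of how one might check which of the two metrics show up in the exporter's scraped text output. The helper name `present_metrics` and the `SAMPLE` text are made up for illustration; the sample is not real output from our cluster, and the parsing assumes the usual Prometheus text exposition format.

```python
def present_metrics(metrics_text, wanted):
    """Return the subset of `wanted` metric names that have at least one sample line."""
    found = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or at whitespace.
        parts = line.split("{", 1)[0].split()
        if parts and parts[0] in wanted:
            found.add(parts[0])
    return found


# Hypothetical sample text, NOT real exporter output.
SAMPLE = """\
# TYPE DCGM_FI_DEV_CLOCK_REASONS gauge
DCGM_FI_DEV_CLOCK_REASONS{gpu="0"} 1
"""

WANTED = {"DCGM_FI_DEV_CLOCK_REASONS", "DCGM_EXP_CLOCK_EVENTS_COUNT"}

found = present_metrics(SAMPLE, WANTED)
missing = WANTED - found
print("found:", sorted(found))
print("missing:", sorted(missing))
```

In practice the text passed in would be whatever the exporter's metrics endpoint returns; a metric that only appears in a `#` comment line, with no sample line, is treated as missing.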