gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Bumps [github.com/regclient/regclient](https://github.com/regclient/regclient) from 0.9.2 to 0.11.1. Release notes, sourced from github.com/regclient/regclient's releases: v0.11.1 includes Security: Go 1.25.5 fixes CVE-2025-61729 (PR 1025); Go 1.25.5 fixes CVE-2025-61727 (PR 1025). Fixes: Correct...
Bumps golang from 1.25.4 to 1.25.5. [Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a...
Hello, NVIDIA Team. I'm facing an issue while configuring `dcgm-exporter` from `gpu-operator`. I have two Kubernetes clusters: one is a cluster where GPU jobs run, and the other is...
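For context, `dcgm-exporter` is typically tuned through the GPU Operator's Helm values rather than directly. Below is a minimal, hedged sketch of a per-cluster values override; the `dcgmExporter.serviceMonitor` and `dcgmExporter.config` keys and the ConfigMap name are assumptions based on common chart layouts, so verify them against your chart version's `values.yaml`.

```yaml
# values-override.yaml -- hedged sketch of per-cluster dcgm-exporter settings
# applied with: helm upgrade gpu-operator nvidia/gpu-operator -f values-override.yaml
# NOTE: key names below are assumptions; check your chart's values.yaml.
dcgmExporter:
  enabled: true
  serviceMonitor:
    enabled: true              # let this cluster's Prometheus Operator scrape the exporter
    interval: 15s
  config:
    name: custom-dcgm-metrics  # hypothetical ConfigMap holding a custom metrics CSV
  env:
    - name: DCGM_EXPORTER_COLLECTORS
      value: /etc/dcgm-exporter/custom-metrics.csv  # path to the metrics list inside the container
```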
Implement automated forward-compatibility tests that validate the GPU Operator against the latest published images from NVIDIA component repositories. Changes (a minimal trigger sketch follows this list):
- Add forward-compatibility.yaml workflow (weekly + manual trigger)
- Create get-latest-images.sh...
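For reference, a "weekly + manual trigger" GitHub Actions workflow is normally expressed with a `schedule` cron plus `workflow_dispatch`. The sketch below shows only that trigger shape; the job contents, script path, and cron time are assumptions, not the PR's actual workflow.

```yaml
# forward-compatibility.yaml -- minimal trigger sketch, not the actual workflow from this PR
name: forward-compatibility
on:
  schedule:
    - cron: "0 6 * * 1"    # weekly, Mondays 06:00 UTC (illustrative choice)
  workflow_dispatch: {}     # allow manual runs from the Actions tab
jobs:
  resolve-images:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Resolve latest component images
        run: ./hack/get-latest-images.sh   # script location is an assumption
```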
**Describe the bug**
https://github.com/k0sproject/k0s/issues/6547 The two-step import you introduced, `/etc/k0s/containerd.d/nvidia.toml -> /etc/containerd/conf.d/99-nvidia.toml`, breaks k0s clusters.
**To Reproduce**
Use gpu-operator on a k0s cluster.
**Expected behavior**
Don't be too...
This commit adds a proper wait in the GPU driver for the MOFED driver to be ready, so that RDMA APIs are available when the driver is recompiled. This ensures the operator supports cluster...
### Title: chore(docker): optimize Dockerfile and reduce image size
### Description:
This PR improves the NVIDIA GPU Operator Dockerfile by:
* Reducing the image size by cleaning DNF caches and...
Driver init fails in air-gapped clusters due to hard-coded mount of Red Hat subscription repo config
### Summary
When deploying the GPU Operator in an **air-gapped** (offline) cluster, the `nvidia-driver-daemonset` init container fails to start. Root cause: the driver image ships with a **public YUM repo** enabled...
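As a possible mitigation (not confirmed by this issue), the GPU Operator can be pointed at a custom repository configuration supplied via a ConfigMap, which air-gapped setups commonly use to replace public repos with an internal mirror. A hedged Helm values sketch follows; the `driver.repoConfig.configMapName` key, namespace, and ConfigMap name are assumptions to verify against your operator version's documentation.

```yaml
# values-airgap.yaml -- hedged sketch for pointing the driver container at a local mirror
# Create the ConfigMap first, e.g.:
#   kubectl create configmap repo-config -n gpu-operator --from-file=./local-mirror.repo
driver:
  enabled: true
  repoConfig:
    configMapName: repo-config   # ConfigMap containing .repo files for the internal mirror (assumed key)
```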
An operator error occurs roughly once a day on our H100 node, where time-slicing is enabled on the `mig-1g.10gb` MIG instances. This causes the other pods to restart, as seen below...
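For context, time-slicing over MIG-backed resources is configured through a device-plugin sharing config held in a ConfigMap and referenced by the ClusterPolicy/Helm values. The sketch below mirrors the `mig-1g.10gb` resource from the report, but the ConfigMap name, data key, and replica count are assumptions, not the reporter's actual configuration.

```yaml
# Hedged sketch of a time-slicing config for MIG-backed resources
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  mig-1g.10gb: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/mig-1g.10gb   # MIG profile being shared, as in the report
            replicas: 4                    # illustrative replica count
```

The ConfigMap would then be referenced from the device-plugin section of the Helm values (for example a `devicePlugin.config.name` style key; verify the exact key for your operator version).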