gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Problem Statement: We're trying to set up vGPU for VM workloads on the OpenShift cluster, but at present the `nvidia-sandbox-validator` pods are stuck in the init state. Infrastructure details - Bare metal...
### HOST INFORMATION 1. OS and Architecture: Ubuntu 22.04, amd64 3. Kubernetes Distribution: K3s, K3d, RKE2 4. Kubernetes Version: v1.30.4 5. Host Node GPUs: NVIDIA RTX 4090 and 4070 ###...
Hello, we are currently running operator version 24.6.2. The driver version we are trying to run is 550-5.15.0-1078-azure. However, the `nvidia-driver init` script step is failing for the daemonset nvidia-driver-daemonset...
Hello, we are trying to configure GPU Operator v24.9.2 on an RKE2 v1.31.4+rke2r1 cluster. The validator pod fails while trying to validate the toolkit, with the error below: ``` Warning Failed 3m29s (x4...
## Summary This PR improves security by restricting the GPU Operator's ClusterRole permissions to only the specific ClusterRoles and ClusterRoleBindings it manages, following the principle of least privilege. ## Problem...
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.43.0 to 0.45.0. Commits 4e0068c go.mod: update golang.org/x dependencies e79546e ssh: curb GSSAPI DoS risk by limiting number of specified OIDs f91f7a7 ssh/agent: prevent panic on malformed...
This PR also bumps the regctl version to `v0.10.0`
Bumps [sigs.k8s.io/kustomize/kustomize/v5](https://github.com/kubernetes-sigs/kustomize) from 5.7.0 to 5.8.0. Release notes Sourced from sigs.k8s.io/kustomize/kustomize/v5's releases. kustomize/v5.8.0 Highlights implements to replacements value in the structured data Now, We can edit yaml/json in yaml manifests...
Bumps [sigs.k8s.io/controller-tools](https://github.com/kubernetes-sigs/controller-tools) from 0.18.0 to 0.19.0. Release notes Sourced from sigs.k8s.io/controller-tools's releases. v0.19.0 What's Changed ⚠️ Bump to k8s.io/* v0.34 by @alvaroaleman @dongjiang1989 @sbueringer in kubernetes-sigs/controller-tools#1225 kubernetes-sigs/controller-tools#1236 kubernetes-sigs/controller-tools#1258 kubernetes-sigs/controller-tools#1266 🐛...
Bumps [github.com/urfave/cli/v3](https://github.com/urfave/cli) from 3.5.0 to 3.6.1. Release notes Sourced from github.com/urfave/cli/v3's releases. v3.6.1 What's Changed chore(deps): bump golangci/golangci-lint-action from 8 to 9 by @dependabot[bot] in urfave/cli#2222 feat: add ability to...