gpu-operator

MicroK8s containerd-template.toml is wrong when docker is installed in parallel

Open s-bernhardt opened this issue 9 months ago • 2 comments

Hi, I have the following situation where the toolkit breaks the MicroK8s containerd config file.

Ubuntu 22.04 LTS
Nvidia Driver version: 550
MicroK8s version: 1.31
docker version: 24.0.4
gpu-operator helm chart version: v24.9.2

helm values:

driver:
  enabled: "false"
operator:
  defaultRuntime: containerd
toolkit:
  enabled: "true"
  env:
  - name: CONTAINERD_CONFIG
    value: /var/snap/microk8s/current/args/containerd-template.toml
  - name: CONTAINERD_SOCKET
    value: /var/snap/microk8s/common/run/containerd.sock
  - name: CONTAINERD_SET_AS_DEFAULT
    value: "1"

The generated config file becomes valid, and the service starts fine, once I remove the following lines:

disabled_plugins = ["cri"]
imports = ["/etc/containerd/config.toml"]
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""

so the file just starts as usual with:

oom_score = 0
version = 2
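The manual cleanup described above could be scripted roughly as follows. This is only a sketch of a workaround, not a fix for the toolkit: the helper name `strip_top_level_keys` is hypothetical, the key list is copied from the lines listed above, and it assumes each of those keys sits on a single line before the first `[table]` header. Whether the toolkit re-adds the lines on its next run is not addressed here.

```python
# Keys removed by hand in the report above (assumed to be one line each).
OFFENDING_KEYS = {
    "disabled_plugins", "imports", "plugin_dir",
    "required_plugins", "root", "state", "temp",
}


def strip_top_level_keys(config_text, keys=OFFENDING_KEYS):
    """Drop `key = value` lines for `keys` that appear before the first table header."""
    kept = []
    in_top_level = True
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # The first [table] header ends the top-level section;
            # keys inside tables (e.g. a plugin's own `root`) are left alone.
            in_top_level = False
        if in_top_level and "=" in stripped and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
            if key in keys:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative input only; the [plugins] table here is made up for the demo.
sample = '''disabled_plugins = ["cri"]
imports = ["/etc/containerd/config.toml"]
oom_score = 0
version = 2

[plugins]
  root = "keep-me"
'''

cleaned = strip_top_level_keys(sample)
print(cleaned)
```

In practice the input would be the contents of /var/snap/microk8s/current/args/containerd-template.toml, and the result would need to be written back before restarting MicroK8s.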

The toolkit works fine when Docker is not installed at the time the pod starts. As soon as Docker is present with its own containerd, the toolkit somehow detects it and writes a bad config.

Any hints on how I can prevent that? I need Docker on the same host because it is my dev environment.

Regards, Stefan

s-bernhardt • Mar 27 '25 12:03

Likely related: https://github.com/NVIDIA/nvidia-container-toolkit/issues/982

cdesiniotis • Mar 28 '25 19:03

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

github-actions[bot] • Nov 04 '25 22:11