Deniss Abramovs
Nope, that didn't help. I updated it and the pod was removed, but it is still complaining about:
```
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get...
```
I removed all the pods to trigger everything from scratch.
```
(base) beck@beck:/$ kubectl get pods -n gpu-operator
NAME                                   READY   STATUS     RESTARTS   AGE
gpu-feature-discovery-8b8ls            0/1     Init:0/1   0          115s
gpu-operator-59b9d49c6f-gkk4j          1/1     Running    0          2m20s
nvidia-dcgm-exporter-6bmlt             0/1     Init:0/1   0          115s
nvidia-device-plugin-daemonset-f7xgb   0/1...
```
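To spot the stuck pods without scanning the columns by eye, here is a minimal Python sketch that parses the default plain-text layout of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, AGE). The sample rows are copied from the output above, minus the truncated last row, and `not_ready` is a hypothetical helper name. For real scripting, `kubectl get pods -o json` is sturdier than splitting columns.

```python
# Sketch: flag pods whose READY column shows fewer ready containers than total.
# Assumes the default whitespace-separated layout of `kubectl get pods`
# (NAME READY STATUS RESTARTS AGE). Rows copied from the output in this thread.

SAMPLE = """\
NAME                            READY   STATUS     RESTARTS   AGE
gpu-feature-discovery-8b8ls     0/1     Init:0/1   0          115s
gpu-operator-59b9d49c6f-gkk4j   1/1     Running    0          2m20s
nvidia-dcgm-exporter-6bmlt      0/1     Init:0/1   0          115s
"""


def not_ready(output: str) -> list[tuple[str, str]]:
    """Return (name, status) for every pod that is not fully ready."""
    rows = output.strip().splitlines()[1:]  # skip the header row
    stuck = []
    for row in rows:
        # RESTARTS may contain extra tokens like "(5m ago)"; *_ absorbs them.
        name, ready, status, *_ = row.split()
        ready_count, total = (int(x) for x in ready.split("/"))
        if ready_count < total:
            stuck.append((name, status))
    return stuck


if __name__ == "__main__":
    for name, status in not_ready(SAMPLE):
        print(f"{name}: {status}")
```

Run against the sample, it prints the two pods still in `Init:0/1`: `gpu-feature-discovery-8b8ls` and `nvidia-dcgm-exporter-6bmlt`.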
Here are the errors from the containerd systemd logs: https://gist.github.com/denissabramovs/a77e97972b5aa01c86955d812d3e8188
Here is the updated, latest one: https://gist.github.com/denissabramovs/2272051bb2f684f623cd15273ea6dd25
At least containerd is no longer restarting constantly; it has been up for 9 minutes now:
```
● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active:...
```
All 3 systemd services are up and running on the GPU node:
```
(base) beck@beck:/$ sudo systemctl status --no-pager kubelet containerd docker | grep active
     Active: active (running) since Wed 2022-11-02...
```
Sorry, I missed your message. Here it is:
```
(base) beck@beck:/$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```
```
(base) beck@beck:/$ containerd --version
containerd containerd.io 1.6.9 1c90a442489720eec95342e1789ee8a5e1b9536f
```
The containerd logs, however, show `revision=9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 version=1.6.8`:
```
nov 02 19:20:49 beck containerd[202761]: time="2022-11-02T19:20:49.723337417+02:00" level=info msg="starting containerd" revision=9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 version=1.6.8
...
...
...
nov 02 19:22:34 beck containerd[202761]: time="2022-11-02T19:22:34.246953180+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:nvidia-device-plugin-daemonset-lbnzw,Uid:fd4f1d3f-29d2-4d11-a724-96f4ed107cd5,Namespace:gpu-operator,Attempt:0,} failed, error" error="failed...
```
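The two outputs above disagree: `containerd --version` reports 1.6.9 (revision `1c90a44…`), while the daemon's "starting containerd" log line reports 1.6.8 (revision `9cd3357…`). One possible reading, not confirmed in this thread, is that the systemd unit starts a different containerd binary than the one on `PATH`, or that the log line predates an upgrade. Below is a minimal Python sketch that extracts both versions from the pasted text and compares them. `cli_version` and `daemon_version` are hypothetical helper names, and the parser assumes the four-field `containerd --version` layout shown above (it also strips a leading `v`, in case a build prints one).

```python
import re

# Output of `containerd --version`, copied from the thread.
CLI_OUTPUT = "containerd containerd.io 1.6.9 1c90a442489720eec95342e1789ee8a5e1b9536f"

# "starting containerd" line from the systemd journal, copied from the thread.
LOG_LINE = (
    "nov 02 19:20:49 beck containerd[202761]: "
    'time="2022-11-02T19:20:49.723337417+02:00" level=info '
    'msg="starting containerd" '
    "revision=9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 version=1.6.8"
)


def cli_version(output: str) -> tuple[str, str]:
    """Return (version, revision) from `containerd --version` output."""
    # Assumed layout: <binary> <package> <version> <revision>
    _, _, version, revision = output.split()
    return version.lstrip("v"), revision


def _field(name: str, line: str) -> str:
    """Return the value of a key=value field in a containerd log line."""
    match = re.search(rf"\b{name}=(\S+)", line)
    if match is None:
        raise ValueError(f"no {name}= field in log line")
    return match.group(1)


def daemon_version(log_line: str) -> tuple[str, str]:
    """Return (version, revision) from a 'starting containerd' log line."""
    return _field("version", log_line), _field("revision", log_line)


if __name__ == "__main__":
    cli = cli_version(CLI_OUTPUT)
    daemon = daemon_version(LOG_LINE)
    if cli != daemon:
        print(f"mismatch: CLI reports {cli[0]}, daemon logged {daemon[0]}")
```

With the values from this thread it prints `mismatch: CLI reports 1.6.9, daemon logged 1.6.8`. To see which binary the service actually runs, `systemctl cat containerd` shows the unit's `ExecStart` line, which can be compared with `which containerd`.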