`dockershim.sock` symlink should be relative
Image I'm using:
Bottlerocket OS 1.20.2 (aws-k8s-1.29)
What I expected to happen:
I expected /run/dockershim.sock to be a valid socket.
What actually happened:
The Datadog Agent Pod mounts the host filesystem under /host and then expects to connect to the container runtime via /host/run/dockershim.sock. Unfortunately, /run/dockershim.sock is an absolute link to /run/containerd/containerd.sock (see #2173). Inside the Pod that target resolves against the container's own root rather than /host, so the link is broken in the mounted file system.
Proposed Solution:
Make /run/dockershim.sock a relative link to ./containerd/containerd.sock instead of an absolute link.
Note that /var/run/dockershim.sock is already a relative link: ./containerd/containerd.sock
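To illustrate why the relative link survives being mounted under /host while the absolute one does not, here is a minimal sketch. It does not touch a real Bottlerocket node: the temporary directory, the names `absolute.sock` and `relative.sock`, and the function `build_and_remount` are all hypothetical, a regular file stands in for the socket, and renaming the tree from `root` to `host` is only an approximation of mounting the host filesystem at a different prefix.

```python
import os
import tempfile


def build_and_remount(base: str) -> dict:
    """Create both link styles, then move the tree to mimic mounting it elsewhere."""
    root = os.path.join(base, "root")
    run_dir = os.path.join(root, "run")
    target = os.path.join(run_dir, "containerd", "containerd.sock")
    os.makedirs(os.path.dirname(target))
    # A regular file stands in for the real containerd socket (assumption).
    open(target, "w").close()

    # Absolute link: stores the full path of the target as seen from the original root.
    os.symlink(target, os.path.join(run_dir, "absolute.sock"))
    # Relative link: stores a path relative to the directory that holds the link.
    os.symlink("./containerd/containerd.sock", os.path.join(run_dir, "relative.sock"))

    # Moving the tree changes the prefix, much like viewing the host's /run as /host/run.
    host = os.path.join(base, "host")
    os.rename(root, host)

    # os.path.exists follows symlinks, so a dangling link reports False.
    return {
        name: os.path.exists(os.path.join(host, "run", name))
        for name in ("absolute.sock", "relative.sock")
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(build_and_remount(base))
```

In this sketch the absolute link dangles after the move because its stored target still names the old prefix, while the relative link keeps resolving because it is interpreted from the link's own directory.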
How to reproduce the problem:
Deploy Datadog Helm chart 3.66.0 to EKS running Bottlerocket and configure according to Datadog docs with
criSocketPath: /run/dockershim.sock
View logs from DaemonSet datadog Pod, container agent, and see
CORE | ERROR | (pkg/util/containerd/containerd_util.go:109 in NewContainerdUtil) | Containerd init error: temporary failure in containerdutil, will retry later: failed to dial "/host/run/dockershim.sock": context deadline exceeded
Alternatively, use kubectl exec to run `file /host/run/dockershim.sock` in the agent container and see the error:
/host/run/dockershim.sock: broken symbolic link to /run/containerd/containerd.sock
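If `file` is not available in the container, a similar check can be sketched with the Python standard library. This is only an illustration: `describe_link` is a hypothetical helper, its output format is made up and only loosely resembles that of `file`, and the demo at the bottom builds its own dangling link in a temporary directory instead of inspecting a real node.

```python
import os
import tempfile


def describe_link(path: str) -> str:
    """Report a symlink's stored target and whether it resolves from this mount namespace."""
    if not os.path.islink(path):
        return f"{path}: not a symbolic link"
    # readlink returns the target exactly as stored, without resolving it.
    target = os.readlink(path)
    kind = "absolute" if os.path.isabs(target) else "relative"
    # exists() follows the link, so it is False when the target is missing.
    state = "resolves" if os.path.exists(path) else "broken"
    return f"{path}: {state} {kind} link to {target}"


if __name__ == "__main__":
    # Self-contained demo: a link whose absolute target does not exist.
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "dockershim.sock")
        os.symlink(os.path.join(tmp, "missing", "containerd.sock"), link)
        print(describe_link(link))
```

Pointed at /host/run/dockershim.sock inside the agent container, a helper like this would be expected to report a broken absolute link on affected nodes.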
Thanks for cutting this @Nuru. Do you know if this worked in a previous version of the helm chart? I noticed that they made a recent change (https://github.com/DataDog/helm-charts/issues/1352), but it probably didn't impact this. Nonetheless, I think making this link relative should work. I'll give this a shot to see if it helps and report back!
> Do you know if this worked in a previous version of the helm chart?
This setting is not in the Datadog Helm chart; it is in their documentation. The relevant part of their Helm chart has not changed in 3 years.
I was able to try out a change that does fix the symlink issue. I don't have a working Datadog setup to confirm that this fully fixes it but I can confirm the link works now:
# file /host/run/dockershim.sock
/host/run/dockershim.sock: symbolic link to ./containerd/containerd.sock
And the nodes with this relative link don't have the error message:
CORE | ERROR | (pkg/util/containerd/containerd_util.go:109 in NewContainerdUtil) | Containerd init error: temporary failure in containerdutil, will retry later: failed to dial "/host/run/dockershim.sock": context deadline exceeded
I'll get a PR cut shortly with this proposed fix.
https://github.com/bottlerocket-os/bottlerocket-core-kit/pull/18 should hopefully fix this issue once it is released!