Newly joined nodes have error: "run/containerd/containerd.sock: connect: connection refused"
Spegel version
v0.0.18
Kubernetes distribution
kubeadm
Kubernetes version
v1.30
CNI
calico
Describe the bug
We are running a Kubernetes cluster on bare-metal machines using CAPI. I found that any node that joins after Spegel is installed shows the error below and does not work as a mirror.
{"level":"info","ts":1719427282.4200914,"caller":"state/state.go:30","msg":"running scheduled image state update"}
{"level":"error","ts":1719427282.4204097,"caller":"state/state.go:32","msg":"received errors when updating all images","error":"connection error: desc = \"transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable","stacktrace":"github.com/xenitab/spegel/pkg/state.Track\n\t/build/pkg/state/state.go:32\nmain.registryCommand.func5\n\t/build/main.go:172\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78"}
This looks like the same issue as https://github.com/spegel-org/spegel/issues/333, but I am not sure whether it is related to the leader election refresh.
This has nothing to do with #333. The error you are seeing comes from the Containerd client not being able to communicate with the Containerd socket. Are you sure the socket is located at the path that is configured?
Yes, I am sure the socket is located at that path. Every time I restart the Spegel DaemonSet, the issue is fixed.
This seems like a peculiar issue as restarting the pod should have no effect. And you are seeing the same issue with the latest Spegel version?
I have not had a chance to test with the latest Spegel version yet; hopefully I can test it next week.
We have enabled selinux on our nodes and had the same error. The solution was the following setting:
securityContext:
  seLinuxOptions:
    type: spc_t
I have the same issue on control-plane nodes, and restarting does not help.
{"time":"2024-08-16T10:56:12.574167036Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/state.Track","file":"/build/pkg/state/state.go","line":36},"msg":"received errors when updating all images","err":"connection error: desc = \"transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable"}
The error that you are seeing means that either the Containerd socket does not exist at that path or it can't be reached. This check is run immediately on start and will exit Spegel if an error occurs as there would be no use continuing. Are you sure that this is the correct path?
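To narrow down which of those cases applies, a small probe run on the node (or inside a container that mounts the same path) can help. The following is only a rough sketch and not part of Spegel: the helper name `check_unix_socket` and its return strings are made up for illustration, and it only tests whether something accepts connections on the socket, not whether Containerd itself is healthy.

```python
import errno
import os
import socket


def check_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Probe a Unix domain socket (hypothetical helper, not a Spegel API).

    Returns "missing" if nothing exists at the path, "refused" if the file
    exists but nothing is listening, "denied" on a permission error,
    "ok" if the connection succeeds, or "error: <code>" otherwise.
    """
    if not os.path.exists(path):
        return "missing"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except ConnectionRefusedError:
        # The socket file is present but no process is accepting on it.
        return "refused"
    except PermissionError:
        # The caller is not allowed to connect to the socket.
        return "denied"
    except OSError as exc:
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        sock.close()
    return "ok"
```

Calling `check_unix_socket("/run/containerd/containerd.sock")` with the path from the log above would return "refused" in the situation reported here, which suggests the socket file exists but nothing is listening on it from the caller's point of view, rather than the path being wrong.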
I can no longer reproduce this issue; please ignore.
Closing as issues seem to be resolved.