Newly joined nodes have error: "run/containerd/containerd.sock: connect: connection refused"

Open sonyafenge opened this issue 1 year ago • 9 comments

Spegel version

v0.0.18

Kubernetes distribution

kubeadm

Kubernetes version

v1.30

CNI

calico

Describe the bug

We are running a Kubernetes cluster on bare-metal machines using CAPI. Any node that joins after Spegel is installed shows the error below, and Spegel does not function as a mirror on it.

{"level":"info","ts":1719427282.4200914,"caller":"state/state.go:30","msg":"running scheduled image state update"}
{"level":"error","ts":1719427282.4204097,"caller":"state/state.go:32","msg":"received errors when updating all images","error":"connection error: desc = \"transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable","stacktrace":"github.com/xenitab/spegel/pkg/state.Track\n\t/build/pkg/state/state.go:32\nmain.registryCommand.func5\n\t/build/main.go:172\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78"}

sonyafenge avatar Jun 28 '24 00:06 sonyafenge

This looks the same as https://github.com/spegel-org/spegel/issues/333; I am not sure whether it is related to the leader election refresh.

sonyafenge avatar Jun 28 '24 02:06 sonyafenge

This has nothing to do with #333. The error you are seeing comes from the Containerd client not being able to communicate with the Containerd socket. Are you sure the socket is located at the path that is configured?

phillebaba avatar Jul 01 '24 20:07 phillebaba

Yes, I am sure the socket is located at that path. Every time I restart the Spegel DaemonSet, the issue is fixed.

sonyafenge avatar Jul 08 '24 21:07 sonyafenge

This seems like a peculiar issue as restarting the pod should have no effect. And you are seeing the same issue with the latest Spegel version?

phillebaba avatar Jul 11 '24 21:07 phillebaba

I haven't had a chance to test with the latest Spegel version yet; hopefully I can get it tested next week.

sonyafenge avatar Jul 12 '24 03:07 sonyafenge

We have SELinux enabled on our nodes and had the same error. The solution was the following setting:

securityContext:
  seLinuxOptions:
    type: spc_t

freym avatar Jul 17 '24 05:07 freym

I have the same issue on control-plane nodes, regardless of restarts:

{"time":"2024-08-16T10:56:12.574167036Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/state.Track","file":"/build/pkg/state/state.go","line":36},"msg":"received errors when updating all images","err":"connection error: desc = \"transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable"}

jurim76 avatar Aug 16 '24 10:08 jurim76

The error that you are seeing means that either the Containerd socket does not exist at that path or it can't be reached. This check is run immediately on start and will exit Spegel if an error occurs as there would be no use continuing. Are you sure that this is the correct path?
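To tell those two cases apart on an affected node, a small probe can help. The following is a minimal diagnostic sketch, not part of Spegel: the function name `probe_unix_socket` and its return labels are hypothetical, and the path to pass in is assumed to be the one from the log above (`/run/containerd/containerd.sock`). It only attempts a plain Unix socket connection and reports how the attempt failed.

```python
import errno
import socket


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Try to connect to a Unix stream socket and classify the outcome.

    Hypothetical helper for illustration; the return labels are not
    anything Spegel or containerd emit.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"  # something is listening and accepted the connection
    except FileNotFoundError:
        return "missing"  # no socket file at this path (ENOENT)
    except ConnectionRefusedError:
        return "refused"  # file exists, but nothing is listening on it
    except PermissionError:
        return "denied"  # blocked by file permissions or a security policy
    except OSError as exc:
        # Any other failure, e.g. a timeout; report the errno name if known.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        sock.close()
```

Calling `probe_unix_socket("/run/containerd/containerd.sock")` from the same environment as the failing process is the interesting case, since a result of "missing" points at the configured path or mount, while "refused" means the file is there but no listener is accepting connections on it.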

phillebaba avatar Aug 29 '24 07:08 phillebaba

I can no longer reproduce this issue; please ignore.

jurim76 avatar Aug 29 '24 07:08 jurim76

Closing as issues seem to be resolved.

phillebaba avatar Oct 07 '24 07:10 phillebaba