Terminates under EKS with msg="killing slirp4netns"

Open dg424 opened this issue 3 years ago • 2 comments

We're using Rootless DinD running on EKS worker nodes. We're intermittently getting the following failure:

main.main.func2
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/cmd/rootlesskit/main.go:213
github.com/urfave/cli/v2.(*App).RunContext
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/urfave/cli/[email protected]/app.go:322
github.com/urfave/cli/v2.(*App).Run
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
main.main
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/cmd/rootlesskit/main.go:222
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571
time="2022-10-03T21:27:19Z" level=debug msg="killing slirp4netns"
time="2022-10-03T21:27:19Z" level=debug msg="killed slirp4netns: signal: killed"
[rootlesskit:parent] error: exit status 1
child exited
github.com/rootless-containers/rootlesskit/pkg/parent.Parent
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/pkg/parent/parent.go:275
main.main.func2
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/cmd/rootlesskit/main.go:220
github.com/urfave/cli/v2.(*App).RunContext
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/urfave/cli/[email protected]/app.go:322
github.com/urfave/cli/v2.(*App).Run
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
main.main
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/cmd/rootlesskit/main.go:222
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571

Note: We currently have a startupProbe (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes) that verifies the DinD container is up, with a time limit of 30 seconds (which seems more than enough).

Any ideas or instructions for debugging this further?

dg424, Oct 04 '22 13:10

> a time limit of 30 seconds (which seems more than enough).

Does the error occur if you increase the limit?

> child exited

Do you see any logs from the rootless dind daemon?

AkihiroSuda, Oct 05 '22 04:10

Hi @AkihiroSuda, yes, it still occurs after raising the limit from 5 seconds to 30 seconds, although less often than it did with 5 seconds. I think we may have to increase the timeout to 1 minute now, but I'm not sure whether it will keep happening. The logs I pasted are what I get from k8s after the pod terminates. Is there anything else I can do to find the root cause here?

dg424, Oct 05 '22 13:10