rootlesskit Terminates under EKS with msg="killing slirp4netns"

We're using Rootless DinD running on EKS worker nodes. We're intermittently getting the following failure:

main.main.func2
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/cmd/rootlesskit/main.go:213
github.com/urfave/cli/v2.(*App).RunContext
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/urfave/cli/[email protected]/app.go:322
github.com/urfave/cli/v2.(*App).Run
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
main.main
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/cmd/rootlesskit/main.go:222
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571
time="2022-10-03T21:27:19Z" level=debug msg="killing slirp4netns"
time="2022-10-03T21:27:19Z" level=debug msg="killed slirp4netns: signal: killed"
[rootlesskit:parent] error: exit status 1
child exited
github.com/rootless-containers/rootlesskit/pkg/parent.Parent
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/pkg/parent/parent.go:275
main.main.func2
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/cmd/rootlesskit/main.go:220
github.com/urfave/cli/v2.(*App).RunContext
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/urfave/cli/[email protected]/app.go:322
github.com/urfave/cli/v2.(*App).Run
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
main.main
	/tmp/tmp.ccni3BnQLU/pkg/mod/github.com/rootless-containers/[email protected]/cmd/rootlesskit/main.go:222
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1571

Note: We currently have a startupProbe (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes) that verifies the DinD container is up, with a time limit of 30 seconds (which seems more than enough).

Any ideas or instructions for debugging this further?

Oct 04 '22 13:10 dg424

> a time limit of 30 seconds (which seems more than enough).

Does the error occur if you increase the limit?

> child exited

Do you see any logs from the rootless DinD daemon?
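When collecting daemon logs, it can help to extract only the structured logrus-style lines (the `time="..." level=debug msg="..."` format shown above) from the surrounding stack-trace noise. Their timestamps can then be lined up against the probe-failure events in `kubectl describe pod`. Below is a minimal Go sketch, assuming the logs are piped in on stdin. ParseLogLine is a hypothetical helper. `kubectl logs --previous` is suggested because a container killed by a failed probe is restarted in the same pod, so its earlier output is still retrievable.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// kvPattern matches logrus text-format fields such as
// time="2022-10-03T21:27:19Z", level=debug or msg="killing slirp4netns".
var kvPattern = regexp.MustCompile(`(\w+)=("(?:[^"\\]|\\.)*"|\S+)`)

// ParseLogLine returns the key/value fields of a logrus-style line,
// or nil if the line has no time= field (e.g. stack-trace lines).
func ParseLogLine(line string) map[string]string {
	fields := map[string]string{}
	for _, m := range kvPattern.FindAllStringSubmatch(line, -1) {
		val := m[2]
		if strings.HasPrefix(val, `"`) {
			// Quoted values are Go-style escaped, so Unquote restores them.
			if unq, err := strconv.Unquote(val); err == nil {
				val = unq
			}
		}
		fields[m[1]] = val
	}
	if _, ok := fields["time"]; !ok {
		return nil
	}
	return fields
}

func main() {
	// Example (pod/container names are placeholders):
	//   kubectl logs <pod> -c <dind-container> --previous | go run .
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if f := ParseLogLine(sc.Text()); f != nil {
			fmt.Printf("%s [%s] %s\n", f["time"], f["level"], f["msg"])
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```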

Oct 05 '22 04:10 AkihiroSuda

Hi @AkihiroSuda, yes, it still occurs after raising the limit from 5 seconds to 30 seconds, although less often than it did with 5 seconds. I think we might have to increase the timeout to 1 minute now, but I'm not sure whether it will still happen after that. The logs I pasted above are what I get from k8s after the pod is terminated. Is there anything else I can do to find the root cause?

Oct 05 '22 13:10 dg424