cilium-cli

Use ephemeral containers during sysdump if Cilium is stuck in crashloop

Open jrajahalme opened this issue 4 years ago • 3 comments

Currently, bugtool info for the Cilium agent is missing from sysdump for Cilium agents in a crashloop. A lot of helpful information (e.g., open sockets, iptables) could also be collected from nodes where the Cilium agent fails to start. Would it be possible to run a job on the node with a bugtool/bpftool image to collect the current node state when the cilium pod fails to start?

jrajahalme avatar Jul 15 '21 09:07 jrajahalme


Yes, it is possible, either a) by using ephemeral containers, if the cluster supports them: https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/ or b) by running a Deployment on the node(s) selected by a specific label, or even on all nodes, that runs the bugtool on those nodes.
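To make option a) more concrete, here is a minimal Go sketch of the pod patch that would add an ephemeral container running the bugtool. It is not how sysdump does it today. The struct types mirror only a small subset of the Kubernetes `EphemeralContainer` fields, and the container name `sysdump-bugtool`, the image reference, and the helper `bugtoolEphemeralPatch` are all hypothetical names chosen for illustration. The sketch only prints the JSON; it assumes the body would then be sent to the pod's `ephemeralcontainers` subresource by a Kubernetes client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// ephemeralContainer mirrors a small subset of the fields of the
// Kubernetes core/v1 EphemeralContainer type (illustrative only).
type ephemeralContainer struct {
	Name    string   `json:"name"`
	Image   string   `json:"image"`
	Command []string `json:"command,omitempty"`
}

// podSpecPatch and podPatch model just enough of a Pod to carry
// spec.ephemeralContainers.
type podSpecPatch struct {
	EphemeralContainers []ephemeralContainer `json:"ephemeralContainers"`
}

type podPatch struct {
	Spec podSpecPatch `json:"spec"`
}

// bugtoolEphemeralPatch is a hypothetical helper that builds a patch
// adding one ephemeral container that runs cilium-bugtool.
// The container name is an assumption; the image is supplied by the caller.
func bugtoolEphemeralPatch(image string) podPatch {
	return podPatch{Spec: podSpecPatch{
		EphemeralContainers: []ephemeralContainer{{
			Name:    "sysdump-bugtool",
			Image:   image,
			Command: []string{"cilium-bugtool"},
		}},
	}}
}

func main() {
	// Placeholder image reference, not a real published image.
	patch := bugtoolEphemeralPatch("example.invalid/cilium-bugtool:latest")

	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(patch); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The sketch leaves out the security context and any volume mounts, which a real bugtool run would likely need in order to read node state such as iptables and BPF maps.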

aanm avatar Jul 19 '21 01:07 aanm

^ Good idea. I've updated the issue to reflect this feature request and am transferring it to the CLI repo, as that's where sysdump lives.

christarazi avatar Jul 26 '23 20:07 christarazi