
tetragon pods CrashLoopBackOff

Open 3rgfbrgf opened this issue 2 years ago • 0 comments

What happened?

Tetragon is deployed with ArgoCD on multiple clusters. On a few of them, all pods start normally and then enter the CrashLoopBackOff status. There are no differences in the Kubernetes, Cilium, Tetragon, OS, or Linux kernel versions between the problematic clusters and the healthy ones.
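For context, the crash loop was confirmed with standard kubectl diagnostics. This is a sketch of those commands, assuming Tetragon runs as a DaemonSet in the kube-system namespace with the label `app.kubernetes.io/name=tetragon` (adjust namespace, label, and pod name to your deployment):

```shell
# List the Tetragon pods and their restart counts
kubectl -n kube-system get pods -l app.kubernetes.io/name=tetragon

# Show the reason and exit code of the last container termination
kubectl -n kube-system describe pod <tetragon-pod-name> | grep -A 6 "Last State"

# Fetch logs from the previous (crashed) container instance,
# which is where the errors below were captured
kubectl -n kube-system logs <tetragon-pod-name> -c tetragon --previous
```

The log excerpts in the next section come from the `--previous` container logs.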

Tetragon Version

1.0.2

Kernel Version

5.4.0-165-generic

Kubernetes Version

v1.27.5

Bugtool

Cannot provide due to privacy reasons.

Relevant log output

On some pods there is an error about gRPC:

time="2024-02-22T08:41:32Z" level=info msg="Starting gRPC server" address="localhost:54321" protocol=tcp
time="2024-02-22T08:41:32Z" level=error msg="Failed to close gRPC server" error="grpc: the server has been stopped"

On other pods there is an unclear trace message about Reflector ListAndWatch:

time="2024-02-22T08:53:26Z" level=info msg="Loaded config from directory" config-dir=/etc/tetragon/tetragon.conf.d/
time="2024-02-22T08:53:26Z" level=info msg="Starting gops server" addr="localhost:8118"
time="2024-02-22T08:53:26Z" level=info msg="Starting tetragon" version=v1.0.2
time="2024-02-22T08:53:26Z" level=info msg="config settings" config="map[bpf-lib:/var/lib/tetragon/ btf: config-dir:/etc/tetragon/tetragon.conf.d/ cpuprofile: data-cache-size:1024 debug:false disable-kprobe-multi:false enable-export-aggregation:false enable-k8s-api:true enable-msg-handling-latency:false enable-pid-set-filter:false enable-pod-info:false enable-policy-filter:true enable-policy-filter-debug:false enable-process-ancestors:true enable-process-cred:false enable-process-ns:false event-queue-size:10000 export-aggregation-buffer-size:10000 export-aggregation-window-size:15s export-allowlist:{\"event_set\":[\"PROCESS_EXEC\", \"PROCESS_EXIT\", \"PROCESS_KPROBE\", \"PROCESS_UPROBE\", \"PROCESS_TRACEPOINT\"]} export-denylist:{\"health_check\":true}\n{\"namespace\":[\"\", \"cilium\", \"kube-system\"]} export-file-compress:false export-file-max-backups:5 export-file-max-size-mb:10 export-file-perm:600 export-file-rotation-interval:0s export-filename:/var/run/cilium/tetragon/tetragon.log export-rate-limit:-1 expose-kernel-addresses:false field-filters:{} force-large-progs:false force-small-progs:false gops-address:localhost:8118 k8s-kubeconfig-path: kernel: kmods:[] log-format:text log-level:info memprofile: metrics-server::2112 netns-dir:/var/run/docker/netns/ pprof-addr: process-cache-size:65536 procfs:/procRoot rb-queue-size:65535 rb-size:0 rb-size-total:0 release-pinned-bpf:true server-address:localhost:54321 tracing-policy: tracing-policy-dir:/etc/tetragon/tetragon.tp.d verbose:0]"
time="2024-02-22T08:53:38Z" level=info msg="BPF detected features: override_return: true, buildid: false, kprobe_multi: false, fmodret: false"
time="2024-02-22T08:53:38Z" level=info msg="BPF: successfully released pinned BPF programs and maps" bpf-dir=/sys/fs/bpf/tetragon
time="2024-02-22T08:53:38Z" level=info msg="Enabling policy filtering"
time="2024-02-22T08:53:39Z" level=info msg="BTF discovery: default kernel btf file found" btf-file=/sys/kernel/btf/vmlinux
time="2024-02-22T08:53:39Z" level=info msg="sensor controller waiting on channel"
time="2024-02-22T08:53:39Z" level=info msg="Starting metrics server" addr=":2112"
time="2024-02-22T08:53:39Z" level=info msg="Registering pod delete handler for metrics"
time="2024-02-22T08:53:39Z" level=info msg="Cgroup mode detection succeeded" cgroup.fs=/sys/fs/cgroup cgroup.mode="Legacy mode (Cgroupv1)"
time="2024-02-22T08:53:39Z" level=info msg="Supported cgroup controller 'memory' is active on the system" cgroup.controller.hierarchyID=5 cgroup.controller.index=4 cgroup.controller.name=memory cgroup.fs=/sys/fs/cgroup
time="2024-02-22T08:53:39Z" level=info msg="Supported cgroup controller 'pids' is active on the system" cgroup.controller.hierarchyID=4 cgroup.controller.index=11 cgroup.controller.name=pids cgroup.fs=/sys/fs/cgroup
time="2024-02-22T08:53:39Z" level=info msg="Supported cgroup controller 'cpuset' is active on the system" cgroup.controller.hierarchyID=7 cgroup.controller.index=0 cgroup.controller.name=cpuset cgroup.fs=/sys/fs/cgroup
time="2024-02-22T08:53:39Z" level=info msg="Cgroupv1 controller 'memory' will be used" cgroup.controller.hierarchyID=5 cgroup.controller.index=4 cgroup.controller.name=memory cgroup.fs=/sys/fs/cgroup
time="2024-02-22T08:53:39Z" level=info msg="Cgroupv1 hierarchy validated successfully" cgroup.fs=/sys/fs/cgroup cgroup.path=/sys/fs/cgroup/memory
time="2024-02-22T08:53:39Z" level=info msg="Deployment mode detection succeeded" cgroup.fs=/sys/fs/cgroup deployment.mode=Kubernetes
time="2024-02-22T08:53:40Z" level=info msg="Updated TetragonConf map successfully" NSPID=1 cgroup.controller.hierarchyID=5 cgroup.controller.index=4 cgroup.controller.name=memory cgroup.fs.magic=Cgroupv1 confmap-update=tg_conf_map deployment.mode=Kubernetes log.level=info
time="2024-02-22T08:53:40Z" level=info msg="Enabling Kubernetes API"
time="2024-02-22T08:53:40Z" level=info msg="Waiting for required CRDs" crds="map[tracingpolicies.cilium.io:{} tracingpoliciesnamespaced.cilium.io:{}]"
time="2024-02-22T08:53:52Z" level=info msg="Received signal terminated, shutting down..."
I0222 08:53:53.877670       1 trace.go:236] Trace[122234908]: "Reflector ListAndWatch" name:github.com/cilium/tetragon/cmd/tetragon/main.go:376 (22-Feb-2024 08:53:40.377) (total time: 13500ms):
Trace[122234908]: ---"Objects listed" error:<nil> 13499ms (08:53:53.877)
Trace[122234908]: [13.500144714s] [13.500144714s] END
time="2024-02-22T08:53:53Z" level=info msg="Found CRD" crd=tracingpolicies.cilium.io
time="2024-02-22T08:53:53Z" level=info msg="Found CRD" crd=tracingpoliciesnamespaced.cilium.io
time="2024-02-22T08:53:53Z" level=info msg="Found all the required CRDs"
time="2024-02-22T08:53:53Z" level=info msg="registering policyfilter pod handlers"
time="2024-02-22T08:53:54Z" level=info msg="adding /procRoot/1/root/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice to cgroup pod directories"
time="2024-02-22T08:53:54Z" level=info msg="adding /procRoot/1/root/sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice to cgroup pod directories"


Anything else?

_No response_

3rgfbrgf · Feb 22 '24 09:02