retina-agent pod initialization failed to reconcile plugin dropreason: field NfConntrackConfirm: program nf_conntrack_confirm
Describe the bug
Installation command: helm-install-with-operator
The retina-agent pod status is as follows:
kubectl get pods -n kube-system |grep retina-agent
retina-agent-7q7ls 0/1 CrashLoopBackOff 72 (2m59s ago) 5h49m
retina-agent-9m272 0/1 CrashLoopBackOff 72 (2m58s ago) 5h49m
retina-agent-nd2qg 0/1 CrashLoopBackOff 72 (95s ago) 5h49m
retina-agent-wg44m 0/1 CrashLoopBackOff 72 (2m5s ago) 5h49m
The container error logs are:
retina ts=2024-04-09T03:47:28.299Z level=debug caller=loader/compile.go:22 msg=Running goversion=go1.21.9 os=linux arch=amd64 numcores=8 hostname=cce-bpf-master podname=retina-agent-wg44m version=ef779b6 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser command="/bin/clang -target bpf -Wall -D__TARGET_ARCH_x86 -g -O2 -c /go/src/github.com/microsoft/retina/pkg/plugin/dropreason/_cprog/drop_reason.c -o /go/src/github.com/microsoft/retina/pkg/plugin/dropreason/kprobe_bpf.o -I/go/src/github.com/microsoft/retina/pkg/plugin/dropreason/../lib/_amd64 -I/go/src/github.com/microsoft/retina/pkg/plugin/dropreason/../lib/common/libbpf/_src -I/go/src/github.com/microsoft/retina/pkg/plugin/dropreason/../filter/_cprog/"
retina ts=2024-04-09T03:47:29.030Z level=debug caller=loader/compile.go:29 msg="Output running command" goversion=go1.21.9 os=linux arch=amd64 numcores=8 hostname=cce-bpf-master podname=retina-agent-wg44m version=ef779b6 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser command="/bin/clang -target bpf -Wall -D__TARGET_ARCH_x86 -g -O2 -c /go/src/github.com/microsoft/retina/pkg/plugin/dropreason/_cprog/drop_reason.c -o /go/src/github.com/microsoft/retina/pkg/plugin/dropreason/kprobe_bpf.o -I/go/src/github.com/microsoft/retina/pkg/plugin/dropreason/../lib/_amd64 -I/go/src/github.com/microsoft/retina/pkg/plugin/dropreason/../lib/common/libbpf/_src -I/go/src/github.com/microsoft/retina/pkg/plugin/dropreason/../filter/_cprog/" stdout=
retina ts=2024-04-09T03:47:29.030Z level=info caller=dropreason/dropreason_linux.go:120 msg="DropReason metric compiled" goversion=go1.21.9 os=linux arch=amd64 numcores=8 hostname=cce-bpf-master podname=retina-agent-wg44m version=ef779b6 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser
retina ts=2024-04-09T03:47:29.333Z level=error caller=dropreason/dropreason_linux.go:155 msg="Error loading objects: %w" goversion=go1.21.9 os=linux arch=amd64 numcores=8 hostname=cce-bpf-master podname=retina-agent-wg44m version=ef779b6 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser error="field NfConntrackConfirm: program nf_conntrack_confirm: apply CO-RE relocations: load kernel module spec: open /sys/kernel/btf/nf_conntrack: no such file or directory"
retina ts=2024-04-09T03:47:29.333Z level=info caller=server/server.go:79 msg="gracefully shutting down HTTP server..." goversion=go1.21.9 os=linux arch=amd64 numcores=8 hostname=cce-bpf-master podname=retina-agent-wg44m version=ef779b6 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser
retina ts=2024-04-09T03:47:29.333Z level=info caller=server/server.go:71 msg="HTTP server stopped with err: http: Server closed" goversion=go1.21.9 os=linux arch=amd64 numcores=8 hostname=cce-bpf-master podname=retina-agent-wg44m version=ef779b6 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser
retina ts=2024-04-09T03:47:29.333Z level=panic caller=controllermanager/controllermanager.go:119 msg="Error running controller manager" goversion=go1.21.9 os=linux arch=amd64 numcores=8 hostname=cce-bpf-master podname=retina-agent-wg44m version=ef779b6 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser error="failed to reconcile plugin dropreason: field NfConntrackConfirm: program nf_conntrack_confirm: apply CO-RE relocations: load kernel module spec: open /sys/kernel/btf/nf_conntrack: no such file or directory" errorVerbose="field NfConntrackConfirm: program nf_conntrack_confirm: apply CO-RE relocations: load kernel module spec: open /sys/kernel/btf/nf_conntrack: no such file or directory\nfailed to reconcile plugin dropreason\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:169\ngithub.com/microsoft/retina/pkg/managers/controllermanager.(*Controller).Start.func1\n\t/go/src/github.com/microsoft/retina/pkg/managers/controllermanager/controllermanager.go:109\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"
retina panic: Error running controller manager
Expected behavior
The retina-agent pods run normally.
Environment
OS: CentOS Linux 8.2 (Core)
Kernel Version: 5.4.273-1.el8.elrepo.x86_64
Kubernetes Version: v1.26.12
Host: local Kubernetes
Retina Version: v0.0.5
Image Tag: ef779b6-linux-amd64
Additional context
Contents of the host's /sys/kernel/btf/ directory:
tree /sys/kernel/btf/
/sys/kernel/btf/
└── vmlinux
0 directories, 1 file
The reported error suggests that a required dependency is missing. Which dependencies need to be installed? I don't know much about BPF or BTF, so I would appreciate help solving this problem!
It's possible that kernel 5.4 does not have the features we need. @vakalapa do we have any idea what our kernel backcompat is?
Shouldn't we check the kernel version at Retina init and update the docs too?
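A rough sketch of what such a startup check could look like, written in Python for brevity rather than as Retina code. The function names and the `MIN_KERNEL` value are hypothetical; this thread does not establish what the real minimum kernel version is.

```python
import platform
import re

# ASSUMPTION: placeholder threshold for illustration only,
# not a documented Retina requirement.
MIN_KERNEL = (5, 11, 0)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a release string such as
    '5.4.273-1.el8.elrepo.x86_64'. A missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def kernel_at_least(release, minimum):
    """Return True if the parsed release is at or above `minimum`."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    if kernel_at_least(release, MIN_KERNEL):
        print(f"kernel {release} meets the assumed minimum")
    else:
        print(f"kernel {release} is below the assumed minimum {MIN_KERNEL}")
```

Failing early with a message like this would be easier to diagnose than a panic from the eBPF loader, although a version number alone cannot prove that a given kernel feature is enabled.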
I recently encountered the same problem: v0.0.2 ran normally, but v0.0.5 fails this way. I looked at the source code and found no relevant changes. Does anyone know what caused it?
@wenhuwang which distro and kernel version?
@rbtr Env OS: Ubuntu 18.04.5 LTS Kernel Version: 5.10.87-051087-generic Kubernetes Version: 1.22.2
After a long period of troubleshooting, I found that this issue was caused by the cilium/ebpf package upgrade.
Verification steps: the same problem occurs when I build the image from the main branch and run it. When I downgrade the cilium/ebpf package to v0.13.2, the built image runs normally.
PR #1300 in the cilium/ebpf package is related to this issue.
@wenhuwang I don't understand why this change, which added CO-RE (which is supposed to improve kernel compatibility), would cause this issue. I wonder if it may be fixed by the changes in the latest cilium/ebpf.
@rbtr I suspect this commit introduced the change; the error originates in the loadKernelModuleSpec function. The commit determines the kernel module from the eBPF program type and attach point, then looks up the BTF file for that module. However, some older kernels expose only the vmlinux file.
The eBPF program in the dropreason plugin needs to attach to the nf_conntrack kernel module, but there is no nf_conntrack file in the /sys/kernel/btf directory on my node.
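To confirm this on a node, one could check which BTF files are present before starting the agent. This is a minimal sketch, not part of Retina or cilium/ebpf; the function name `missing_module_btf` is hypothetical, and the default directory is the one shown in the error message.

```python
from pathlib import Path


def missing_module_btf(modules, btf_dir="/sys/kernel/btf"):
    """Return the names in `modules` that have no BTF file under `btf_dir`."""
    root = Path(btf_dir)
    return [name for name in modules if not (root / name).is_file()]


if __name__ == "__main__":
    # 'vmlinux' and 'nf_conntrack' are the two names discussed in this thread.
    missing = missing_module_btf(["vmlinux", "nf_conntrack"])
    if missing:
        print("no BTF found for:", ", ".join(missing))
    else:
        print("BTF present for all requested modules")
```

On the node described in the original report, where the directory contains only vmlinux, this would report nf_conntrack as missing.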
This commit seems to fix the issue.
Great, I'm queueing #300 so that we have that fix in our next release. Thanks for investigating this issue!
Thank you to all Retina developers. Retina version 0.0.9 is currently running normally!
retina ts=2024-04-30T06:16:00.902Z level=debug caller=linuxutil/ethtool_stats_linux.go:81 msg="Processed ethtool Stats " goversion=go1.22.2 os=linux arch=amd64 numcores=4 hostname=cce-bpf-slave3 podname=retina-agent-72gg6 version=v0.0.9 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser ifacename=enp4s3
retina ts=2024-04-30T06:16:00.902Z level=error caller=linuxutil/ethtool_stats_linux.go:73 msg="Error while getting ethtool:" goversion=go1.22.2 os=linux arch=amd64 numcores=4 hostname=cce-bpf-slave3 podname=retina-agent-72gg6 version=v0.0.9 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser ifacename=kube-ipvs0 error="operation not supported"
retina ts=2024-04-30T06:16:00.902Z level=error caller=linuxutil/ethtool_stats_linux.go:73 msg="Error while getting ethtool:" goversion=go1.22.2 os=linux arch=amd64 numcores=4 hostname=cce-bpf-slave3 podname=retina-agent-72gg6 version=v0.0.9 apiserver=https://10.68.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser ifacename=tunl0 error="operation not supported"
Another question: can the "operation not supported" errors for these interface names be safely ignored?
Thanks @einnse for letting us know this is fixed 🙂
That error can probably be ignored. #296 is open re customizing interfaces to skip in ethtool
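As an illustration of the idea behind skipping interfaces, here is a small sketch that filters interface names by pattern before they are polled. It is not Retina's implementation; the function name and the skip patterns are hypothetical, and the interface names come from the log lines above.

```python
from fnmatch import fnmatch


def interfaces_to_poll(names, skip_patterns):
    """Drop interface names matching any shell-style pattern in `skip_patterns`."""
    return [
        name for name in names
        if not any(fnmatch(name, pattern) for pattern in skip_patterns)
    ]


if __name__ == "__main__":
    # Interface names taken from the log lines in this thread.
    names = ["enp4s3", "kube-ipvs0", "tunl0"]
    # HYPOTHETICAL skip list; not an existing Retina configuration option.
    print(interfaces_to_poll(names, ["kube-ipvs*", "tunl*"]))
```

Virtual devices such as kube-ipvs0 and tunl0 do not support ethtool statistics, so excluding them up front would avoid the repeated error lines without hiding failures on physical interfaces.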