Retina Agent Pod Experiences CrashLoopBackOff After Installation on EKS
Description: I attempted to install Retina by following the official installation guide (Retina Installation Setup) and ran the commands exactly as given on that page. However, the retina-agent pod entered a CrashLoopBackOff state, and the logs showed a panic from the controller manager.
Steps to Reproduce:
1. Navigate to the Retina installation documentation page: https://retina.sh/docs/installation/setup
2. Run the following commands to set the version and install Retina via Helm:
VERSION=$( curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
--version $VERSION \
--namespace kube-system \
--set image.tag=$VERSION \
--set operator.tag=$VERSION \
--set logLevel=info \
--set enabledPlugin_linux="[dropreason,packetforward,linuxutil,dns]"
3. Observe that the retina-agent pod enters a CrashLoopBackOff state.
Expected Behavior: The Retina agent should install without errors, and its pods should run stably.
Actual Behavior: The retina-agent pod fails to start and goes into CrashLoopBackOff. The logs show the following panic:
panic: Error running controller manager
goroutine 148 [running]:
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x1?, 0x0?, {0x0?, 0x0?, 0xc001432020?})
/go/pkg/mod/go.uber.org/zap@…/zapcore/entry.go:196 +0x54
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00143c000, {0xc0016e2980, 0x1, 0x1})
/go/pkg/mod/go.uber.org/zap@…/zapcore/entry.go:262 +0x3ec
go.uber.org/zap.(*Logger).Panic(0xc000998700?, {0x2b52d44?, 0x0?}, {0xc0016e2980, 0x1, 0x1})
/go/pkg/mod/go.uber.org/zap@…/logger.go:284 +0x51
github.com/microsoft/retina/pkg/managers/controllermanager.(*Controller).Start(0xc000b0f220, {0x2f10ad0?, 0xc000b0f1d0?})
/go/src/github.com/microsoft/retina/pkg/managers/controllermanager/controllermanager.go:119 +0x28c
created by main.main in goroutine 1
/go/src/github.com/microsoft/retina/controller/main.go:290 +0x28d0
Platform
- OS: Amazon Linux 2
- Kubernetes Version: v1.25.9-eks-0a21954
- Host: EKS
- Retina Version: v0.0.5
Possible duplicate of #246. @jaeeyoungkim, can you post more of the log? The actual error is logged before the panic. What are your kernel version and CPU architecture?
- CPU architecture: amd64
- Kernel version: 5.10.179-168.710.amzn2.x86_64
Command used to fetch the logs, followed by the full log output:
kubectl logs -n kube-system pod/retina-agent-ztvms -c retina
ts=2024-04-16T01:09:27.857Z level=info caller=controller/main.go:103 msg="starting Retina version: v0.0.4" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms
ts=2024-04-16T01:09:27.857Z level=info caller=controller/main.go:104 msg="Reading config ..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms
ts=2024-04-16T01:09:27.857Z level=info caller=controller/main.go:106 msg="Initializing metrics" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms
ts=2024-04-16T01:09:27.857Z level=info caller=metrics/metrics.go:151 msg="Metrics initialized" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms
ts=2024-04-16T01:09:27.857Z level=info caller=controller/main.go:109 msg="Initializing Kubernetes client-go ..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms
ts=2024-04-16T01:09:27.857Z level=info caller=controller/main.go:133 msg="telemetry disabled" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.064Z level=info caller=controller/main.go:207 msg="Kubernetes server version: v1.25.16-eks-b9c9ed7" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.064Z level=info caller=pluginmanager/pluginmanager.go:70 msg="plugin manager has pod level disabled" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.064Z level=info caller=controllermanager/controllermanager.go:79 msg="Initializing controller manager ..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.064Z level=info caller=servermanager/servermanager.go:33 msg="Initializing HTTP server ..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.064Z level=info caller=server/server.go:42 msg="Setting up handlers" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.065Z level=info caller=server/server.go:57 msg="Completed handler setup" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.065Z level=info caller=servermanager/servermanager.go:37 msg="HTTP server initialized..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.065Z level=info caller=controller/main.go:291 msg="Started controller manager" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.065Z level=info caller=servermanager/servermanager.go:42 msg="Starting HTTP server ..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns host=0.0.0.0 port=10093
ts=2024-04-16T01:09:28.065Z level=info caller=server/server.go:69 msg="starting HTTP server... on " goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns addr=0.0.0.0:10093
ts=2024-04-16T01:09:28.065Z level=info caller=pluginmanager/pluginmanager.go:138 msg="Starting plugin manager ..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.065Z level=info caller=linuxutil/linuxutil_linux.go:42 msg="Initializing linuxutil plugin..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.065Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns name=linuxutil
ts=2024-04-16T01:09:28.065Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin linuxutil" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:28.065Z level=info caller=linuxutil/linuxutil_linux.go:57 msg="Running linuxutil plugin..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:30.565Z level=info caller=dns/dns_linux.go:90 msg="Stopped dns plugin" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:31.169Z level=info caller=dns/dns_linux.go:64 msg="Initialized dns plugin" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:31.169Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns name=dns
ts=2024-04-16T01:09:31.169Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin dns" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:34.767Z level=info caller=dropreason/dropreason_linux.go:120 msg="DropReason metric compiled" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:35.274Z level=error caller=dropreason/dropreason_linux.go:155 msg="Error loading objects: %w" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns error="field NfConntrackConfirm: program nf_conntrack_confirm: apply CO-RE relocations: load kernel module spec: open /sys/kernel/btf/nf_conntrack: no such file or directory"
ts=2024-04-16T01:09:35.274Z level=info caller=linuxutil/linuxutil_linux.go:64 msg="Context is done, linuxutil will stop running" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:35.274Z level=info caller=server/server.go:79 msg="gracefully shutting down HTTP server..." goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:35.274Z level=info caller=server/server.go:71 msg="HTTP server stopped with err: http: Server closed" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns
ts=2024-04-16T01:09:35.274Z level=panic caller=controllermanager/controllermanager.go:119 msg="Error running controller manager" goversion=go1.21.8 os=linux arch=amd64 numcores=2 hostname=ip-10-21-110-120.ap-northeast-2.compute.internal podname=retina-agent-ztvms version=v0.0.4 apiserver=https://172.20.0.1:443 plugins=dropreason,packetforward,linuxutil,dns error="failed to reconcile plugin dropreason: field NfConntrackConfirm: program nf_conntrack_confirm: apply CO-RE relocations: load kernel module spec: open /sys/kernel/btf/nf_conntrack: no such file or directory" errorVerbose="field NfConntrackConfirm: program nf_conntrack_confirm: apply CO-RE relocations: load kernel module spec: open /sys/kernel/btf/nf_conntrack: no such file or directory\nfailed to reconcile plugin dropreason\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:169\ngithub.com/microsoft/retina/pkg/managers/controllermanager.(*Controller).Start.func1\n\t/go/src/github.com/microsoft/retina/pkg/managers/controllermanager/controllermanager.go:109\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@…/errgroup/errgroup.go:78\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"
panic: Error running controller manager
goroutine 139 [running]:
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x1?, 0xc000037df4?, {0x0?, 0x0?, 0xc0016ed740?})
/go/pkg/mod/go.uber.org/zap@…/zapcore/entry.go:196 +0x54
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00178e4e0, {0xc001782c00, 0x1, 0x1})
/go/pkg/mod/go.uber.org/zap@…/zapcore/entry.go:262 +0x3ec
go.uber.org/zap.(*Logger).Panic(0xc0007d5ec0?, {0x2b52d30?, 0x8fbec0?}, {0xc001782c00, 0x1, 0x1})
/go/pkg/mod/go.uber.org/zap@…/logger.go:284 +0x51
github.com/microsoft/retina/pkg/managers/controllermanager.(*Controller).Start(0xc000aaa910, {0x2f10a90?, 0xc0009eda90?})
/go/src/github.com/microsoft/retina/pkg/managers/controllermanager/controllermanager.go:119 +0x28c
created by main.main in goroutine 1
/go/src/github.com/microsoft/retina/controller/main.go:290 +0x28d0
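The root cause is logged just before the panic: the dropreason plugin's eBPF program nf_conntrack_confirm needs CO-RE relocations against the nf_conntrack kernel module, and loading fails because /sys/kernel/btf/nf_conntrack does not exist on the node. As far as I know, per-module BTF files under /sys/kernel/btf/ require a 5.11+ kernel built with CONFIG_DEBUG_INFO_BTF_MODULES and the module to be loaded, which fits the 5.11+ estimate given later in this thread. The POSIX sh sketch below checks a node for those files; check_module_btf and its exit codes are hypothetical helpers, not part of Retina.

```shell
# check_module_btf MODULE [BTF_DIR]   (POSIX sh, hypothetical helper)
# Prints what it finds and returns:
#   0 if both kernel BTF (vmlinux) and split BTF for MODULE exist,
#   1 if kernel BTF exists but the module BTF file is missing,
#   2 if kernel BTF itself is missing.
# BTF_DIR defaults to /sys/kernel/btf, the directory named in the error above.
check_module_btf() {
    module=$1
    btf_dir=${2:-/sys/kernel/btf}

    if [ ! -e "$btf_dir/vmlinux" ]; then
        echo "no kernel BTF: $btf_dir/vmlinux is missing"
        return 2
    fi
    if [ ! -e "$btf_dir/$module" ]; then
        # Assumption: module BTF needs CONFIG_DEBUG_INFO_BTF_MODULES (5.11+)
        # and the module must currently be loaded.
        echo "no module BTF: $btf_dir/$module is missing"
        return 1
    fi
    echo "ok: $btf_dir/$module"
    return 0
}
```

Running `check_module_btf nf_conntrack` on an affected node should report the missing module BTF file, matching the error in the log.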
I got the same issue.
amd64, 5.10.205-195.804.amzn2.x86_64.
It works when I disable the dropreason plugin. Most likely it's the same issue as this
JFYI @rbtr
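The workaround reported here amounts to re-running the install command from the reproduction steps with dropreason removed from enabledPlugin_linux. A minimal POSIX sh sketch, assuming VERSION is set as in those steps; remove_plugin and reinstall_without_dropreason are hypothetical helper names, and the helm flags are copied from the original command.

```shell
# remove_plugin LIST NAME   (POSIX sh, hypothetical helper)
# Removes NAME from a bracketed, comma-separated plugin list such as
# "[dropreason,packetforward,linuxutil,dns]" and prints the new list.
remove_plugin() {
    list=$1
    name=$2
    # Strip the surrounding brackets.
    inner=${list#\[}
    inner=${inner%\]}
    # One plugin per line, drop exact matches of NAME, rejoin with commas.
    joined=$(printf '%s\n' "$inner" | tr ',' '\n' | grep -vxF -- "$name" | paste -s -d, -)
    printf '[%s]\n' "$joined"
}

# Re-runs the install from the issue without dropreason (defined, not invoked here).
# Assumes VERSION was set earlier, e.g. from the GitHub releases API as above.
reinstall_without_dropreason() {
    helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
        --version "$VERSION" \
        --namespace kube-system \
        --set image.tag="$VERSION" \
        --set operator.tag="$VERSION" \
        --set logLevel=info \
        --set enabledPlugin_linux="$(remove_plugin '[dropreason,packetforward,linuxutil,dns]' dropreason)"
}
```

Calling reinstall_without_dropreason applies the change. Quoting "$VERSION" is a small deviation from the original command that only guards against word splitting.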
@jaeeyoungkim @tungdam1337 We think the dropreason plugin requires kernel 5.11+. However, we merged a BPF library update and just released v0.0.9. Can you confirm whether you still see this issue with v0.0.9?
Yeah, v0.0.9 works. Thanks!
Just an update: we think that as of v0.0.9 the minimum supported kernel version is 5.4. Closing this, since the original issue appears to be resolved.
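The thread gives two kernel thresholds: 5.11+ for dropreason before v0.0.9 (the maintainers' estimate), and 5.4 as the minimum supported kernel from v0.0.9 on. A minimal POSIX sh sketch for comparing a kernel release string, such as the ones reported above, against a threshold; kernel_at_least is a hypothetical helper that compares only major.minor and ignores patch level and distribution suffixes.

```shell
# kernel_at_least MIN [RELEASE]   (POSIX sh, hypothetical helper)
# Returns 0 if RELEASE >= MIN, comparing major.minor only.
# MIN must be given as "major.minor" (e.g. 5.4). RELEASE defaults to `uname -r`.
kernel_at_least() {
    min=$1
    release=${2:-$(uname -r)}

    # "5.10.179-168.710.amzn2.x86_64" -> major "5", minor "10"
    rel_major=${release%%.*}
    rel_rest=${release#*.}
    rel_minor=${rel_rest%%[!0-9]*}

    min_major=${min%%.*}
    min_minor=${min#*.}
    min_minor=${min_minor%%[!0-9]*}

    if [ "$rel_major" -gt "$min_major" ]; then
        return 0
    fi
    [ "$rel_major" -eq "$min_major" ] && [ "$rel_minor" -ge "$min_minor" ]
}
```

For example, `kernel_at_least 5.11 5.10.179-168.710.amzn2.x86_64` fails, matching the reporters' nodes, while `kernel_at_least 5.4 5.10.179-168.710.amzn2.x86_64` succeeds. With no second argument it checks the running kernel; `kubectl get nodes -o wide` lists each node's kernel in its KERNEL-VERSION column.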