retina-agent panics when running locally in a Kind cluster

Open shashankram opened this issue 1 year ago • 5 comments

make helm-install-advanced-local-context

Logs:

ts=2024-03-21T20:58:50.234Z level=panic caller=controllermanager/controllermanager.go:118 msg="Error running controller manager" goversion=go1.21.8 os=linux arch=amd64 numcores=16 hostname=backstage-worker podname=retina-agent-88dzr version=v0.0.1 apiserver=https://10.96.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser error="failed to start plugin manager, plugin exited: failed to start plugin packetparser: interface eth0 of type device not found" errorVerbose="interface eth0 of type device not found\nfailed to start plugin packetparser\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start.func1\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:174\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\nfailed to start plugin manager, plugin exited\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:186\ngithub.com/microsoft/retina/pkg/managers/controllermanager.(*Controller).Start.func1\n\t/go/src/github.com/microsoft/retina/pkg/managers/controllermanager/controllermanager.go:108\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"
panic: Error running controller manager [recovered]
	panic: Error running controller manager

goroutine 138 [running]:
github.com/microsoft/retina/pkg/telemetry.TrackPanic()
	/go/src/github.com/microsoft/retina/pkg/telemetry/telemetry.go:112 +0x209
panic({0x242fc60?, 0xc003192120?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x1?, 0x0?, {0x0?, 0x0?, 0xc00318e020?})
	/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:196 +0x54
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0031941a0, {0xc003190380, 0x1, 0x1})
	/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:262 +0x3ec
go.uber.org/zap.(*Logger).Panic(0xc000493640?, {0x2b48afa?, 0x0?}, {0xc003190380, 0x1, 0x1})
	/go/pkg/mod/go.uber.org/[email protected]/logger.go:284 +0x51
github.com/microsoft/retina/pkg/managers/controllermanager.(*Controller).Start(0xc000d01cc0, {0x2f057d0?, 0xc000836320?})
	/go/src/github.com/microsoft/retina/pkg/managers/controllermanager/controllermanager.go:118 +0x28c
created by main.main in goroutine 1
	/go/src/github.com/microsoft/retina/controller/main.go:286 +0x2825

shashankram avatar Mar 21 '24 21:03 shashankram

Hi @shashankram ,

Adding some details about the error: the packetparser plugin expects each node to have an eth0 device and tries to attach tc programs to it.

Two observations:

  1. Packetparser shouldn't panic if eth0 is not present; it should log a warning and move on. We should definitely fix this in code, and we will track the fix using this issue.
  2. Even with that fix, Retina may not work as expected on your Kind setup. The plugins depend on the underlying host kernel. For example, if the Kind cluster is running on Docker installed on a Windows machine, Retina won't run successfully.

anubhabMajumdar avatar Mar 21 '24 23:03 anubhabMajumdar

For anyone working on this fix, the error handling needs to be done here: https://github.com/microsoft/retina/blob/0b8a44caf1fa073cca19649e493b0a66d5416822/pkg/plugin/packetparser/packetparser_linux.go#L215C3-L215C13

anubhabMajumdar avatar Mar 21 '24 23:03 anubhabMajumdar

@aman952036

suneet-patil avatar Mar 22 '24 17:03 suneet-patil

I have the same issue trying to install v0.0.4 on my k8s cluster.

I tried installing Retina using this reference on Kubernetes v1.28.6, but retina-agent always ends up in CrashLoopBackOff.

Cluster

image

Install steps

VERSION=$( curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
    --version $VERSION \
    --namespace kube-system \
    --set image.tag=$VERSION \
    --set operator.tag=$VERSION \
    --set image.pullPolicy=Always \
    --set logLevel=info \
    --set os.windows=false \
    --set operator.enabled=true \
    --set operator.enableRetinaEndpoint=true \
    --set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\,packetparser\]" \
    --set enablePodLevel=true \
    --set enableAnnotations=true

After install

image

Logs

Logs from the previous container instance, retrieved with kubectl logs --previous:

ts=2024-04-01T12:54:09.716Z level=error caller=pluginmanager/pluginmanager.go:185 msg="plugin manager exited with error" goversion=go1.21.8 os=linux arch=amd64 numcores=4 hostname=nb-k8s-controlplane-1 podname=retina-agent-25kqb version=v0.0.4 apiserver=https://10.96.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser error="failed to start plugin packetparser: interface eth0 of type device not found" errorVerbose="interface eth0 of type device not found\nfailed to start plugin packetparser\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start.func1\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:174\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"
ts=2024-04-01T12:54:09.716Z level=info caller=server/server.go:79 msg="gracefully shutting down HTTP server..." goversion=go1.21.8 os=linux arch=amd64 numcores=4 hostname=nb-k8s-controlplane-1 podname=retina-agent-25kqb version=v0.0.4 apiserver=https://10.96.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser
ts=2024-04-01T12:54:09.716Z level=info caller=watchermanager/watchermanager.go:71 msg="watcher stopping..." goversion=go1.21.8 os=linux arch=amd64 numcores=4 hostname=nb-k8s-controlplane-1 podname=retina-agent-25kqb version=v0.0.4 apiserver=https://10.96.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser watcher_type=*endpoint.EndpointWatcher
ts=2024-04-01T12:54:09.716Z level=info caller=server/server.go:71 msg="HTTP server stopped with err: http: Server closed" goversion=go1.21.8 os=linux arch=amd64 numcores=4 hostname=nb-k8s-controlplane-1 podname=retina-agent-25kqb version=v0.0.4 apiserver=https://10.96.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser
ts=2024-04-01T12:54:09.716Z level=info caller=watchermanager/watchermanager.go:71 msg="watcher stopping..." goversion=go1.21.8 os=linux arch=amd64 numcores=4 hostname=nb-k8s-controlplane-1 podname=retina-agent-25kqb version=v0.0.4 apiserver=https://10.96.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser watcher_type=*apiserver.ApiServerWatcher
ts=2024-04-01T12:54:09.716Z level=panic caller=controllermanager/controllermanager.go:119 msg="Error running controller manager" goversion=go1.21.8 os=linux arch=amd64 numcores=4 hostname=nb-k8s-controlplane-1 podname=retina-agent-25kqb version=v0.0.4 apiserver=https://10.96.0.1:443 plugins=dropreason,packetforward,linuxutil,dns,packetparser error="failed to start plugin manager, plugin exited: failed to start plugin packetparser: interface eth0 of type device not found" errorVerbose="interface eth0 of type device not found\nfailed to start plugin packetparser\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start.func1\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:174\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\nfailed to start plugin manager, plugin exited\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:186\ngithub.com/microsoft/retina/pkg/managers/controllermanager.(*Controller).Start.func1\n\t/go/src/github.com/microsoft/retina/pkg/managers/controllermanager/controllermanager.go:109\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"
panic: Error running controller manager

goroutine 46 [running]:
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x1?, 0x0?, {0x0?, 0x0?, 0xc003baa120?})
	/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:196 +0x54
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc003bc00d0, {0xc003bb46c0, 0x1, 0x1})
	/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:262 +0x3ec
go.uber.org/zap.(*Logger).Panic(0xc000c4d400?, {0x2b52d30?, 0x0?}, {0xc003bb46c0, 0x1, 0x1})
	/go/pkg/mod/go.uber.org/[email protected]/logger.go:284 +0x51
github.com/microsoft/retina/pkg/managers/controllermanager.(*Controller).Start(0xc0008dcfa0, {0x2f10a90?, 0xc000c4a870?})
	/go/src/github.com/microsoft/retina/pkg/managers/controllermanager/controllermanager.go:119 +0x28c
created by main.main in goroutine 1
	/go/src/github.com/microsoft/retina/controller/main.go:290 +0x28d0

ngurah-bagus-trisna avatar Apr 01 '24 12:04 ngurah-bagus-trisna

same issue

weizhoublue avatar Apr 08 '24 06:04 weizhoublue