gloo
Gloo pod crashes and/or never becomes live/ready under load
Gloo Edge Version
1.12.x (latest stable)
Kubernetes Version
No response
Describe the bug
When I have a large number of VirtualServices (VSs), e.g. 3000, together with their associated Upstreams (3:1), AuthConfigs (1:1), and RateLimitConfigs (1:1), the gloo pod never becomes live/ready and eventually crashes. This happens even though the gloo pod has generous resource limits (e.g. 8 CPU / 16Gi memory).
{"level":"info","ts":"2022-09-14T23:23:50.416Z","logger":"gloo-ee.v1.event_loop.setup.v1.event_loop","caller":"v1/eds_event_loop.sk.go:57","msg":"event loop started","version":"1.12.15"}
{"level":"panic","ts":"2022-09-14T23:35:18.169Z","logger":"gloo-ee.v1.event_loop.setup","caller":"setup/setup_syncer.go:609","msg":"failed warming up endpoints - consider adjusting endpointsWarmingTimeout","version":"1.12.15","warmTimeoutDuration":300,"stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.RunGlooWithExtensions\n\t/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/syncer/setup/setup_syncer.go:609\ngithub.com/solo-io/solo-projects/projects/gloo/pkg/setup.NewSetupFuncWithRestControlPlaneAndExtensions.func1\n\t/workspace/solo-projects/projects/gloo/pkg/setup/setup.go:68\ngithub.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.(*setupSyncer).Setup\n\t/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/syncer/setup/setup_syncer.go:395\ngithub.com/solo-io/gloo/pkg/utils/setuputils.(*SetupSyncer).Sync\n\t/go/pkg/mod/github.com/solo-io/[email protected]/pkg/utils/setuputils/setup_syncer.go:60\ngithub.com/solo-io/gloo/projects/gloo/pkg/api/v1.(*setupEventLoop).Run.func1\n\t/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/api/v1/setup_event_loop.sk.go:84"}
panic: failed warming up endpoints - consider adjusting endpointsWarmingTimeout

goroutine 99 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc001632e40, {0xc006f4b200, 0x1, 0x2})
	/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:232 +0x68b
go.uber.org/zap.(*SugaredLogger).log(0xc00012c020, 0x4, {0x4c3e356, 0x48}, {0x0, 0x0, 0x0}, {0xc006eac8a8, 0x2, 0x2})
	/go/pkg/mod/go.uber.org/[email protected]/sugar.go:227 +0x185
go.uber.org/zap.(*SugaredLogger).Panicw(0xc00012c020, {0x4c3e356, 0x48}, {0xc006eac8a8, 0x2, 0x2})
	/go/pkg/mod/go.uber.org/[email protected]/sugar.go:204 +0x68
github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.RunGlooWithExtensions({{0xc000997060, 0xb}, {0xc00007602e, 0xb}, {0x0, 0x0, 0x0}, {0x501e820, 0xc000962a50}, {0x505ece8, ...}, ...}, ...)
	/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/syncer/setup/setup_syncer.go:609 +0x1e0f
github.com/solo-io/solo-projects/projects/gloo/pkg/setup.NewSetupFuncWithRestControlPlaneAndExtensions.func1({{0xc000997060, 0xb}, {0xc00007602e, 0xb}, {0x0, 0x0, 0x0}, {0x501e820, 0xc000962a50}, {0x505ece8, ...}, ...})
	/workspace/solo-projects/projects/gloo/pkg/setup/setup.go:68 +0x113
github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.(*setupSyncer).Setup(0xc0004749a0, {0x50576d0, 0xc000a97bf0}, {0x505c148, 0xc000480540}, {0x505c190, 0xc000759a40}, 0xc00077d680, {0x500d8e0, 0xc00029cc68})
	/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/syncer/setup/setup_syncer.go:395 +0x1c7a
github.com/solo-io/gloo/pkg/utils/setuputils.(*SetupSyncer).Sync(0xc000987140, {0x50576d0, 0xc000a97bf0}, 0xc00097cfa8)
	/go/pkg/mod/github.com/solo-io/[email protected]/pkg/utils/setuputils/setup_syncer.go:60 +0x4b3
github.com/solo-io/gloo/projects/gloo/pkg/api/v1.(*setupEventLoop).Run.func1()
	/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/api/v1/setup_event_loop.sk.go:84 +0x34c
created by github.com/solo-io/gloo/projects/gloo/pkg/api/v1.(*setupEventLoop).Run
	/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/api/v1/setup_event_loop.sk.go:66 +0x55f
Steps to reproduce the bug
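See above: create a large number of VSs (e.g. 3000) with their associated Upstreams, AuthConfigs, and RateLimitConfigs, then start the gloo pod.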
Expected Behavior
Gloo does not crash, and it becomes live/ready within a reasonable time.
Additional Context
No response
Heap dumps https://solo-io.slack.com/archives/GPG5SMLH1/p1663269505775109
What's a reasonable time? And what's the current warming timeout, by the way?
Whatever the default warming timeout is
The current EDS warming timeout defaults to 5 minutes (300 s, which matches "warmTimeoutDuration":300 in the log).
Closing: the liveness probe has been updated to a dummy probe, and performance optimizations have gone in (tracked elsewhere).