Gloo pod crashes and/or never becomes live/ready under load

Open bdecoste opened this issue 3 years ago • 4 comments

Gloo Edge Version

1.12.x (latest stable)

Kubernetes Version

No response

Describe the bug

When I have a large number of VirtualServices (e.g. 3000) with associated Upstreams (3-1), AuthConfigs (1-1), and RateLimitConfigs (1-1), the gloo pod never becomes live/ready and eventually crashes. This happens even though the gloo pod has significant resource limits (e.g. 8 CPU / 16Gi).

{"level":"info","ts":"2022-09-14T23:23:50.416Z","logger":"gloo-ee.v1.event_loop.setup.v1.event_loop","caller":"v1/eds_event_loop.sk.go:57","msg":"event loop started","version":"1.12.15"}
{"level":"panic","ts":"2022-09-14T23:35:18.169Z","logger":"gloo-ee.v1.event_loop.setup","caller":"setup/setup_syncer.go:609","msg":"failed warming up endpoints - consider adjusting endpointsWarmingTimeout","version":"1.12.15","warmTimeoutDuration":300,"stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.RunGlooWithExtensions\n\t/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/syncer/setup/setup_syncer.go:609\ngithub.com/solo-io/solo-projects/projects/gloo/pkg/setup.NewSetupFuncWithRestControlPlaneAndExtensions.func1\n\t/workspace/solo-projects/projects/gloo/pkg/setup/setup.go:68\ngithub.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.(*setupSyncer).Setup\n\t/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/syncer/setup/setup_syncer.go:395\ngithub.com/solo-io/gloo/pkg/utils/setuputils.(*SetupSyncer).Sync\n\t/go/pkg/mod/github.com/solo-io/[email protected]/pkg/utils/setuputils/setup_syncer.go:60\ngithub.com/solo-io/gloo/projects/gloo/pkg/api/v1.(*setupEventLoop).Run.func1\n\t/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/api/v1/setup_event_loop.sk.go:84"}
panic: failed warming up endpoints - consider adjusting endpointsWarmingTimeout

goroutine 99 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc001632e40, {0xc006f4b200, 0x1, 0x2})
	/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:232 +0x68b
go.uber.org/zap.(*SugaredLogger).log(0xc00012c020, 0x4, {0x4c3e356, 0x48}, {0x0, 0x0, 0x0}, {0xc006eac8a8, 0x2, 0x2})
	/go/pkg/mod/go.uber.org/[email protected]/sugar.go:227 +0x185
go.uber.org/zap.(*SugaredLogger).Panicw(0xc00012c020, {0x4c3e356, 0x48}, {0xc006eac8a8, 0x2, 0x2})
	/go/pkg/mod/go.uber.org/[email protected]/sugar.go:204 +0x68
github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.RunGlooWithExtensions({{0xc000997060, 0xb}, {0xc00007602e, 0xb}, {0x0, 0x0, 0x0}, {0x501e820, 0xc000962a50}, {0x505ece8, ...}, ...}, ...)
	/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/syncer/setup/setup_syncer.go:609 +0x1e0f
github.com/solo-io/solo-projects/projects/gloo/pkg/setup.NewSetupFuncWithRestControlPlaneAndExtensions.func1({{0xc000997060, 0xb}, {0xc00007602e, 0xb}, {0x0, 0x0, 0x0}, {0x501e820, 0xc000962a50}, {0x505ece8, ...}, ...})
	/workspace/solo-projects/projects/gloo/pkg/setup/setup.go:68 +0x113
github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.(*setupSyncer).Setup(0xc0004749a0, {0x50576d0, 0xc000a97bf0}, {0x505c148, 0xc000480540}, {0x505c190, 0xc000759a40}, 0xc00077d680, {0x500d8e0, 0xc00029cc68})
	/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/syncer/setup/setup_syncer.go:395 +0x1c7a
github.com/solo-io/gloo/pkg/utils/setuputils.(*SetupSyncer).Sync(0xc000987140, {0x50576d0, 0xc000a97bf0}, 0xc00097cfa8)
	/go/pkg/mod/github.com/solo-io/[email protected]/pkg/utils/setuputils/setup_syncer.go:60 +0x4b3
github.com/solo-io/gloo/projects/gloo/pkg/api/v1.(*setupEventLoop).Run.func1()
	/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/api/v1/setup_event_loop.sk.go:84 +0x34c
created by github.com/solo-io/gloo/projects/gloo/pkg/api/v1.(*setupEventLoop).Run
	/go/pkg/mod/github.com/solo-io/[email protected]/projects/gloo/pkg/api/v1/setup_event_loop.sk.go:66 +0x55f

Steps to reproduce the bug

See above

Expected Behavior

Gloo does not crash and becomes live/ready in a reasonable time.

Additional Context

No response

bdecoste, Sep 16 '22 17:09

Heap dumps https://solo-io.slack.com/archives/GPG5SMLH1/p1663269505775109

chrisgaun, Sep 16 '22 17:09

What's a reasonable time? What's the current warming time, by the way?

nfuden, Sep 16 '22 17:09

Whatever the default warming timeout is

bdecoste, Sep 16 '22 18:09

current EDS warming default is 5min

kdorosh, Sep 21 '22 19:09

Closing, as the liveness probe has been updated to a dummy probe and performance optimizations have gone in (tracked elsewhere).

kdorosh, Sep 30 '22 14:09