Burrow
Burrow does not exit after panic
Version: 1.6.0
Issue: Burrow hangs (stops responding, but does not exit) after failing to release its ZooKeeper lock:
2023-10-02 17:33:19.940 | {"level":"info","ts":1696257198.8336904,"msg":"re-submitting `0` credentials after reconnect","type":"coordinator","name":"zookeeper"}
2023-10-02 17:33:19.940 | {"level":"info","ts":1696257198.8336573,"msg":"authenticated: id=74567085257124526, timeout=6000","type":"coordinator","name":"zookeeper"}
2023-10-02 17:33:19.940 | {"level":"info","ts":1696257198.811102,"msg":"starting session","type":"coordinator","name":"zookeeper"}
2023-10-02 17:33:19.940 | {"level":"info","ts":1696257198.8110363,"msg":"Connected to [zk-ip1]:2181","type":"coordinator","name":"zookeeper"}
2023-10-02 17:33:18.938 | stderr /home/runner/work/Burrow/Burrow/core/internal/notifier/coordinator.go:272 +0x1f1
2023-10-02 17:33:18.938 | stderr created by github.com/linkedin/Burrow/core/internal/notifier.(*Coordinator).Start
2023-10-02 17:33:18.938 | stderr /home/runner/work/Burrow/Burrow/core/internal/notifier/coordinator.go:328 +0x505
2023-10-02 17:33:18.938 | stderr github.com/linkedin/Burrow/core/internal/notifier.(*Coordinator).manageEvalLoop(0xc0000f0380)
2023-10-02 17:33:18.934 | stderr goroutine 115 [running]:
2023-10-02 17:33:18.934 | stderr
2023-10-02 17:33:18.934 | stderr panic: Unable to release zookeeper lock after session expiration
It seems the panic was somehow recovered, because the "Burrow failed at" message was not printed. The process did not die until 10 minutes later, when I sent it a SIGTERM.
A similar thing happens if I start it locally, without access to zk:
{"level":"panic","ts":1696264692.487353,"msg":"Failure to start zookeeper","type":"coordinator","name":"zookeeper","error":"lookup zk-host on [zk-ip]:53: no such host"}
panic: Failure to start zookeeper [recovered]
panic: Failure to start zookeeper
goroutine 1 [running]:
main.handleExit()
/home/runner/work/Burrow/Burrow/main.go:63 +0xf8
panic({0xbeb8a0, 0xc0003a60b0})
/opt/hostedtoolcache/go/1.20.1/x64/src/runtime/panic.go:884 +0x213
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x1?, 0x7f680a4c45e8?, {0x0?, 0x0?, 0xc000132020?})
/home/runner/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:198 +0x65
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00011c000, {0xc000226180, 0x1, 0x1})
/home/runner/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:264 +0x3ec
go.uber.org/zap.(*Logger).Panic(0xc000226000?, {0xd227e4?, 0x0?}, {0xc000226180, 0x1, 0x1})
/home/runner/go/pkg/mod/go.uber.org/[email protected]/logger.go:258 +0x59
github.com/linkedin/Burrow/core/internal/zookeeper.(*Coordinator).Start(0xc00014a240)
/home/runner/work/Burrow/Burrow/core/internal/zookeeper/coordinator.go:87 +0x42b
github.com/linkedin/Burrow/core.Start(0xc000084540?, 0xc0001a5ef0?)
/home/runner/work/Burrow/Burrow/core/burrow.go:158 +0x49b
main.main()
/home/runner/work/Burrow/Burrow/main.go:114 +0x4d2
There are no logs after that, yet the process is still alive. This time the panic has clearly been recovered. I would very much like Burrow to exit on network problems so that my orchestration can restart it.
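To make the requested behavior concrete, here is a minimal sketch of what I mean by "exit on failure". It is not Burrow's code: `runGuarded` is a hypothetical helper, and the panicking function only stands in for the failed ZooKeeper start shown above. The idea is that a recovered panic always turns into a non-zero exit code instead of leaving the process running.

```go
package main

import (
	"fmt"
	"os"
)

// runGuarded is a hypothetical helper (not part of Burrow). It calls fn and
// reports 0 if fn returns normally, or 1 if fn panics.
func runGuarded(fn func()) (code int) {
	defer func() {
		if r := recover(); r != nil {
			// Log the panic value, then report failure to the caller.
			fmt.Fprintln(os.Stderr, "fatal:", r)
			code = 1
		}
	}()
	fn()
	return 0
}

func main() {
	// Stand-in for a start-up step that panics, such as a failed
	// ZooKeeper connection.
	code := runGuarded(func() {
		panic("Failure to start zookeeper")
	})
	fmt.Println("exit code would be", code)
	// A real main would finish with os.Exit(code) so that an orchestrator
	// sees the failure and restarts the process.
}
```

Note that a deferred `recover` only covers the goroutine it runs in, so a panic in a background goroutine (as in the first trace) would need the same kind of guard at the top of that goroutine.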