Current levelDB panics on crash recovery

Open aphyr opened this issue 7 years ago • 0 comments

If a Merkleeyes LevelDB file is truncated (e.g. due to power failure or backup-and-restore), Merkleeyes can panic on startup, throwing:

no existing db, creating new db
loading existing db
error reading MerkleEyesState
panic: EOF

goroutine 1 [running]:
panic(0x8fab00, 0xc420606670)
    /home/balloo/go/src/runtime/panic.go:500 +0x1a1
github.com/tendermint/merkleeyes/app.NewMerkleEyesApp(0x7ffe977e5d61, 0x6, 0x0, 0x2)
    /home/balloo/goApps/src/github.com/tendermint/merkleeyes/app/app.go:93 +0x815
github.com/tendermint/merkleeyes/cmd.StartServer(0xc941e0, 0xc420058dc0, 0x0, 0x4)
    /home/balloo/goApps/src/github.com/tendermint/merkleeyes/cmd/app.go:33 +0x49
github.com/tendermint/merkleeyes/vendor/github.com/spf13/cobra.(*Command).execute(0xc941e0, 0xc420058d40, 0x4, 0x4, 0xc941e0, 0xc420058d40)
    /home/balloo/goApps/src/github.com/tendermint/merkleeyes/vendor/github.com/spf13/cobra/command.go:660 +0x44c
github.com/tendermint/merkleeyes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc94840, 0xc42000c0b8, 0x0, 0xc4201adf18)
    /home/balloo/goApps/src/github.com/tendermint/merkleeyes/vendor/github.com/spf13/cobra/command.go:735 +0x367
github.com/tendermint/merkleeyes/vendor/github.com/spf13/cobra.(*Command).Execute(0xc94840, 0x0, 0x0)
    /home/balloo/goApps/src/github.com/tendermint/merkleeyes/vendor/github.com/spf13/cobra/command.go:694 +0x2b
github.com/tendermint/merkleeyes/cmd.Execute()
    /home/balloo/goApps/src/github.com/tendermint/merkleeyes/cmd/root.go:25 +0x31
main.main()
    /home/balloo/goApps/src/github.com/tendermint/merkleeyes/cmd/merkleeyes/main.go:8 +0x14
loading existing db
error reading MerkleEyesState
panic: EOF

This effectively means that power failures (or similar faults) on more than 1/3 of a cluster can render the whole cluster unusable, at least until you fix the LevelDB recovery code or restore from non-corrupt backups.

I can't see tmlibs, so I'm not exactly sure how this dependency works, but from our conversation in channel, I think Merkleeyes uses goleveldb for its on-disk storage. There are a few other reports of issues like this: Prometheus hit crash-recovery problems in Fall 2016, Syncthing hit panics around the same time, and there's also a report of disk imaging resulting in "corrupted or incomplete meta file" errors in Spring 2017. The goleveldb maintainer suggests in those threads that https://github.com/syndtr/goleveldb/commit/1996ac2d281f2ba6171499ac0de852e41e4d446e and https://github.com/syndtr/goleveldb/commit/69e19a4743fd46eff0dc32c368d8b11b8adac35c may help, so it might be worth upgrading, or cherry-picking those commits into the goleveldb version that Merkleeyes vendors.

I also suggest developing a test suite to verify specifically whether Merkleeyes recovers correctly from arbitrary truncations of its various LevelDB files.

aphyr, Aug 12 '17 00:08

Nothing is currently exposed to support this. I believe it's possible on Android by grabbing the TextView instance in the snackbar, but I'd have to double-check. I think we already grab that instance to support multi-line snackbars on Android, so something similar should work for font size. If you'd like to help with a PR there, I can answer any questions.

On iOS I'd have to look at the CocoaPod and see what it currently exposes. If it exposes nothing, we'd have to take a similar approach and find the UIView to apply the changes to, or modify the pod to support this directly.

bradmartin, Oct 25 '19 17:10