bbolt
Is relying on the Check method enough to detect boltdb corruption?
If I call the Check method when I read from boltdb the first time, is that enough to ensure boltdb is not corrupted? Or do I need to call it periodically within my process that is reading from boltdb? Is there anything else that needs to be done?
https://godoc.org/go.etcd.io/bbolt#Tx.Check
Tx.Check is a diagnostic method that checks consistency at a given point in time using a set of rules. So far it focuses on the relationships between pages:
- Whether all pages referenced from the root are reachable.
- Whether all unreachable pages are on the free-pages-list.
When the database is used only through the public API, it should not get corrupted. But a bug, a hardware issue, or cosmic rays can still cause corruption. Usually you don't need to call Check from your business-logic application, but you might, for example, check your backups and alert if they are not in a consistent state.
BTW: https://github.com/etcd-io/bbolt/pull/225 expands Check to also cover logical errors (unexpected key order).
> Usually you don't need to call Check from your business-logic application
I am currently dealing with a corrupt database file in production, where iterating the keys in one bucket gives wrong data, and reading another bucket never finishes and eats up all RAM until the process is killed.
I feel forced to do such a consistency check after opening the database to mitigate such issues.
@benma Is it possible for you to share the corrupt database?
> @benma Is it possible for you to share the corrupt database?
I found two databases from around that time, but I no longer know whether these are the ones that ate up all the RAM or whether they had different issues, such as panics:
- rates.db.zip
- https://github.com/etcd-io/bbolt/issues/105#issuecomment-1308502456
Pages in [3655, 3715] were somehow reset: all values in these pages are zero.
FYI. https://github.com/etcd-io/bbolt/pull/520