Is being dependent on the Check method enough for detecting boltdb corruption?

Open arjunsingri opened this issue 5 years ago • 6 comments

If I call the Check method when I read from boltdb the first time, is that enough to ensure boltdb is not corrupted? Or do I need to call it periodically within my process that is reading from boltdb? Is there anything else that needs to be done?

https://godoc.org/go.etcd.io/bbolt#Tx.Check

arjunsingri avatar Aug 30 '19 22:08 arjunsingri

Tx.Check is a diagnostic method that checks the database's consistency at a given point in time against a set of rules. So far it focuses on the relationships between pages:

  • Whether all pages referenced from the root are reachable.
  • Whether all unreachable pages are on the free-pages-list.

The database should not get corrupted through public API operations alone. However, a bug, a hardware issue, or cosmic rays can still cause corruption. Usually you don't need to call Check from your business-logic application, but you might, for example, consider checking whether backups are in a consistent state and alerting if they are not.

ptabor avatar Jun 26 '20 15:06 ptabor

BTW: https://github.com/etcd-io/bbolt/pull/225 expands Check to also cover logical errors (unexpected key order).

ptabor avatar Jun 26 '20 16:06 ptabor

> Usually you don't need to call Check from your business-logic application

I am currently dealing with a corrupt database file in production, where iterating the keys in one bucket gives wrong data, and reading another bucket never finishes and eats up all the RAM until the process is killed.

I feel forced to do such a consistency check after opening the database to mitigate such issues.

benma avatar Nov 07 '22 00:11 benma

@benma Is it possible for you to share the corrupt database?

cenkalti avatar May 15 '23 17:05 cenkalti

> @benma Is it possible for you to share the corrupt database?

I found two databases from around that time, but I don't know anymore if these are the ones that ate up all the RAM or if they had different issues such as panics:

  1. rates.db.zip
  2. https://github.com/etcd-io/bbolt/issues/105#issuecomment-1308502456

benma avatar May 18 '23 21:05 benma

> rates.db.zip

Pages in the range [3655, 3715] were somehow reset: these pages contain only zero values.

FYI. https://github.com/etcd-io/bbolt/pull/520

ahrtr avatar May 31 '23 08:05 ahrtr