bbolt

Panic happens when opening a boltdb

Open aa624545345 opened this issue 11 months ago • 2 comments

I'm not sure whether this is a bug, but it's confusing. In my scenario, the host powered off while the containerd service was running. After the host came back up and containerd started, it panicked while trying to open meta.db.

panic: assertion failed: Page expected to be: 3393, but self identifies as 6868074383292906018

goroutine 31 [running]:
go.etcd.io/bbolt._assert(...)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:1359
go.etcd.io/bbolt.(*page).fastCheck(0x7f8a70d41000, 0xd41)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/page.go:57 +0x1df
go.etcd.io/bbolt.(*Tx).page(0xc000372fa0?, 0x5625f8aea50b?)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/tx.go:534 +0x8a
go.etcd.io/bbolt.(*Tx).forEachPageInternal(0x0?, {0xc000120460?, 0x1, 0xa}, 0xc000373078)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/tx.go:546 +0x65
go.etcd.io/bbolt.(*Tx).forEachPage(...)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/tx.go:542
go.etcd.io/bbolt.(*Tx).checkBucket(0xc00027a0e0, 0xc00027a0f8, 0xc000373240, 0xc000373210, {0x5625fa6f2ba0?, 0x5625fb4ab008}, 0xc000064b40)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/tx_check.go:83 +0x126
go.etcd.io/bbolt.(*DB).freepages(0x5625f9de40bb?)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:1181 +0x229
go.etcd.io/bbolt.(*DB).loadFreelist.func1()
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:412 +0xc5
sync.(*Once).doSlow(0xc000164888?, 0x10?)
	/usr/local/go/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
	/usr/local/go/src/sync/once.go:65
go.etcd.io/bbolt.(*DB).loadFreelist(0xc0001646c0?)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:408 +0x47
go.etcd.io/bbolt.Open({0xc000026f80, 0x31}, 0xf9de3628?, 0xc0004a1628)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:290 +0x40c
github.com/containerd/containerd/metadata/plugin.init.1.func1(0xc000128380)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/metadata/plugin/plugin.go:159 +0x78c
github.com/containerd/containerd/plugin.(*Registration).Init(0xc0000d2a80, 0xc000128380)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/plugin/plugin.go:127 +0x2e
github.com/containerd/containerd/services/server.New({0x5625fa6fc9e0, 0xc000666230}, 0xc000614700)
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/services/server/server.go:236 +0x13f8
github.com/containerd/containerd/cmd/containerd/command.App.func1.1()
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/cmd/containerd/command/main.go:194 +0x87
created by github.com/containerd/containerd/cmd/containerd/command.App.func1
	/root/rpmbuild/BUILD/containerd.io-1.7.6/_build/src/github.com/containerd/containerd/cmd/containerd/command/main.go:191 +0x8ac
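
For readers unfamiliar with the assertion at the top of the trace: the message suggests the check compares the page ID that was requested with the ID stored in the header found at that position in the file. The following is a minimal sketch of that idea, not bbolt's actual code. It assumes a 4096-byte page size and an 8-byte little-endian ID at the start of each page header; `storedID`, `checkPage`, and `buildFile` are hypothetical names, and the sketch returns an error where the real check panics.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const pageSize = 4096 // assumed page size for this sketch

// storedID reads the ID that the bytes at page position id claim to have.
// Assumption: a page header begins with an 8-byte little-endian page ID.
func storedID(file []byte, id uint64) uint64 {
	off := id * pageSize
	return binary.LittleEndian.Uint64(file[off : off+8])
}

// checkPage mimics the self-identification check: the ID found in the
// header must equal the ID used to locate the page.
func checkPage(file []byte, id uint64) error {
	if got := storedID(file, id); got != id {
		return fmt.Errorf("page expected to be: %d, but self identifies as %d", id, got)
	}
	return nil
}

// buildFile creates a toy 3-page file. Pages 0 and 1 carry headers.
// Page 1 has one overflow page (page 2), which holds payload only.
func buildFile() []byte {
	file := make([]byte, 3*pageSize)
	binary.LittleEndian.PutUint64(file[0:], 0)
	binary.LittleEndian.PutUint64(file[pageSize:], 1)
	copy(file[2*pageSize:], []byte("payload bytes, not a header"))
	return file
}

func main() {
	file := buildFile()
	fmt.Println(checkPage(file, 1)) // header page: the IDs match
	fmt.Println(checkPage(file, 2)) // overflow page: payload is misread as an ID
}
```

In this toy layout, asking for page 2 reads payload bytes as if they were a page ID, which is one way an implausibly large "self identifies as" value can arise.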

I can provide the meta.db if needed.

aa624545345, Mar 06 '24 08:03

REF https://github.com/containerd/containerd/issues/9929#issuecomment-1978318937

fuweid, Mar 06 '24 08:03

REF containerd/containerd#9929 (comment)

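The root page ID (3393) points into a leaf page rather than at the start of one. Page 3391 is a leaf page with 2 overflow pages, so it occupies pages 3391 through 3393: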

===============================================
Page ID: 3391
Page Type: leaf
Total Size: 12288 bytes
Overflow pages: 2
Item Count: 3
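
The arithmetic behind that observation can be sketched as follows. This is only an illustration using the numbers from the page dump above; `pageSpan` and `pointsIntoOverflow` are hypothetical helpers, not bbolt functions, and the sketch assumes overflow pages directly follow the page that owns them.

```go
package main

import "fmt"

// pageSpan returns the first and last page IDs occupied by a page that
// starts at id and has the given number of overflow pages.
// Assumption: overflow pages are contiguous and follow the header page.
func pageSpan(id, overflow uint64) (first, last uint64) {
	return id, id + overflow
}

// pointsIntoOverflow reports whether target lands on one of the overflow
// pages of the page starting at id, i.e. past its header page.
func pointsIntoOverflow(target, id, overflow uint64) bool {
	first, last := pageSpan(id, overflow)
	return target > first && target <= last
}

func main() {
	// Values taken from the page dump for page 3391.
	const id, overflow, totalSize = 3391, 2, 12288

	first, last := pageSpan(id, overflow)
	fmt.Printf("page %d spans %d..%d\n", id, first, last) // page 3391 spans 3391..3393
	fmt.Println("bytes per page:", totalSize/(overflow+1)) // bytes per page: 4096
	fmt.Println("3393 inside overflow:", pointsIntoOverflow(3393, id, overflow)) // 3393 inside overflow: true
}
```

The total size of 12288 bytes over three pages gives 4096 bytes per page, and 3393 falls on the last overflow page, which has no header of its own.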

It looks like there may be a bug in freelist management, e.g. `allocate` hands out pages that are still in use, and bbolt crashes or is terminated for whatever reason right after writing such data pages but before updating the meta pages.

ahrtr, Jun 20 '24 14:06