
panic: invalid freelist page: 0, page type is unknown<00>

Open gandarez opened this issue 2 years ago • 14 comments

I've been using bbolt for two years (already updated to the latest version, v1.3.7) and recently started getting a weird panic when opening the database file. I can't debug it or get the db file to test with, since I distribute my application as a standalone client. Why does it panic instead of returning an error? Does this error happen because the db is corrupted?

func (f *freelist) read(p *page) {
	if (p.flags & freelistPageFlag) == 0 {
		panic(fmt.Sprintf("invalid freelist page: %d, page type is %s", p.id, p.typ()))
	}
....
}

https://github.com/etcd-io/bbolt/blob/da2f2a53f6e2f25b215b79db2cd417488ef8e955/freelist.go#L265
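For readers wondering why the message ends in `unknown<00>`, here is a minimal sketch of how a page-type string can be derived from the page header flags. The flag values and the `pageType` helper are assumptions written from memory of bbolt's page.go, not a copy of the library code. The point it shows is that a page whose flags field is all zeroes matches no known type, which is what a zeroed or never-written page would look like.

```go
package main

import "fmt"

// Page flag values as recalled from bbolt's page.go.
// Treat them as assumptions, not as an authoritative copy.
const (
	branchPageFlag   = 0x01
	leafPageFlag     = 0x02
	metaPageFlag     = 0x04
	freelistPageFlag = 0x10
)

// pageType is a hypothetical stand-in for (*page).typ(): it maps the
// header flags to a readable name and falls back to "unknown<..>".
func pageType(flags uint16) string {
	switch {
	case flags&branchPageFlag != 0:
		return "branch"
	case flags&leafPageFlag != 0:
		return "leaf"
	case flags&metaPageFlag != 0:
		return "meta"
	case flags&freelistPageFlag != 0:
		return "freelist"
	}
	return fmt.Sprintf("unknown<%02x>", flags)
}

func main() {
	fmt.Println(pageType(0x10)) // a well-formed freelist page
	fmt.Println(pageType(0x00)) // a page with no flags set, as in the panic message
}
```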

https://github.com/wakatime/wakatime-cli/issues/848
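On the application side, one possible way to surface this as an error rather than a crash is to wrap the open call and convert a panic into an error with `recover`. The sketch below is stdlib-only; `openSafely` and its callback are hypothetical names, and the callback stands in for a real call such as `bbolt.Open`.

```go
package main

import "fmt"

// openSafely runs open and converts any panic it raises into an error.
// The open callback is a placeholder for a call such as bbolt.Open.
func openSafely(open func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("open panicked: %v", r)
		}
	}()
	return open()
}

func main() {
	// Simulate the panic from the issue title instead of opening a real file.
	err := openSafely(func() error {
		panic("invalid freelist page: 0, page type is unknown<00>")
	})
	fmt.Println(err)
}
```

This only helps with reporting: any file handle or lock acquired before the panic may not be released, so retrying the open in the same process may not work.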

gandarez avatar Mar 30 '23 23:03 gandarez

Looks like the db file is corrupted. To skip the error, @gandarez could try passing PreLoadFreelist: false but it is always loaded in RW mode. Can this restriction be removed? https://github.com/etcd-io/bbolt/blob/3e560dbae20dcb078d50f928ef7d17f1a56a4413/db.go#L182-L183

cenkalti avatar Mar 30 '23 23:03 cenkalti

Thanks @gandarez for raising this issue and sorry for the inconvenience. Copied the call stack from https://github.com/wakatime/wakatime-cli/issues/848 below.

The error message indicates that meta page 0 might be corrupted (although the checksum is somehow still correct). Is it possible to provide the db file? (I saw your note that you can't get the db file, but I still want to double-check.)

Do you have detailed steps to reproduce this issue?

goroutine 1 [running]:
runtime/debug.Stack()
 /opt/hostedtoolcache/go/1.19.6/x64/src/runtime/debug/stack.go:24 +0x65
github.com/wakatime/wakatime-cli/cmd.runCmd.func1()
 /home/runner/work/wakatime-cli/wakatime-cli/cmd/run.go:272 +0xd3
panic({0x9a5540, 0xc00060b980})
 /opt/hostedtoolcache/go/1.19.6/x64/src/runtime/panic.go:884 +0x212
go.etcd.io/bbolt.(*freelist).read(0x0?, 0x11bfa0c2000)
 /home/runner/go/pkg/mod/go.etcd.io/[email protected]/freelist.go:267 +0x22e
go.etcd.io/bbolt.(*DB).loadFreelist.func1()
 /home/runner/go/pkg/mod/go.etcd.io/[email protected]/db.go:415 +0xb8
sync.(*Once).doSlow(0xc000123608?, 0x10?)
 /opt/hostedtoolcache/go/1.19.6/x64/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
 /opt/hostedtoolcache/go/1.19.6/x64/src/sync/once.go:65
go.etcd.io/bbolt.(*DB).loadFreelist(0xc000123440?)
 /home/runner/go/pkg/mod/go.etcd.io/[email protected]/db.go:408 +0x47
go.etcd.io/bbolt.Open({0xc0002fd260, 0x1a}, 0x0?, 0xc000378c20)
 /home/runner/go/pkg/mod/go.etcd.io/[email protected]/db.go:290 +0x40c

ahrtr avatar Mar 31 '23 00:03 ahrtr

Or, if you can't provide the db file, run the commands below:

$ ./bbolt check <db-file>

$ ./bbolt pages <db-file>

$ ./bbolt page <db-file> 0

$ ./bbolt page <db-file> 1

ahrtr avatar Mar 31 '23 00:03 ahrtr

@gandarez could try passing PreLoadFreelist: false but it is always loaded in RW mode

Note that bbolt always loads the freelist in write mode, no matter what value is set for PreLoadFreelist.

EDIT:

Can this restriction be removed?

No, we can't. Freelist management is the most crucial part of bbolt. The freelist is always needed in write mode, so it must always be loaded in write mode.

ahrtr avatar Mar 31 '23 00:03 ahrtr

Sorry, I didn't explain the whole thing. What I meant was: if the user switches NoFreelistSync from false to true, db.Open() still loads the freelist.

I'm proposing changing: https://github.com/etcd-io/bbolt/blob/3e560dbae20dcb078d50f928ef7d17f1a56a4413/db.go#L253-L255

to

	if db.PreLoadFreelist && !db.NoFreelistSync {
		db.loadFreelist()
	}

cenkalti avatar Mar 31 '23 14:03 cenkalti

That isn't correct. Setting db.NoFreelistSync to true only means the freelist is not synced to disk in this transaction; it does not mean the freelist is not loaded. We still need to load the freelist, even if no freelist was synced in the previous transaction (in that case bbolt scans the whole db to reconstruct the freelist).

ahrtr avatar Mar 31 '23 23:03 ahrtr

Sorry for my misunderstanding. Currently, there is no way to skip loading freelist from the disk if meta page points to an existing freelist. Is that correct?

cenkalti avatar Apr 01 '23 00:04 cenkalti

Currently, there is no way to skip loading freelist from the disk if meta page points to an existing freelist. Is that correct?

Correct. In write mode, bbolt always reads from disk to get the freelist, either from the synced freelist page or by scanning the whole db to reconstruct it.
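As a rough illustration of the "scan the whole db" path, reconstruction can be thought of as: walk every page reachable from the root, then treat every other page below the high-water mark as free. The sketch below is a toy model under that assumption; `freePages`, the map of reachable ids, and the high-water value are hypothetical and do not use bbolt's real types.

```go
package main

import "fmt"

// freePages returns every page id in [2, highWater) that is not marked
// reachable. Ids 0 and 1 are skipped because they hold the two meta pages.
func freePages(reachable map[uint64]bool, highWater uint64) []uint64 {
	var free []uint64
	for id := uint64(2); id < highWater; id++ {
		if !reachable[id] {
			free = append(free, id)
		}
	}
	return free
}

func main() {
	// Pretend a tree walk found pages 3, 4 and 6 in use, with 8 pages allocated.
	reachable := map[uint64]bool{3: true, 4: true, 6: true}
	fmt.Println(freePages(reachable, 8)) // [2 5 7]
}
```

The cost of this path grows with the size of the database, since every reachable page has to be visited before the free set is known.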

The most important thing for now is to reproduce the issue ourselves. It would be great if @gandarez can provide some clues.

ahrtr avatar Apr 01 '23 00:04 ahrtr

I can't promise anything since, as I said, it runs on our users' machines, but I'll try to get a copy of it.

gandarez avatar Apr 01 '23 00:04 gandarez

With NoFreelistSync: false, the freelist is saved to a page and referenced from the meta page. With NoFreelistSync: true, the freelist is not saved to the file and a special marker is put into the meta page.

The current freelist loading logic does not take the NoFreelistSync option into account. https://github.com/etcd-io/bbolt/blob/e6563eef17d87c7e96e96fbb2b78be3e93d67ff1/db.go#L371-L383

By setting it to true, the user of the library accepts that the freelist will not be saved to disk and accepts the latency of scanning the whole db.

The loading behavior currently depends only on whether a freelist exists in the db file.

I have a proposal for adding NoFreelistSync to the decision:

	if !db.hasSyncedFreelist() || db.NoFreelistSync {
		// Reconstruct free list by scanning the DB.
		db.freelist.readIDs(db.freepages())
	} else {
		// Read free list from freelist page.
		db.freelist.read(db.page(db.meta().Freelist()))
	}

This may help to open the database by changing an option if the corruption is just in the freelist.
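To make the difference between the current and the proposed decision concrete, here is a small stdlib-only model. The `source` type and both functions are hypothetical and only mirror the two conditions discussed above; they do not touch any real database.

```go
package main

import "fmt"

// source names where the freelist would come from in this toy model.
type source string

const (
	fromPage source = "freelist page"
	byScan   source = "full scan"
)

// current models today's behavior as described in this thread:
// only the presence of a synced freelist matters.
func current(hasSyncedFreelist bool) source {
	if !hasSyncedFreelist {
		return byScan
	}
	return fromPage
}

// proposed models the suggested change: NoFreelistSync also forces a scan.
func proposed(hasSyncedFreelist, noFreelistSync bool) source {
	if !hasSyncedFreelist || noFreelistSync {
		return byScan
	}
	return fromPage
}

func main() {
	// A file with a synced (possibly corrupted) freelist, opened with NoFreelistSync: true.
	fmt.Println("current: ", current(true))
	fmt.Println("proposed:", proposed(true, true))
}
```

The only case where the two differ is a file that has a synced freelist and is opened with NoFreelistSync set to true, which is the case the proposal targets.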

cenkalti avatar Apr 01 '23 00:04 cenkalti

@gandarez is there any update on this? thx

ahrtr avatar Apr 20 '23 12:04 ahrtr

I haven't heard anything from anybody. Is this issue still on track?

gandarez avatar Aug 01 '23 23:08 gandarez

I haven't heard anything from anybody. Is this issue still on track?

Based on all the info we have so far, the db file is most likely corrupted somehow. The only suggestion I can think of for now is to back up the db file regularly, since your application is a standalone client. (Distributed systems can usually tolerate the failure of a single node, but a standalone client has no such redundancy.)

It would be great if you could provide the db file the next time you run into a similar issue, so that I can double-check. I can also try to fix the corrupted db file using the surgery commands.

BTW, how many times have you run into such corruption issue in your application?

ahrtr avatar Aug 02 '23 09:08 ahrtr

Since it runs as a standalone application, it's hard to say how many users were affected, but it seems only one is still hitting this issue. I tried to contact them but didn't get a reply.

gandarez avatar Aug 02 '23 12:08 gandarez