bbolt icon indicating copy to clipboard operation
bbolt copied to clipboard

after restart, bbolt db failed to get all reachable pages

Open timannguyen opened this issue 1 year ago • 22 comments

After restarting the process running with open telemetry file extension, we get

panic: freepages: failed to get all reachable pages (page 138577: multiple references (stack: [110806 65199 138577]))

goroutine 98 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
	go.etcd.io/[email protected]/db.go:1202 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 69

I'll see if i can get the db pulled from the running machine. Sorry for no db blob.

is there any means of recovery from this issue ?

timannguyen avatar Jun 26 '24 16:06 timannguyen

After restarting the process

Did the machine/VM restarted?

Do you have a detailed steps to reproduce this issue by any chance?

I'll see if i can get the db pulled from the running machine

Yes, it would be helpful if you could provide the db file.

is there any means of recovery from this issue ?

We have surgery commands, but again we need to the db file firstly.

ahrtr avatar Jun 26 '24 17:06 ahrtr

This seems like also a potential issue of the freelist management, e.g a page was allocated twice, accordingly it's referenced by multiple places.

ahrtr avatar Jun 27 '24 13:06 ahrtr

I could not get an approval to provide the db file. sorry for the trouble.

timannguyen avatar Jun 27 '24 20:06 timannguyen

Hi @timshannon just want to confirm that the issue show up after the host or vm where the process run has been restarted? Thanks

fuweid avatar Jul 02 '24 09:07 fuweid

@fuweid I assume you meant to call out @timannguyen not @timshannon right?

timshannon avatar Jul 02 '24 16:07 timshannon

@fuweid I assume you meant to call out @timannguyen not @timshannon right?

Right, sorry for the confusion.

ahrtr avatar Jul 02 '24 17:07 ahrtr

sorry @timshannon for the confusion. I should ping @timannguyen

fuweid avatar Jul 02 '24 22:07 fuweid

@fuweid sorry for the delay, the process was restarted and the issue occurs.

timannguyen avatar Jul 08 '24 14:07 timannguyen

@fuweid sorry for the delay, the process was restarted and the issue occurs.

Thanks for the feedback but the questions isn't answered. The question is "Did the machine/VM restarted right before the issue being occurred?".

Also please also provide the following info,

  • GOOS and version;
  • GOARCH;
  • filesystem;

I could not get an approval to provide the db file. sorry for the trouble.

Could you dump the db with all the value redacted?

./bbolt page  --all --format-value=redacted path-2-db

ahrtr avatar Jul 08 '24 14:07 ahrtr

@fuweid sorry for the delay, the process was restarted and the issue occurs.

Thanks for the feedback but the questions isn't answered. The question is "Did the machine/VM restarted right before the issue being occurred?".

Also please also provide the following info,

  • GOOS and version;
  • GOARCH;
  • filesystem;

I could not get an approval to provide the db file. sorry for the trouble.

Could you dump the db with all the value redacted?

./bbolt page  --all --format-value=redacted path-2-db

@ahrtr I met the same issue. Here is my db dump: db.txt

lnnt avatar Aug 28 '24 12:08 lnnt

@fuweid sorry for the delay, the process was restarted and the issue occurs.

Thanks for the feedback but the questions isn't answered. The question is "Did the machine/VM restarted right before the issue being occurred?". Also please also provide the following info,

  • GOOS and version;
  • GOARCH;
  • filesystem;

I could not get an approval to provide the db file. sorry for the trouble.

Could you dump the db with all the value redacted?

./bbolt page  --all --format-value=redacted path-2-db

@ahrtr I met the same issue. Here is my db dump: db.txt

My machine met unexpected power down, then bbolt db say:

panic: freepages: failed to get all reachable pages (page 4271: multiple references (stack: [1079 3745 4271]))
goroutine 160 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
         go.etcd.io/[email protected]/db.go:1202 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 159
         go.etcd.io/[email protected]/db.go:1200 +0x1e5

Here is my db file: db.zip

lnnt avatar Aug 29 '24 03:08 lnnt

Here is my db file: db.zip

Thanks. I will take a look later.

ahrtr avatar Aug 29 '24 06:08 ahrtr

My machine met unexpected power down, then bbolt db say:

panic: freepages: failed to get all reachable pages (page 4271: multiple references (stack: [1079 3745 4271]))
goroutine 160 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
         go.etcd.io/[email protected]/db.go:1202 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 159
         go.etcd.io/[email protected]/db.go:1200 +0x1e5

Here is my db file: db.zip

The last transaction before the unexpected poweroff somehow corrupted the db. One possible reason is some page (i.e. 1079) wasn't successfully synced to disk even the fdatasync systemcall returned successfully.

The workaround to fix the corrupted db is to rollback the last transaction,

$ ./bbolt surgery revert-meta-page ./db.db --output ./new.db
The meta page is reverted.

$ ./bbolt check ./new.db 
OK

Reference

F5WPP69Q7H:bbolt wachao$ pwd
/Users/wachao/go/src/github.com/ahrtr/bbolt
F5WPP69Q7H:bbolt wachao$ ./bbolt page db.db 0
Page ID:    0
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=5150>
Freelist:   <pgid=18446744073709551615>
HWM:        <pgid=11865>
Txn ID:     10789638
Checksum:   4e1e8a40b76f0de1

F5WPP69Q7H:bbolt wachao$ ./bbolt page db.db 1
Page ID:    1
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=11036>
Freelist:   <pgid=18446744073709551615>
HWM:        <pgid=11865>
Txn ID:     10789637
Checksum:   bb9a5cce3b6710a5

F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 5150
Page ID:    5150
Page Type:  leaf
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 10

"alarm": <pgid=0,seq=0>
"auth": <pgid=0,seq=0>
"authRoles": <pgid=0,seq=0>
"authUsers": <pgid=0,seq=0>
"cluster": <pgid=0,seq=0>
"key": <pgid=1079,seq=0>
"lease": <pgid=6891,seq=0>
"members": <pgid=0,seq=0>
"members_removed": <pgid=0,seq=0>
"meta": <pgid=0,seq=0>

F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 1079
Page ID:    1079
Page Type:  branch
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 52

0000000000fd22985f0000000000000000: <pgid=4271>
0000000000fd229d5f0000000000000000: <pgid=3874>
0000000000fd22a15f0000000000000000: <pgid=3875>
0000000000fd22a55f0000000000000000: <pgid=4288>
0000000000fd22aa5f0000000000000000: <pgid=7785>
0000000000fd22ae5f0000000000000000: <pgid=1132>
0000000000fd22b35f0000000000000000: <pgid=8839>
0000000000fd22b75f0000000000000000: <pgid=10820>
0000000000fd22bc5f0000000000000000: <pgid=11084>
0000000000fd22c15f0000000000000000: <pgid=6097>
0000000000fd22c65f0000000000000000: <pgid=4104>
0000000000fd22cb5f0000000000000000: <pgid=7786>
0000000000fd22d15f0000000000000000: <pgid=10821>
0000000000fd22d55f0000000000000000: <pgid=9570>
0000000000fd22d95f0000000000000000: <pgid=2861>
0000000000fd22de5f0000000000000000: <pgid=5190>
0000000000fd22e35f0000000000000000: <pgid=9571>
0000000000fd22e75f0000000000000000: <pgid=7264>
0000000000fd22eb5f0000000000000000: <pgid=2860>
0000000000fd22f15f0000000000000000: <pgid=7787>
0000000000fd22f55f0000000000000000: <pgid=11035>
0000000000fd22fb5f0000000000000000: <pgid=7084>
0000000000fd22ff5f0000000000000000: <pgid=7265>
0000000000fd23055f0000000000000000: <pgid=10822>
0000000000fd230a5f0000000000000000: <pgid=2907>
0000000000fd230e5f0000000000000000: <pgid=2906>
0000000000fd23135f0000000000000000: <pgid=2151>
0000000000fd23175f0000000000000000: <pgid=9010>
0000000000fd231c5f0000000000000000: <pgid=2150>
0000000000fd23205f0000000000000000: <pgid=7788>
0000000000fd23255f0000000000000000: <pgid=11185>
0000000000fd232a5f0000000000000000: <pgid=2213>
0000000000fd232e5f0000000000000000: <pgid=2212>
0000000000fd23345f0000000000000000: <pgid=2629>
0000000000fd23385f0000000000000000: <pgid=5821>
0000000000fd233d5f0000000000000000: <pgid=2628>
0000000000fd23425f0000000000000000: <pgid=7639>
0000000000fd23465f0000000000000000: <pgid=5820>
0000000000fd234c5f0000000000000000: <pgid=419>
0000000000fd23505f0000000000000000: <pgid=11187>
0000000000fd23565f0000000000000000: <pgid=11186>
0000000000fd235a5f0000000000000000: <pgid=8300>
0000000000fd235f5f0000000000000000: <pgid=7630>
0000000000fd23635f0000000000000000: <pgid=10824>
0000000000fd23675f0000000000000000: <pgid=10823>
0000000000fd236b5f0000000000000000: <pgid=626>
0000000000fd23705f0000000000000000: <pgid=9930>
0000000000fd23755f0000000000000000: <pgid=7093>
0000000000fd237b5f0000000000000000: <pgid=7789>
0000000000fd237f5f0000000000000000: <pgid=4512>
0000000000fd23855f0000000000000000: <pgid=9931>
0000000000fd238a5f0000000000000000: <pgid=3745>

F5WPP69Q7H:bbolt wachao$ 
F5WPP69Q7H:bbolt wachao$ 
F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 3745
Page ID:    3745
Page Type:  branch
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 68

0000000000fd22985f0000000000000000: <pgid=4271>
0000000000fd229d5f0000000000000000: <pgid=3874>
0000000000fd22a15f0000000000000000: <pgid=3875>
0000000000fd22a55f0000000000000000: <pgid=4288>
0000000000fd22aa5f0000000000000000: <pgid=7785>
0000000000fd22ae5f0000000000000000: <pgid=1132>
0000000000fd22b35f0000000000000000: <pgid=8839>
0000000000fd22b75f0000000000000000: <pgid=10820>
0000000000fd22bc5f0000000000000000: <pgid=11084>
0000000000fd22c15f0000000000000000: <pgid=6097>
0000000000fd22c65f0000000000000000: <pgid=4104>
0000000000fd22cb5f0000000000000000: <pgid=7786>
0000000000fd22d15f0000000000000000: <pgid=10821>
0000000000fd22d55f0000000000000000: <pgid=9570>
0000000000fd22d95f0000000000000000: <pgid=2861>
0000000000fd22de5f0000000000000000: <pgid=5190>
0000000000fd22e35f0000000000000000: <pgid=9571>
0000000000fd22e75f0000000000000000: <pgid=7264>
0000000000fd22eb5f0000000000000000: <pgid=2860>
0000000000fd22f15f0000000000000000: <pgid=7787>
0000000000fd22f55f0000000000000000: <pgid=11035>
0000000000fd22fb5f0000000000000000: <pgid=7084>
0000000000fd22ff5f0000000000000000: <pgid=7265>
0000000000fd23055f0000000000000000: <pgid=10822>
0000000000fd230a5f0000000000000000: <pgid=2907>
0000000000fd230e5f0000000000000000: <pgid=2906>
0000000000fd23135f0000000000000000: <pgid=2151>
0000000000fd23175f0000000000000000: <pgid=9010>
0000000000fd231c5f0000000000000000: <pgid=2150>
0000000000fd23205f0000000000000000: <pgid=7788>
0000000000fd23255f0000000000000000: <pgid=11185>
0000000000fd232a5f0000000000000000: <pgid=2213>
0000000000fd232e5f0000000000000000: <pgid=2212>
0000000000fd23345f0000000000000000: <pgid=2629>
0000000000fd23385f0000000000000000: <pgid=5821>
0000000000fd233d5f0000000000000000: <pgid=2628>
0000000000fd23425f0000000000000000: <pgid=7639>
0000000000fd23465f0000000000000000: <pgid=5820>
0000000000fd234c5f0000000000000000: <pgid=419>
0000000000fd23505f0000000000000000: <pgid=11187>
0000000000fd23565f0000000000000000: <pgid=11186>
0000000000fd235a5f0000000000000000: <pgid=8300>
0000000000fd235f5f0000000000000000: <pgid=7630>
0000000000fd23635f0000000000000000: <pgid=10824>
0000000000fd23675f0000000000000000: <pgid=10823>
0000000000fd236b5f0000000000000000: <pgid=626>
0000000000fd23705f0000000000000000: <pgid=9930>
0000000000fd23755f0000000000000000: <pgid=7093>
0000000000fd237b5f0000000000000000: <pgid=7789>
0000000000fd237f5f0000000000000000: <pgid=4512>
0000000000fd23855f0000000000000000: <pgid=9931>
0000000000fd238a5f0000000000000000: <pgid=2496>
0000000000fd238f5f0000000000000000: <pgid=4526>
0000000000fd23935f0000000000000000: <pgid=10422>
0000000000fd23985f0000000000000000: <pgid=6237>
0000000000fd239c5f0000000000000000: <pgid=11188>
0000000000fd23a05f0000000000000000: <pgid=10423>
0000000000fd23a65f0000000000000000: <pgid=1306>
0000000000fd23ac5f0000000000000000: <pgid=8301>
0000000000fd23b15f0000000000000000: <pgid=1305>
0000000000fd23b55f0000000000000000: <pgid=6890>
0000000000fd23b95f0000000000000000: <pgid=9932>
0000000000fd23be5f0000000000000000: <pgid=2475>
0000000000fd23c25f0000000000000000: <pgid=8302>
0000000000fd23c85f0000000000000000: <pgid=781>
0000000000fd23cc5f0000000000000000: <pgid=456>
0000000000fd23d05f0000000000000000: <pgid=9933>
0000000000fd23d65f0000000000000000: <pgid=2252>

ahrtr avatar Aug 30 '24 11:08 ahrtr

@timshannon @lnnt Could you also share your systeminfo using https://github.com/ahrtr/systeminfo?

ahrtr avatar Aug 30 '24 15:08 ahrtr

@timannguyen :arrow_double_up:

timshannon avatar Aug 30 '24 15:08 timshannon

@timshannon @lnnt Could you also share your systeminfo using https://github.com/ahrtr/systeminfo?

cpu.IsBigEndian: false os.Getpagesize(): 4096 runtime.GOOS: linux runtime.GOARCH: amd64

lnnt avatar Aug 30 '24 15:08 lnnt

As mentioned in previous comment, the last transaction (10789638) before the unexpected poweroff somehow corrupted the db.

After rolling back the last transaction, the db is back to normal again. The max revision under the last good state is 16589739

old (good state before TXN 10789638):
Root: 11036 --> 6750 (leaf: key, index: 22/22) --> 3745 (index: 57/67) --> 1306 (index: 5/5, rev: 16589739)

The etcd-dump-db scan-keys command was added in https://github.com/etcd-io/etcd/pull/18539

$ ./etcd-dump-db scan-keys ~/tmp/etcd/778/db.db 16589739 2>/dev/null | grep "/registry/configmaps/istio-system/istio-namespace-controller-election"
pageID=1306, index=5, rev={Revision:{Main:16589739 Sub:0} tombstone:false}, value=[key "/registry/configmaps/istio-system/istio-namespace-controller-election" | val "k8s\x00\n\x0f\n\x02v1\x12\tConfigMap\x12\xeb\x03\n\xe8\x03\n#istio-namespace-controller-election\x12\x00\x1a\fistio-system\"\x00*$bb696087-260d-4167-bf06-17d3361f9b5f2\x008\x00B\b\b\x9e\xbe\xed\xb5\x06\x10\x00b\xe6\x01\n(control-plane.alpha.kubernetes.io/leader\x12\xb9\x01{\"holderIdentity\":\"istiod-d56968787-txq2d\",\"holderKey\":\"default\",\"leaseDurationSeconds\":30,\"acquireTime\":\"2024-08-13T13:26:54Z\",\"renewTime\":\"2024-08-27T06:16:13Z\",\"leaderTransitions\":0}\x8a\x01\x90\x01\n\x0fpilot-discovery\x12\x06Update\x1a\x02v1\"\b\b\xad\u07b5\xb6\x06\x10\x002\bFieldsV1:[\nY{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:control-plane.alpha.kubernetes.io/leader\":{}}}}B\x00\x1a\x00\"\x00" | created 9612546 | mod 16589739 | ver 157604]
pageID=4737, index=4, rev={Revision:{Main:16589786 Sub:0} tombstone:false}, value=[key "/registry/configmaps/istio-system/istio-namespace-controller-election" | val "k8s\x00\n\x0f\n\x02v1\x12\tConfigMap\x12\xeb\x03\n\xe8\x03\n#istio-namespace-controller-election\x12\x00\x1a\fistio-system\"\x00*$bb696087-260d-4167-bf06-17d3361f9b5f2\x008\x00B\b\b\x9e\xbe\xed\xb5\x06\x10\x00b\xe6\x01\n(control-plane.alpha.kubernetes.io/leader\x12\xb9\x01{\"holderIdentity\":\"istiod-d56968787-txq2d\",\"holderKey\":\"default\",\"leaseDurationSeconds\":30,\"acquireTime\":\"2024-08-13T13:26:54Z\",\"renewTime\":\"2024-08-27T06:16:21Z\",\"leaderTransitions\":0}\x8a\x01\x90\x01\n\x0fpilot-discovery\x12\x06Update\x1a\x02v1\"\b\b\xb5\u07b5\xb6\x06\x10\x002\bFieldsV1:[\nY{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:control-plane.alpha.kubernetes.io/leader\":{}}}}B\x00\x1a\x00\"\x00" | created 9612546 | mod 16589786 | ver 157605]
new (bad state after TXN 10789638):
Root: 5150  --> 1079 (leaf: key, index: ?) --> ??? --> 4737 (index: 4/4, rev: 16589786)

The problem is the leaf page 4737 and root page 5150 were persisted successfully, but some branch pages were not. As far as I can tell, most likely it's the file system's issue.

  • Note that even after TXN 10789638, the page 11036, 6750, 3745 and 1306 all kept unchanged. It's expected behavior.
  • After rolling back TXN 10789638, the db is back to normal. It means all the existing pages (including the ones mentioned above) kept unchanged. It's expected.
  • The page 1079 is obviously in a bad state. The problem is that its parent (5150) was successfully persisted to disk, but It wasn't. So its parent points to a bad page, which is exactly the reason for the corruption.
  • Note that all 5150, 1079, and 4737 were free pages before executing TXN 10789638.

As a followup, it's helpful if we can figure out the missing intermediate branch page between 1079 and 4737. We need to add a command something like "find-parent", e.g. find the parent of 4737 in this example.

ahrtr avatar Sep 04 '24 13:09 ahrtr

@ahrtr 2 questions:

  • Would it be possible to reproduce this case using LazyFS to have a solid understanding of the issue and maybe consider making bbolt resiliant to it?
  • Could we consider bbolt automatically reverting to a previous TXN if data on the tree on current TXN is broken?

serathius avatar Oct 04 '24 08:10 serathius

  • Would it be possible to reproduce this case using LazyFS to have a solid understanding of the issue and maybe consider making bbolt resiliant to it?

We already introduced dmflakey, which is similar to LazyFS, and without the possible downside of LazyFS, which (LazyFS) may not respect the fdatasync semantics. Please refer to https://github.com/fuweid/go-dmflakey/pull/1

Could we consider bbolt automatically reverting to a previous TXN if data on the tree on current TXN is broken?

Yes, It's kind of good point. I already thought about it, but not automatically done by bbolt. Based on the data corruption cases in the past around 3 years, IIRC, only minor parts could be fixed by bbolt surgery revert-meta-page. Also we are trying to simplify the code base instead of adding more logic into it, especially for such upper layer or users' logic. Instead, I would suggest to provide a guide for users to manually try it for now. We will revisit this after we have more confidence on the data corruption issue.

It also has downside... there may be data loss after reverting to previous TXN.

ahrtr avatar Oct 04 '24 10:10 ahrtr

We already introduced dmflakey, which is similar to LazyFS

Looks like issue is closed. Do you mean https://github.com/etcd-io/bbolt/issues/568? Repeating the ask, could we cover this exact scenario using dmflakey?

It also has downside... there may be data loss after reverting to previous TXN.

Just asking for this to be considered, because I would want to better understand the risk of revert. I think high level question is, is there a chance of previous transaction to be silently broken if the latest transaction was corrupted?

Don't think this needs to be a default option, might just be a feature in etcd as I think it would benefit it's ability to recover. For systems like etcd, bbolt is not the source of truth, just the kv database. We can still recover the last transaction from WAL. For cases where only the latest transaction was corrupted, instead of etcd giving up the bootstrap, we would just revert it and replay the transaction again from the WAL.

If the last transaction can be trusted I would propose that etcd should always try to revert meta page if the state of btree on the latest meta page is broken. But before implementing this we should verify and test health of previous transaction under different corruption scenarios.

serathius avatar Oct 04 '24 12:10 serathius

Hi @ahrtr I met the same issue. But my machine has not been forced to shut down, it has only experienced hot migration, and the behavior during that time is similar to freezing.

By looking through the previous issues, I found that there are two strange pages: the first 112 keys of 504 and 563 are exactly the same, so I suspect that it may be a failure encountered when writing after rebalance, but I can't further locate the specific code location. Can you help me find the possible code area?

bbolt page --all --format-value=redacteddb.log

tboevil avatar Mar 13 '25 10:03 tboevil

Similar issue: on a 3-node Proxmox cluster, Docker LXC containers only work on one node. When moved to the other two nodes, they throw errors, but moving them back to the original node works fine.

node2 no erros

node1

panic: freepages: failed to get all reachable pages (page 12: multiple references (stack: [12]))

goroutine 50 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
	go.etcd.io/[email protected]/db.go:1248 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 23

node3

time="2025-08-30T08:17:48.975519470+08:00" level=info msg="metadata content store policy set" policy=shared
panic: freepages: failed to get all reachable pages (page 118: multiple references (stack: [118]))

goroutine 23 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
	go.etcd.io/[email protected]/db.go:1248 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 12

sasha-id avatar Aug 30 '25 00:08 sasha-id