bbolt
after restart, bbolt db failed to get all reachable pages
After restarting the process running with the OpenTelemetry file extension, we get:
panic: freepages: failed to get all reachable pages (page 138577: multiple references (stack: [110806 65199 138577]))
goroutine 98 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
go.etcd.io/[email protected]/db.go:1202 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 69
I'll see if I can get the db pulled from the running machine. Sorry for no db blob.
Is there any means of recovery from this issue?
After restarting the process
Did the machine/VM restart?
Do you have detailed steps to reproduce this issue, by any chance?
I'll see if i can get the db pulled from the running machine
Yes, it would be helpful if you could provide the db file.
Is there any means of recovery from this issue?
We have surgery commands, but again, we need the db file first.
This also seems like a potential issue in freelist management, e.g. a page was allocated twice, so it's referenced from multiple places.
I could not get approval to provide the db file. Sorry for the trouble.
Hi @timshannon, just want to confirm that the issue showed up after the host or VM where the process runs was restarted? Thanks
@fuweid I assume you meant to call out @timannguyen not @timshannon right?
Right, sorry for the confusion.
Sorry @timshannon for the confusion. I should have pinged @timannguyen.
@fuweid sorry for the delay, the process was restarted and the issue occurs.
Thanks for the feedback, but the question isn't answered. The question is: "Did the machine/VM restart right before the issue occurred?"
Please also provide the following info:
- GOOS and version;
- GOARCH;
- filesystem;
I could not get approval to provide the db file. Sorry for the trouble.
Could you dump the db with all the value redacted?
./bbolt page --all --format-value=redacted path-2-db
@ahrtr I met the same issue. Here is my db dump: db.txt
My machine had an unexpected power-down, and afterwards the bbolt db reports:
panic: freepages: failed to get all reachable pages (page 4271: multiple references (stack: [1079 3745 4271]))
goroutine 160 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
go.etcd.io/[email protected]/db.go:1202 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 159
go.etcd.io/[email protected]/db.go:1200 +0x1e5
Here is my db file: db.zip
The last transaction before the unexpected poweroff somehow corrupted the db. One possible reason is that some page (i.e. 1079) wasn't successfully synced to disk, even though the fdatasync system call returned successfully.
The workaround to fix the corrupted db is to roll back the last transaction:
$ ./bbolt surgery revert-meta-page ./db.db --output ./new.db
The meta page is reverted.
$ ./bbolt check ./new.db
OK
Reference
F5WPP69Q7H:bbolt wachao$ pwd
/Users/wachao/go/src/github.com/ahrtr/bbolt
F5WPP69Q7H:bbolt wachao$ ./bbolt page db.db 0
Page ID: 0
Page Type: meta
Total Size: 4096 bytes
Overflow pages: 0
Version: 2
Page Size: 4096 bytes
Flags: 00000000
Root: <pgid=5150>
Freelist: <pgid=18446744073709551615>
HWM: <pgid=11865>
Txn ID: 10789638
Checksum: 4e1e8a40b76f0de1
F5WPP69Q7H:bbolt wachao$ ./bbolt page db.db 1
Page ID: 1
Page Type: meta
Total Size: 4096 bytes
Overflow pages: 0
Version: 2
Page Size: 4096 bytes
Flags: 00000000
Root: <pgid=11036>
Freelist: <pgid=18446744073709551615>
HWM: <pgid=11865>
Txn ID: 10789637
Checksum: bb9a5cce3b6710a5
F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 5150
Page ID: 5150
Page Type: leaf
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 10
"alarm": <pgid=0,seq=0>
"auth": <pgid=0,seq=0>
"authRoles": <pgid=0,seq=0>
"authUsers": <pgid=0,seq=0>
"cluster": <pgid=0,seq=0>
"key": <pgid=1079,seq=0>
"lease": <pgid=6891,seq=0>
"members": <pgid=0,seq=0>
"members_removed": <pgid=0,seq=0>
"meta": <pgid=0,seq=0>
F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 1079
Page ID: 1079
Page Type: branch
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 52
0000000000fd22985f0000000000000000: <pgid=4271>
0000000000fd229d5f0000000000000000: <pgid=3874>
0000000000fd22a15f0000000000000000: <pgid=3875>
0000000000fd22a55f0000000000000000: <pgid=4288>
0000000000fd22aa5f0000000000000000: <pgid=7785>
0000000000fd22ae5f0000000000000000: <pgid=1132>
0000000000fd22b35f0000000000000000: <pgid=8839>
0000000000fd22b75f0000000000000000: <pgid=10820>
0000000000fd22bc5f0000000000000000: <pgid=11084>
0000000000fd22c15f0000000000000000: <pgid=6097>
0000000000fd22c65f0000000000000000: <pgid=4104>
0000000000fd22cb5f0000000000000000: <pgid=7786>
0000000000fd22d15f0000000000000000: <pgid=10821>
0000000000fd22d55f0000000000000000: <pgid=9570>
0000000000fd22d95f0000000000000000: <pgid=2861>
0000000000fd22de5f0000000000000000: <pgid=5190>
0000000000fd22e35f0000000000000000: <pgid=9571>
0000000000fd22e75f0000000000000000: <pgid=7264>
0000000000fd22eb5f0000000000000000: <pgid=2860>
0000000000fd22f15f0000000000000000: <pgid=7787>
0000000000fd22f55f0000000000000000: <pgid=11035>
0000000000fd22fb5f0000000000000000: <pgid=7084>
0000000000fd22ff5f0000000000000000: <pgid=7265>
0000000000fd23055f0000000000000000: <pgid=10822>
0000000000fd230a5f0000000000000000: <pgid=2907>
0000000000fd230e5f0000000000000000: <pgid=2906>
0000000000fd23135f0000000000000000: <pgid=2151>
0000000000fd23175f0000000000000000: <pgid=9010>
0000000000fd231c5f0000000000000000: <pgid=2150>
0000000000fd23205f0000000000000000: <pgid=7788>
0000000000fd23255f0000000000000000: <pgid=11185>
0000000000fd232a5f0000000000000000: <pgid=2213>
0000000000fd232e5f0000000000000000: <pgid=2212>
0000000000fd23345f0000000000000000: <pgid=2629>
0000000000fd23385f0000000000000000: <pgid=5821>
0000000000fd233d5f0000000000000000: <pgid=2628>
0000000000fd23425f0000000000000000: <pgid=7639>
0000000000fd23465f0000000000000000: <pgid=5820>
0000000000fd234c5f0000000000000000: <pgid=419>
0000000000fd23505f0000000000000000: <pgid=11187>
0000000000fd23565f0000000000000000: <pgid=11186>
0000000000fd235a5f0000000000000000: <pgid=8300>
0000000000fd235f5f0000000000000000: <pgid=7630>
0000000000fd23635f0000000000000000: <pgid=10824>
0000000000fd23675f0000000000000000: <pgid=10823>
0000000000fd236b5f0000000000000000: <pgid=626>
0000000000fd23705f0000000000000000: <pgid=9930>
0000000000fd23755f0000000000000000: <pgid=7093>
0000000000fd237b5f0000000000000000: <pgid=7789>
0000000000fd237f5f0000000000000000: <pgid=4512>
0000000000fd23855f0000000000000000: <pgid=9931>
0000000000fd238a5f0000000000000000: <pgid=3745>
F5WPP69Q7H:bbolt wachao$
F5WPP69Q7H:bbolt wachao$
F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 3745
Page ID: 3745
Page Type: branch
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 68
0000000000fd22985f0000000000000000: <pgid=4271>
0000000000fd229d5f0000000000000000: <pgid=3874>
0000000000fd22a15f0000000000000000: <pgid=3875>
0000000000fd22a55f0000000000000000: <pgid=4288>
0000000000fd22aa5f0000000000000000: <pgid=7785>
0000000000fd22ae5f0000000000000000: <pgid=1132>
0000000000fd22b35f0000000000000000: <pgid=8839>
0000000000fd22b75f0000000000000000: <pgid=10820>
0000000000fd22bc5f0000000000000000: <pgid=11084>
0000000000fd22c15f0000000000000000: <pgid=6097>
0000000000fd22c65f0000000000000000: <pgid=4104>
0000000000fd22cb5f0000000000000000: <pgid=7786>
0000000000fd22d15f0000000000000000: <pgid=10821>
0000000000fd22d55f0000000000000000: <pgid=9570>
0000000000fd22d95f0000000000000000: <pgid=2861>
0000000000fd22de5f0000000000000000: <pgid=5190>
0000000000fd22e35f0000000000000000: <pgid=9571>
0000000000fd22e75f0000000000000000: <pgid=7264>
0000000000fd22eb5f0000000000000000: <pgid=2860>
0000000000fd22f15f0000000000000000: <pgid=7787>
0000000000fd22f55f0000000000000000: <pgid=11035>
0000000000fd22fb5f0000000000000000: <pgid=7084>
0000000000fd22ff5f0000000000000000: <pgid=7265>
0000000000fd23055f0000000000000000: <pgid=10822>
0000000000fd230a5f0000000000000000: <pgid=2907>
0000000000fd230e5f0000000000000000: <pgid=2906>
0000000000fd23135f0000000000000000: <pgid=2151>
0000000000fd23175f0000000000000000: <pgid=9010>
0000000000fd231c5f0000000000000000: <pgid=2150>
0000000000fd23205f0000000000000000: <pgid=7788>
0000000000fd23255f0000000000000000: <pgid=11185>
0000000000fd232a5f0000000000000000: <pgid=2213>
0000000000fd232e5f0000000000000000: <pgid=2212>
0000000000fd23345f0000000000000000: <pgid=2629>
0000000000fd23385f0000000000000000: <pgid=5821>
0000000000fd233d5f0000000000000000: <pgid=2628>
0000000000fd23425f0000000000000000: <pgid=7639>
0000000000fd23465f0000000000000000: <pgid=5820>
0000000000fd234c5f0000000000000000: <pgid=419>
0000000000fd23505f0000000000000000: <pgid=11187>
0000000000fd23565f0000000000000000: <pgid=11186>
0000000000fd235a5f0000000000000000: <pgid=8300>
0000000000fd235f5f0000000000000000: <pgid=7630>
0000000000fd23635f0000000000000000: <pgid=10824>
0000000000fd23675f0000000000000000: <pgid=10823>
0000000000fd236b5f0000000000000000: <pgid=626>
0000000000fd23705f0000000000000000: <pgid=9930>
0000000000fd23755f0000000000000000: <pgid=7093>
0000000000fd237b5f0000000000000000: <pgid=7789>
0000000000fd237f5f0000000000000000: <pgid=4512>
0000000000fd23855f0000000000000000: <pgid=9931>
0000000000fd238a5f0000000000000000: <pgid=2496>
0000000000fd238f5f0000000000000000: <pgid=4526>
0000000000fd23935f0000000000000000: <pgid=10422>
0000000000fd23985f0000000000000000: <pgid=6237>
0000000000fd239c5f0000000000000000: <pgid=11188>
0000000000fd23a05f0000000000000000: <pgid=10423>
0000000000fd23a65f0000000000000000: <pgid=1306>
0000000000fd23ac5f0000000000000000: <pgid=8301>
0000000000fd23b15f0000000000000000: <pgid=1305>
0000000000fd23b55f0000000000000000: <pgid=6890>
0000000000fd23b95f0000000000000000: <pgid=9932>
0000000000fd23be5f0000000000000000: <pgid=2475>
0000000000fd23c25f0000000000000000: <pgid=8302>
0000000000fd23c85f0000000000000000: <pgid=781>
0000000000fd23cc5f0000000000000000: <pgid=456>
0000000000fd23d05f0000000000000000: <pgid=9933>
0000000000fd23d65f0000000000000000: <pgid=2252>
@timshannon @lnnt Could you also share your systeminfo using https://github.com/ahrtr/systeminfo?
@timannguyen :arrow_double_up:
@timshannon @lnnt Could you also share your systeminfo using https://github.com/ahrtr/systeminfo?
- cpu.IsBigEndian: false
- os.Getpagesize(): 4096
- runtime.GOOS: linux
- runtime.GOARCH: amd64
As mentioned in a previous comment, the last transaction (10789638) before the unexpected poweroff somehow corrupted the db.
After rolling back the last transaction, the db is back to normal. The max revision under the last good state is 16589739.
old (good state before TXN 10789638):
Root: 11036 --> 6750 (leaf: key, index: 22/22) --> 3745 (index: 57/67) --> 1306 (index: 5/5, rev: 16589739)
The etcd-dump-db scan-keys command was added in https://github.com/etcd-io/etcd/pull/18539
$ ./etcd-dump-db scan-keys ~/tmp/etcd/778/db.db 16589739 2>/dev/null | grep "/registry/configmaps/istio-system/istio-namespace-controller-election"
pageID=1306, index=5, rev={Revision:{Main:16589739 Sub:0} tombstone:false}, value=[key "/registry/configmaps/istio-system/istio-namespace-controller-election" | val "k8s\x00\n\x0f\n\x02v1\x12\tConfigMap\x12\xeb\x03\n\xe8\x03\n#istio-namespace-controller-election\x12\x00\x1a\fistio-system\"\x00*$bb696087-260d-4167-bf06-17d3361f9b5f2\x008\x00B\b\b\x9e\xbe\xed\xb5\x06\x10\x00b\xe6\x01\n(control-plane.alpha.kubernetes.io/leader\x12\xb9\x01{\"holderIdentity\":\"istiod-d56968787-txq2d\",\"holderKey\":\"default\",\"leaseDurationSeconds\":30,\"acquireTime\":\"2024-08-13T13:26:54Z\",\"renewTime\":\"2024-08-27T06:16:13Z\",\"leaderTransitions\":0}\x8a\x01\x90\x01\n\x0fpilot-discovery\x12\x06Update\x1a\x02v1\"\b\b\xad\u07b5\xb6\x06\x10\x002\bFieldsV1:[\nY{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:control-plane.alpha.kubernetes.io/leader\":{}}}}B\x00\x1a\x00\"\x00" | created 9612546 | mod 16589739 | ver 157604]
pageID=4737, index=4, rev={Revision:{Main:16589786 Sub:0} tombstone:false}, value=[key "/registry/configmaps/istio-system/istio-namespace-controller-election" | val "k8s\x00\n\x0f\n\x02v1\x12\tConfigMap\x12\xeb\x03\n\xe8\x03\n#istio-namespace-controller-election\x12\x00\x1a\fistio-system\"\x00*$bb696087-260d-4167-bf06-17d3361f9b5f2\x008\x00B\b\b\x9e\xbe\xed\xb5\x06\x10\x00b\xe6\x01\n(control-plane.alpha.kubernetes.io/leader\x12\xb9\x01{\"holderIdentity\":\"istiod-d56968787-txq2d\",\"holderKey\":\"default\",\"leaseDurationSeconds\":30,\"acquireTime\":\"2024-08-13T13:26:54Z\",\"renewTime\":\"2024-08-27T06:16:21Z\",\"leaderTransitions\":0}\x8a\x01\x90\x01\n\x0fpilot-discovery\x12\x06Update\x1a\x02v1\"\b\b\xb5\u07b5\xb6\x06\x10\x002\bFieldsV1:[\nY{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:control-plane.alpha.kubernetes.io/leader\":{}}}}B\x00\x1a\x00\"\x00" | created 9612546 | mod 16589786 | ver 157605]
new (bad state after TXN 10789638):
Root: 5150 --> 1079 (leaf: key, index: ?) --> ??? --> 4737 (index: 4/4, rev: 16589786)
The problem is that the leaf page 4737 and root page 5150 were persisted successfully, but some branch pages were not. As far as I can tell, it's most likely a file system issue.
- Note that even after TXN 10789638, pages 11036, 6750, 3745 and 1306 all remained unchanged. This is expected behavior.
- After rolling back TXN 10789638, the db is back to normal. That means all the existing pages (including the ones mentioned above) remained unchanged. This is expected.
- Page 1079 is obviously in a bad state. The problem is that its parent (5150) was successfully persisted to disk, but 1079 itself wasn't. So its parent points to a bad page, which is exactly the cause of the corruption.
- Note that 5150, 1079, and 4737 were all free pages before executing TXN 10789638.
As a follow-up, it would be helpful to figure out the missing intermediate branch page between 1079 and 4737. We need to add a command, something like "find-parent", to find the parent of 4737 in this example.
@ahrtr 2 questions:
- Would it be possible to reproduce this case using LazyFS, to get a solid understanding of the issue and maybe consider making bbolt resilient to it?
- Could we consider bbolt automatically reverting to a previous TXN if data on the tree on current TXN is broken?
- Would it be possible to reproduce this case using LazyFS, to get a solid understanding of the issue and maybe consider making bbolt resilient to it?
We already introduced dmflakey, which is similar to LazyFS but without LazyFS's possible downside: LazyFS may not respect fdatasync semantics. Please refer to https://github.com/fuweid/go-dmflakey/pull/1
Could we consider bbolt automatically reverting to a previous TXN if data on the tree on current TXN is broken?
Yes, it's a good point. I already thought about it, but it should not be done automatically by bbolt. Based on the data corruption cases over the past ~3 years, IIRC, only a minority could be fixed by bbolt surgery revert-meta-page. Also, we are trying to simplify the code base instead of adding more logic to it, especially upper-layer or user-specific logic. Instead, I would suggest providing a guide for users to try it manually for now. We can revisit this once we have more confidence about the data corruption issues.
It also has a downside: there may be data loss after reverting to the previous TXN.
We already introduced dmflakey, which is similar to LazyFS
Looks like that issue is closed. Do you mean https://github.com/etcd-io/bbolt/issues/568? Repeating the ask: could we cover this exact scenario using dmflakey?
It also has downside... there may be data loss after reverting to previous TXN.
Just asking for this to be considered, because I would want to better understand the risk of the revert. I think the high-level question is: is there a chance of the previous transaction being silently broken if the latest transaction was corrupted?
I don't think this needs to be a default option; it might just be a feature in etcd, as I think it would benefit etcd's ability to recover. For systems like etcd, bbolt is not the source of truth, just the kv database. We can still recover the last transaction from the WAL. For cases where only the latest transaction was corrupted, instead of etcd giving up on bootstrap, we would just revert it and replay the transaction again from the WAL.
If the last transaction can be trusted, I would propose that etcd should always try to revert the meta page when the state of the B-tree under the latest meta page is broken. But before implementing this, we should verify and test the health of the previous transaction under different corruption scenarios.
Hi @ahrtr, I met the same issue. But my machine was not forcibly shut down; it only went through a live migration, and the behavior during that time was similar to freezing.
Looking through previous issues, I found two strange pages: the first 112 keys of 504 and 563 are exactly the same, so I suspect a failure encountered when writing after a rebalance, but I can't locate the specific code. Can you help me find the possible code area?
bbolt page --all --format-value=redacted:
db.log
Similar issue: on a 3-node Proxmox cluster, Docker LXC containers only work on one node. When moved to the other two nodes, they throw errors, but moving them back to the original node works fine.
node2: no errors
node1
panic: freepages: failed to get all reachable pages (page 12: multiple references (stack: [12]))
goroutine 50 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
go.etcd.io/[email protected]/db.go:1248 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 23
node3
time="2025-08-30T08:17:48.975519470+08:00" level=info msg="metadata content store policy set" policy=shared
panic: freepages: failed to get all reachable pages (page 118: multiple references (stack: [118]))
goroutine 23 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
go.etcd.io/[email protected]/db.go:1248 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 12