bbolt
Corrupted db file when the vm got turned off because of an overload
OS: macOS host with a RHEL VM
The db file got corrupted when macOS restarted by itself while my program was running in the RHEL VM. The check output follows.
$ bolt check tmp.db
page 0: multiple references
page 0: invalid type: unknown<00>
panic: invalid page type: 0: 0
goroutine 5 [running]:
panic(0x4e4120, 0xc420010610)
/usr/lib/golang/src/runtime/panic.go:500 +0x1a1
github.com/boltdb/bolt.(*Cursor).search(0xc42003eba8, 0x7f50350f20f0, 0xa, 0xa, 0x1bb69)
/opt/pindrop/include/go/src/github.com/boltdb/bolt/cursor.go:256 +0x429
github.com/boltdb/bolt.(*Cursor).seek(0xc42003eba8, 0x7f50350f20f0, 0xa, 0xa, 0x0, 0x0, 0x4f77a0, 0xc42000a3f0, 0x2, 0x2, ...)
/opt/pindrop/include/go/src/github.com/boltdb/bolt/cursor.go:159 +0xb1
github.com/boltdb/bolt.(*Bucket).Bucket(0xc420078018, 0x7f50350f20f0, 0xa, 0xa, 0x0)
/opt/pindrop/include/go/src/github.com/boltdb/bolt/bucket.go:112 +0x108
github.com/boltdb/bolt.(*Tx).checkBucket.func2(0x7f50350f20f0, 0xa, 0xa, 0x7f50350f20fa, 0x66, 0x66, 0x66, 0x0)
/opt/pindrop/include/go/src/github.com/boltdb/bolt/tx.go:449 +0x70
github.com/boltdb/bolt.(*Bucket).ForEach(0xc420078018, 0xc42003ecc0, 0x0, 0xc42003ecf0)
/opt/pindrop/include/go/src/github.com/boltdb/bolt/bucket.go:390 +0xff
github.com/boltdb/bolt.(*Tx).checkBucket(0xc420078000, 0xc420078018, 0xc42003eea0, 0xc42003eed0, 0xc4200540c0)
/opt/pindrop/include/go/src/github.com/boltdb/bolt/tx.go:453 +0x135
github.com/boltdb/bolt.(*Tx).check(0xc420078000, 0xc4200540c0)
/opt/pindrop/include/go/src/github.com/boltdb/bolt/tx.go:404 +0x5f7
created by github.com/boltdb/bolt.(*Tx).Check
/opt/pindrop/include/go/src/github.com/boltdb/bolt/tx.go:379 +0x67
Is there any way to repair the db file? I checked https://github.com/boltdb/bolt/issues/348, and my version (ee30b748bcfbd74ec1d8439ae8fd4f9123a5c94e) is newer than the one discussed there.
Note that it didn't happen again when I tried to reproduce it by powering off the virtual machine manually from macOS.
Has anyone else run into this issue?
Can any maintainer check this?
ping?
is this repo actively maintained?
I don't know, but it's still the main fork that I know of. The original repository was archived because it was considered to already be complete and they didn't want to weigh it down with extra features.
Virtual machines are tricky. You didn't say what you ran the VM in, but VirtualBox for example ignores flush requests by default, which Bolt (and every other database) depends on to ensure that writes occur in the correct order. That's not a problem if it's shut down normally, but a forced shutdown outside of the VM software's control can lead to partial, out-of-order writes which lead to corruption.
I have the same problem, and it happened on Windows XP. I use this repo in a released project and it happened yesterday. I wasn't running it in a VM and didn't power off the system. I just used the put function to save some info, and now the bucket can be read but cannot be written.
@dtfinch It was a Red Hat OS in a VM. In that case, what would happen in the case of an accidental power failure?
@liqingsanjin So it just got corrupted for no reason?
I just used the put function to save some info, and now the bucket can be read but cannot be written.
Can you explain how it was done?
@bharathramh92 Sorry, I don't know how it happened. I deployed my program on 600+ computers running Windows 7 and Windows XP about a month ago. There were no problems until yesterday. From my program's log files, I saw that when it tried to write to a bucket, it panicked with the same error as yours, but the bucket can still be read. I tried restarting the program and Windows. It can't be written to any more.
The following is my log output:
time="2018-08-17T22:04:16+08:00" level=error msg="invalid page type: 0: 0"
@liqingsanjin that is so strange. I never had that issue.
I have the same problem.
I saw a similar problem:
invalid page type: 0: 0
File "go.etcd.io/[email protected]/cursor.go", line 250, in go.etcd.io/bbolt.(*Cursor).search
File "go.etcd.io/[email protected]/cursor.go", line 159, in go.etcd.io/bbolt.(*Cursor).seek
File "go.etcd.io/[email protected]/bucket.go", line 105, in go.etcd.io/bbolt.(*Bucket).Bucket
File "go.etcd.io/[email protected]/tx.go", line 101, in go.etcd.io/bbolt.(*Tx).Bucket
This message comes from:
https://github.com/etcd-io/bbolt/blob/4b8b43e23cceca257d3d2958882dec02d9b16c69/cursor.go#L249-L250
So p.id == 0 and also p.flags == 0. If this is truly page 0, it should have flags = metaPageFlag set, and regardless, flags == 0 is not one of the valid values:
https://github.com/etcd-io/bbolt/blob/4b8b43e23cceca257d3d2958882dec02d9b16c69/cmd/bbolt/main.go#L1841-L1846
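As a rough sketch of that check: the flag constants below are the values as I understand them from bbolt's page.go, so treat them as an assumption and verify against the version you run. `pageType` is my own helper name, written to mirror how the page type string in the panic message appears to be derived.

```go
package main

import "fmt"

// Page flag values, assumed from bbolt's page.go.
const (
	branchPageFlag   = 0x01
	leafPageFlag     = 0x02
	metaPageFlag     = 0x04
	freelistPageFlag = 0x10
)

// pageType maps a page's flags field to a human-readable type.
// Any value with none of the known bits set is reported as unknown.
func pageType(flags uint16) string {
	switch {
	case flags&branchPageFlag != 0:
		return "branch"
	case flags&leafPageFlag != 0:
		return "leaf"
	case flags&metaPageFlag != 0:
		return "meta"
	case flags&freelistPageFlag != 0:
		return "freelist"
	}
	return fmt.Sprintf("unknown<%02x>", flags)
}

func main() {
	fmt.Println(pageType(0x00)) // the zeroed page from this issue
	fmt.Println(pageType(0x04)) // what a real page 0 should report
}
```

With flags == 0 none of the branches match, which lines up with the `unknown<00>` in the `bolt check` output at the top of this issue.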
Unfortunately I don't have access to the db file that caused the issue in my case, but if someone else does, I would suggest looking at the backup meta page (the second meta page, page id 1) to see if it's correct.
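For anyone who does have a corrupted file, here is a hedged sketch of how the two meta pages could be inspected by hand. Everything about the layout is an assumption from my reading of the bbolt source: a 16-byte page header, a 64-byte meta struct, little-endian byte order, magic 0xED0CDAED, version 2, and an FNV-64a checksum over the first 56 bytes of the meta struct. `parseMeta`, `metaInfo`, and `buildMeta` are names I made up for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/fnv"
)

// Layout constants are assumptions; verify them against your bbolt version.
const (
	pageHeaderSize = 16         // id(8) + flags(2) + count(2) + overflow(4)
	metaSize       = 64         // meta struct that follows the page header
	checksumOffset = 56         // checksum is the last 8 bytes of the meta struct
	boltMagic      = 0xED0CDAED // expected magic number
	boltVersion    = 2          // expected file format version
)

type metaInfo struct {
	Magic, Version, PageSize uint32
	Txid                     uint64
	ChecksumOK               bool
}

// Valid reports whether the meta page looks usable.
func (m metaInfo) Valid() bool {
	return m.Magic == boltMagic && m.Version == boltVersion && m.ChecksumOK
}

// parseMeta decodes the meta struct from one raw page (little-endian assumed).
func parseMeta(page []byte) (metaInfo, error) {
	if len(page) < pageHeaderSize+metaSize {
		return metaInfo{}, errors.New("page too short to hold a meta struct")
	}
	m := page[pageHeaderSize : pageHeaderSize+metaSize]
	le := binary.LittleEndian
	info := metaInfo{
		Magic:    le.Uint32(m[0:4]),
		Version:  le.Uint32(m[4:8]),
		PageSize: le.Uint32(m[8:12]),
		Txid:     le.Uint64(m[48:56]),
	}
	h := fnv.New64a()
	h.Write(m[:checksumOffset])
	info.ChecksumOK = h.Sum64() == le.Uint64(m[checksumOffset:])
	return info, nil
}

// buildMeta creates a synthetic meta page so the example runs without a db file.
func buildMeta(pageSize uint32, txid uint64) []byte {
	page := make([]byte, pageSize)
	le := binary.LittleEndian
	le.PutUint16(page[8:10], 0x04) // page header flags: meta page
	m := page[pageHeaderSize:]
	le.PutUint32(m[0:4], boltMagic)
	le.PutUint32(m[4:8], boltVersion)
	le.PutUint32(m[8:12], pageSize)
	le.PutUint64(m[48:56], txid)
	h := fnv.New64a()
	h.Write(m[:checksumOffset])
	le.PutUint64(m[checksumOffset:checksumOffset+8], h.Sum64())
	return page
}

func main() {
	good, _ := parseMeta(buildMeta(4096, 7))
	zeroed, _ := parseMeta(make([]byte, 4096))
	fmt.Println(good.Valid(), good.Txid, zeroed.Valid())
}
```

Against a real file you would read the first two pages and pass each one to `parseMeta`. As far as I know, bbolt uses whichever meta validates and has the higher txid, so a valid second meta page may still point at an older but consistent root.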
The page was somehow reset; in other words, all of its contents are zero values. FYI: https://github.com/etcd-io/bbolt/pull/520
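If you want to check a file for this symptom yourself, a minimal sketch is to scan it page by page and list the pages that are entirely zero. `zeroPages` is a hypothetical helper, and the 4096-byte page size in the example is an assumption; the real page size is recorded in the meta page.

```go
package main

import (
	"bytes"
	"fmt"
)

// zeroPages returns the ids of all pages in data whose bytes are all zero.
// A trailing partial page is ignored.
func zeroPages(data []byte, pageSize int) []int {
	if pageSize <= 0 {
		return nil
	}
	var ids []int
	zero := make([]byte, pageSize)
	for off := 0; off+pageSize <= len(data); off += pageSize {
		if bytes.Equal(data[off:off+pageSize], zero) {
			ids = append(ids, off/pageSize)
		}
	}
	return ids
}

func main() {
	// Three synthetic pages: 0 and 2 have content, 1 is zeroed.
	data := make([]byte, 3*4096)
	data[0] = 1
	data[2*4096] = 1
	fmt.Println(zeroPages(data, 4096)) // prints [1]
}
```

A zero page is not proof of corruption on its own, since unused pages or the tail of a large value can legitimately be all zeros. It only becomes a problem when a branch or bucket still references that page id, which is what the panic above indicates.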