bolt
iterate returns duplicate keys
I expect iterating over a bucket with a cursor to return keys in order, but it returns duplicated keys.
Here is the data
And here is the script to reproduce the problem
Here is the output:
key="\x00\x00\x00\x00\x00\x00\xfe\x9d_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!
Am I missing something? The key \x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00
shows up twice during cursor iteration.
I looked into it more. I think the db itself is actually corrupted during a page split (or we failed to reload the nodes correctly).
There are 3 duplicated keys in two adjacent pages:
load index 2 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 1 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 0 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 32 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 31 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 30 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 29 from page 1798
@xiang90 Are you retaining and altering the []byte that you pass as a key or value to Bucket.Put()?
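The hazard behind this question can be sketched without bolt at all. Below is a toy store (hypothetical, not bolt's code) that retains the caller's slice without copying, the way a database may hold on to key/value bytes until a transaction commits; mutating the buffer afterwards silently changes the "stored" key:

```go
package main

import "fmt"

// toyStore retains the caller's slice without copying it,
// mimicking a store that holds key bytes until commit.
type toyStore struct {
	keys [][]byte
}

func (s *toyStore) put(k []byte) {
	s.keys = append(s.keys, k) // keeps a reference to the caller's buffer
}

func main() {
	s := &toyStore{}
	buf := []byte("key-1")
	s.put(buf)

	// Reusing the buffer mutates the key the store thinks it holds.
	copy(buf, []byte("key-2"))

	fmt.Println(string(s.keys[0])) // key-2, not key-1
}
```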
@benbjohnson No. Each key is put only once without any buffering, so no future access. But I will double check.
@benbjohnson I suspect the split code is the cause
// Split inodes across two nodes.
next.inodes = n.inodes[splitIndex:]
n.inodes = n.inodes[:splitIndex]
We do not copy the backing array, so if we append one element to n.inodes, it will overwrite the first element of next.inodes.
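The situation the playground link illustrates can be reduced to a few lines: after reslicing, the left half still has spare capacity in the shared backing array, so an append to it writes over the right half's first element.

```go
package main

import "fmt"

func main() {
	backing := []int{1, 2, 3, 4, 5, 6}
	left := backing[:3]  // len 3, cap 6 — shares the backing array
	right := backing[3:] // starts where left's spare capacity begins

	// append does not allocate here because cap(left) > len(left),
	// so it writes into the slot right[0] also refers to.
	left = append(left, 99)

	fmt.Println(right[0]) // 99, not 4
}
```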
See my example here:
https://play.golang.org/p/xjI9r6rpow
But this is just my guess from reading a limited amount of the boltdb code; I am not sure.
On second thought, it might not be the exact cause.
We put keys sequentially. No keys are missing, but there are 3 duplicates in between. Basically it looks like:
<------- |dup|-------> 1 2 3 4 5 3 4 5 6 7 8 9
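One way to confirm this pattern from the iteration output is a small order check over the keys the cursor yields: any key that is not strictly greater than its predecessor marks the start of a duplicated run. This is a hypothetical helper, not part of bolt:

```go
package main

import (
	"bytes"
	"fmt"
)

// findOrderViolations returns the indices where a key is not
// strictly greater than its predecessor — exactly how the
// duplicate run "1 2 3 4 5 3 4 5 ..." would surface.
func findOrderViolations(keys [][]byte) []int {
	var bad []int
	for i := 1; i < len(keys); i++ {
		if bytes.Compare(keys[i], keys[i-1]) <= 0 {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	keys := [][]byte{
		[]byte("1"), []byte("2"), []byte("3"),
		[]byte("2"), []byte("3"), []byte("4"),
	}
	fmt.Println(findOrderViolations(keys)) // [3]
}
```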
@xiang90 Do you have an example program that writes the data?
@benbjohnson
package playground

import (
	"sync"

	"github.com/boltdb/bolt"
)

type store struct {
	sync.Mutex
	revision int
	tx       *bolt.Tx
	db       *bolt.DB
}

func (s *store) put(value []byte) {
	s.Lock()
	defer s.Unlock()
	bucket := s.tx.Bucket([]byte("keys")) // Bucket takes a []byte name
	if bucket == nil {
		panic("bucket keys does not exist")
	}
	// It is useful to increase the fill percent when the workload is
	// sequential append: this delays page splits and reduces space usage.
	bucket.FillPercent = 0.9
	// intToBytes is a helper (not shown) that encodes the revision as a key.
	if err := bucket.Put(intToBytes(s.revision), value); err != nil {
		panic(err)
	}
	s.revision++
}

func (s *store) get(rev int) []byte {
	s.Lock()
	defer s.Unlock()
	bucket := s.tx.Bucket([]byte("keys"))
	if bucket == nil {
		panic("bucket keys does not exist")
	}
	b := bucket.Get(intToBytes(rev))
	// Copy the bytes: they are only valid for the life of the transaction.
	nb := make([]byte, len(b))
	copy(nb, b)
	return nb
}

// commit is called every 100ms, or every 10000 puts.
func (s *store) commit() {
	s.Lock()
	defer s.Unlock()
	if err := s.tx.Commit(); err != nil {
		panic(err)
	}
	var err error
	s.tx, err = s.db.Begin(true)
	if err != nil {
		panic(err)
	}
}
It looks pretty much like this.
The put bytes are generated by marshaling a protobuf request, so they are never accessed again.
The get part is not exactly accurate: we actually unmarshal the data into a protobuf response while holding the lock. Protobuf unmarshalling should copy the data.
@xiang90 We ran into the same issue. How did you fix it?