bolt icon indicating copy to clipboard operation
bolt copied to clipboard

iterate returns duplicate keys

Open xiang90 opened this issue 8 years ago • 9 comments

I expect iterating a bucket with cursor return keys in order, but it would return duplicated keys.

Here is the data

And here is the script to reproduce the problem

Here is the output:

key="\x00\x00\x00\x00\x00\x00\xfe\x9d_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!

Am I missing something?

The key \x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00 shows up twice during cursor iterating.

xiang90 avatar Dec 16 '16 17:12 xiang90

I looked into it more. I think actually the db itself is corrupted during page split? (or we failed to reload the nodes correctly)

There are 3 duplicated keys in two adjacent pages:

load index 2 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 1 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 0 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 32 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 31 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 30 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 29 from page 1798

xiang90 avatar Dec 16 '16 21:12 xiang90

@xiang90 Are you retaining and altering the []byte that you pass as a key or value to Bucket.Put()?

benbjohnson avatar Dec 16 '16 22:12 benbjohnson

@benbjohnson No. Each key is put only once without any buffering, so no future access. But I will double check.

xiang90 avatar Dec 16 '16 22:12 xiang90

@benbjohnson I suspect the split code is the cause

	// Split inodes across two nodes.
	next.inodes = n.inodes[splitIndex:]
	n.inodes = n.inodes[:splitIndex]

We do not copy the backing array. So if we append one element to n.inodes, it will overwrite the first element of next.inodes.

See my example here:

https://play.golang.org/p/xjI9r6rpow

But this is just my guess by reading limited boltdb code. I am not sure.

xiang90 avatar Dec 16 '16 22:12 xiang90

On the second thought, it might not be the exact cause.

We put keys sequentially. We do not miss any keys, but there are 3 duplicates in between. Basically it looks like:

<------- |dup|-------> 1 2 3 4 5 3 4 5 6 7 8 9

xiang90 avatar Dec 16 '16 23:12 xiang90

@xiang90 Do you have an example program that writes the data?

benbjohnson avatar Dec 17 '16 01:12 benbjohnson

@benbjohnson

package playground

import (
	"sync"

	"github.com/boltdb/bolt"
)

type store struct {
	sync.Mutex
	revision int
	tx       *bolt.Tx
	db       *bolt.DB
}

func (s *store) put(value []byte) {
	s.Lock()
	defer s.Unlock()

	bucket := tx.Bucket("keys")
	if bucket == nil {
		panic("bucket key does not exist")
	}

	// it is useful to increase fill percent when the workload is seq append.
	// this can delay the page split and reduce space usage.
	bucket.FillPercent = 0.9

	if err := bucket.Put(intToBytes(s.revision), value); err != nil {
		panic(err)
	}
	s.revision++
}

func (s *store) get(rev int) []byte {
	s.Lock()
	defer s.Unlock()

	bucket := tx.Bucket("keys")
	if bucket == nil {
		panic("bucket key does not exist")
	}

	b := bucket.Get(intToBytes(rev))
	nb := make([]byte, len(b))
	copy(nb, b)
	return nb
}

// called every 100ms, every 10000 puts
func (s *store) commit() {
	s.Lock()
	defer s.Unlock()

	if err := s.tx.Commit(); err != nil {
		panic(err)
	}

	var err error
	s.tx, err = s.db.Begin(true)
	if err != nil {
		panic(err)
	}
}

It looks pretty much like this.

xiang90 avatar Dec 17 '16 02:12 xiang90

The put bytes are generated by marshaling a protobuf request, so it will never be accessed again.

The get part is not exactly accurate. we actually unmarshal the data into a protobuf response holding the lock. Protobuf unmarshalling should copy the data.

xiang90 avatar Dec 17 '16 03:12 xiang90

@xiang90 we occured the same issue, how could you fixed it?

jsvisa avatar Nov 27 '17 04:11 jsvisa