weave Clean up store interface

Is your feature request related to a problem? Please describe.

There have been some comments that this could be done better. It was one of the first APIs in weave, and we have learned a lot since then. I'm sure we can improve it.

For example, we now have a much simpler Iterator

type Iterator interface {
	Next() (key, value []byte, err error)
	Release()
}

Let us consider the following interfaces here:

type ReadOnlyKVStore interface {
	Get(key []byte) ([]byte, error)
	Has(key []byte) (bool, error)
	Iterator(start, end []byte) (Iterator, error)
	ReverseIterator(start, end []byte) (Iterator, error)
}

type SetDeleter interface {
	Set(key, value []byte) error
	Delete(key []byte) error
}

// Batch is used to cache writes before commiting to a lower store
type Batch interface {
	SetDeleter
	Write() error
}

type KVStore interface {
	ReadOnlyKVStore
	SetDeleter
	NewBatch() Batch
}

type CacheableKVStore interface {
	KVStore
	CacheWrap() KVCacheWrap
}

type KVCacheWrap interface {
	// CacheableKVStore allows us to use this Cache recursively
	CacheableKVStore
	Write() error
	Discard()
}

Describe the solution you'd like

Some thoughts on cleanup:

Can we remove NewBatch() from KVStore? What was my thinking there?
Can we simplify/unify CacheableKVStore and KVCacheWrap?
Can we combine/improve Iterator and ReverseIterator?

Describe alternatives you've considered

One concrete suggestion is to make iterator take functional parameters

// default -> ascending and no limit
type IterConfig struct {
  Descending bool
  Limit int
}

type IterOpt func(IterConfig) IterConfig

type ReadOnlyKVStore interface {
    Iterate(start, end []byte, opts ...IterOpt) (Iterator, error)
}

// example opt we expose
var Reverse IterOpt = func (cfg IterConfig) IterConfig {
  cfg.Descending = true
  return cfg
}

Additional context Add any other context or screenshots about the feature request here.

Jun 25 '19 20:06 ethanfrey

Iterate method belongs to an interface. Using functional parameters makes the interface unimplementable for outside packages as they cannot consume the parameters. The reverse parameter of the Iterate method is a good idea. It makes the interface smaller. I think a reverse bool type parameter is enough.

type ReadOnlyKVStore interface {
    Iterate(start, end []byte, reverse bool) (Iterator, error)
}

Iterator is lazy so the limit parameter is not necessary. You can Release any time. It can also be implemented as the Iterator wrapper.

I think another improvement is to return errors.ErrNotFound instead of nil. This is not changing the API notation but it greatly change the return values. I think this is a great improvement but also a very expensive one to implement.

Third improvement I can think of is to add context.Context to all methods. We don't write concurrent code, so the adventage it provides is not that huge. Nevertheless I think it might be worth adding it.

it will allow to cancel pending operations
iterator would not need the Release method as cancelling can be done via context
detailed metrics can be collected

Jun 26 '19 06:06 husio

Thanks @husio I agree with your first two points above.

For the third one, let's think of what this means first, related to https://github.com/iov-one/weave/issues/822:

We want the full weave.Context, which is just context.Context populated with lots of standard data (like block height)
We want a basic context.Background() or context.WithTimeout(context.Background(), ...) type info

The first one adds run-time dependencies that we cannot check compile-time and I don't like it. Just passing in a generic context.Context with no weave-specific state is a nice idea. It would be good to have a concrete use-case implemented along with the API changes (eg. tracing/profiling)

Jun 26 '19 15:06 ethanfrey

The reverse parameter of the Iterate method is a good idea. It makes the interface smaller. I think a reverse bool type parameter is enough.

I think we don't need the reverse parameter at all. When end is smaller than start than we do not way to receive an empty iterator. We want a reversed iterator.

type ReadOnlyKVStore interface {
    // Iterate returns an iterator over data set that key is from given range.
    // If end value is smaller than start value than the iteration order is reversed.
    Iterate(start, end []byte) (Iterator, error)
}

regular := db.Iterate([]byte("alice"), []byte("charlie"))
reversed := db.Iterate([]byte("zorg"), []byte("bob"))

Jun 27 '19 07:06 husio

I thought that too, but there is a problem. The ends can often be nil.

I guess db.Iterate([]byte("alice"), nil) means iterate from alice to the end and db.Iterate(nil, []byte("alice")) means iterate from beginning to alice.

How do I iterate reverse from end to alice, or alice to beginning?

Also, which order should db.Iterate(nil, nil) go?

I tried this (remove reverse param) API way back in the tendermint days in early IAVL refactors and realized you couldn't do without. Also, it was nice for usability to always have (low, high) in that order. Adding one argument to remove a method does make sense to me though

Jun 27 '19 11:06 ethanfrey

How do I iterate reverse from end to alice, or alice to beginning?

Good point. Not sure if the following is a good design idea. How about two constatns store.MaxKey and store.MinKey for those exact cases?

db.Iterate([]byte("alice"), store.MaxKey) \\ iterate from alice to the end and
db.Iterate(store.MinKey, []byte("alice")) \\ iterate from beginning to alice.

Both variables (they cannot be const :sob:) would be defined as

var (
    MinKey = []byte{0}
    MaxKey = []byte{256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, ...}
)

Jun 27 '19 12:06 husio

How about we simplify this by having something like

db.Iterate()
db.IterateReverse()

I don't really like the MinKey, MaxKey approach as it's not user-friendly and only makes sense when you have just talked about it.

Jun 27 '19 13:06 ruseinov

I agree with Roman here. Trying to remove an argument or two just obfuscates the API.

I think both

Iterate(start, end []byte)
IterateReverse(start, end []byte)

as well as

Iterate(start, end []byte, reverse bool)

are fine and equivalent. The version with two methods is a bit more work to implement, but much cleaner when reading and using ("what does that , true parameter mean?").

Jun 27 '19 16:06 ethanfrey

Agree, both are valid options. The first example is what we have already just different naming, so I like it slightly more.

I was wondering if the end parameter is the common or a special case of an iterator in our domain. If we do iterate mostly from a start value we can chain another iterator on top that checks if the end was reached and stops. For example: it:= EndIterator(db.Iterate(), end) or with a result limit it:= LimitedIterator(db.Iterate(), 10).
The benefit is that conditional functionality would go on top of the basic iterator (interface).

Jun 28 '19 10:06 alpe

Can we remove NewBatch() from KVStore? What was my thinking there?

IMHO it would be better to think about this as a command that uses a KV store rather than having this as an interface method. I would prefer batch := NewNonAtomicBatch(r.KVStore) or shorter as batch := Batch(r.KVStore). Same argument with CacheableKVStore .CacheWrap() can become CacheWrap(r.KVStore)

I am not sure if moving the commands to the store package would create some unwanted dependencies but it would be a nicer read.

Off topic but CacheWrap could be used for atomic batches.

Jun 28 '19 11:06 alpe

Same argument with CacheableKVStore .CacheWrap() can become CacheWrap(r.KVStore)

This is a great improvement!

Jun 28 '19 11:06 husio

For example: it:= EndIterator(db.Iterate(), end) or with a result limit it:= LimitedIterator(db.Iterate(), 10).

Interesting idea. Does this make it simpler? More complex? I had an idea with options to configure the Iterator, but @husio thought that made it harder to use.

I would say that Ascending vs. Descending choice needs to be made in the constructor. Ends, offsets, or limits can be added as wrappers.

Jun 28 '19 13:06 ethanfrey

batch := NewNonAtomicBatch(r.KVStore) or shorter as batch := Batch(r.KVStore).

I actually did that (including renaming to Batch and removing the Batch interface), and when I was 80% with updating the code, I discovered why I did this.

// NewBatch makes sure all writes go through this one
func (r *cacheableRecordingStore) NewBatch() Batch {
	return &recorderBatch{
		changes: r.changes,
		b:       r.CacheableKVStore.NewBatch(),
	}
}

I wanted to use custom batch wrapper. I threw away the code (maybe 40 minutes work), but I would first look for a clean design where all implementations of .NewBatch() are simply the NonAtomicBatch, then remove that from the interface.

Jun 28 '19 13:06 ethanfrey

Off topic but CacheWrap could be used for atomic batches.

They are somehow atomic. But we crash in the midst of batch.Write(), we have no guarantees of the state. The underlying store could commit on every Set operation. That is why I call them NonAtomic.

Also, if you look at the implementation:

func (b *NonAtomicBatch) Write(ctx context.Context) error {
	for _, Op := range b.ops {
		err := Op.Apply(ctx, b.out)
		if err != nil {
			return err
		}
	}
	b.ops = nil
	return nil
}

If I have 5 operations and the 3rd one errors, then the first two were executed on the underlying store and cannot be reverted. This is fine as used now, but since CacheWrap uses this NonAtomicBatch model, we don't gain more atomicity.

The only other approach would be that the backing store explicitly expose

WriteBatch(ops []Op) next to Set() and Delete() and every implementation is able to do any internal checkpointing as needed to make it atomic. That is a huge chunk of work and overhead, so I removed that as I saw no actual need in our code for that, just a theoretical purity point.

We can gladly discuss this point deeper, but at least I explain the current design here.

Jun 28 '19 13:06 ethanfrey

Back to do-able changes...

Same argument with CacheableKVStore .CacheWrap() can become CacheWrap(r.KVStore)

There are two implementations we use:

func (b BTreeCacheable) CacheWrap() KVCacheWrap {
	// TODO: reuse FreeList between multiple cache wraps....
	// We create/destroy a lot per tx when processing a block
	return NewBTreeCacheWrap(b.KVStore, b.NewBatch(), nil)
}

func (b BTreeCacheWrap) CacheWrap() KVCacheWrap {
	// TODO: reuse FreeList between multiple cache wraps....
	// We create/destroy a lot per tx when processing a block
	return NewBTreeCacheWrap(b, b.NewBatch(), b.free)
}

They are basically the same algorithm, just different arguments, and we can definitely pull this out of the interface into a package-level helper function. Especially once we replace kv.NewBatch() with NewBatch(kv)

However... currently the interfaces are defined in weave and all implementation in store. If we replace NewBatch() and CacheWrap() with concrete functions, they would be in store, and then require consuming packages to import from store directly, not just the top-level package.

I think this is no problem in practice, but the more we have packages importing sibling packages, the more risk we have of circular dependencies. (I am fine to do this as discussed, just want some reflection on the import impact first)

Jun 28 '19 13:06 ethanfrey

weave weave copied to clipboard

Clean up store interface

weave
weave copied to clipboard