weave
weave copied to clipboard
Clean up store interface
Is your feature request related to a problem? Please describe.
There have been some comments that this could be done better. It was one of the first APIs in weave, and we have learned a lot since then. I'm sure we can improve it.
For example, we now have a much simpler Iterator
type Iterator interface {
Next() (key, value []byte, err error)
Release()
}
Let us consider the following interfaces here:
type ReadOnlyKVStore interface {
Get(key []byte) ([]byte, error)
Has(key []byte) (bool, error)
Iterator(start, end []byte) (Iterator, error)
ReverseIterator(start, end []byte) (Iterator, error)
}
type SetDeleter interface {
Set(key, value []byte) error
Delete(key []byte) error
}
// Batch is used to cache writes before commiting to a lower store
type Batch interface {
SetDeleter
Write() error
}
type KVStore interface {
ReadOnlyKVStore
SetDeleter
NewBatch() Batch
}
type CacheableKVStore interface {
KVStore
CacheWrap() KVCacheWrap
}
type KVCacheWrap interface {
// CacheableKVStore allows us to use this Cache recursively
CacheableKVStore
Write() error
Discard()
}
Describe the solution you'd like
Some thoughts on cleanup:
- Can we remove
NewBatch()
fromKVStore
? What was my thinking there? - Can we simplify/unify
CacheableKVStore
andKVCacheWrap
? - Can we combine/improve
Iterator
andReverseIterator
?
Describe alternatives you've considered
One concrete suggestion is to make iterator take functional parameters
// default -> ascending and no limit
type IterConfig struct {
Descending bool
Limit int
}
type IterOpt func(IterConfig) IterConfig
type ReadOnlyKVStore interface {
Iterate(start, end []byte, opts ...IterOpt) (Iterator, error)
}
// example opt we expose
var Reverse IterOpt = func (cfg IterConfig) IterConfig {
cfg.Descending = true
return cfg
}
Additional context Add any other context or screenshots about the feature request here.
Iterate
method belongs to an interface. Using functional parameters makes the interface unimplementable for outside packages as they cannot consume the parameters.
The reverse parameter of the Iterate
method is a good idea. It makes the interface smaller. I think a reverse bool
type parameter is enough.
type ReadOnlyKVStore interface {
Iterate(start, end []byte, reverse bool) (Iterator, error)
}
Iterator is lazy so the limit parameter is not necessary. You can Release
any time. It can also be implemented as the Iterator
wrapper.
I think another improvement is to return errors.ErrNotFound
instead of nil
. This is not changing the API notation but it greatly change the return values. I think this is a great improvement but also a very expensive one to implement.
Third improvement I can think of is to add context.Context
to all methods. We don't write concurrent code, so the adventage it provides is not that huge. Nevertheless I think it might be worth adding it.
- it will allow to cancel pending operations
- iterator would not need the
Release
method as cancelling can be done via context - detailed metrics can be collected
Thanks @husio I agree with your first two points above.
For the third one, let's think of what this means first, related to https://github.com/iov-one/weave/issues/822:
- We want the full
weave.Context
, which is justcontext.Context
populated with lots of standard data (like block height) - We want a basic
context.Background()
orcontext.WithTimeout(context.Background(), ...)
type info
The first one adds run-time dependencies that we cannot check compile-time and I don't like it.
Just passing in a generic context.Context
with no weave-specific state is a nice idea. It would be good to have a concrete use-case implemented along with the API changes (eg. tracing/profiling)
The reverse parameter of the
Iterate
method is a good idea. It makes the interface smaller. I think areverse bool
type parameter is enough.
I think we don't need the reverse parameter at all. When end
is smaller than start
than we do not way to receive an empty iterator. We want a reversed iterator.
type ReadOnlyKVStore interface {
// Iterate returns an iterator over data set that key is from given range.
// If end value is smaller than start value than the iteration order is reversed.
Iterate(start, end []byte) (Iterator, error)
}
regular := db.Iterate([]byte("alice"), []byte("charlie"))
reversed := db.Iterate([]byte("zorg"), []byte("bob"))
I thought that too, but there is a problem. The ends can often be nil
.
I guess
db.Iterate([]byte("alice"), nil)
means iterate from alice to the end and
db.Iterate(nil, []byte("alice"))
means iterate from beginning to alice.
How do I iterate reverse from end to alice, or alice to beginning?
Also, which order should db.Iterate(nil, nil)
go?
I tried this (remove reverse param) API way back in the tendermint days in early IAVL refactors and realized you couldn't do without. Also, it was nice for usability to always have (low, high) in that order. Adding one argument to remove a method does make sense to me though
How do I iterate reverse from end to alice, or alice to beginning?
Good point.
Not sure if the following is a good design idea. How about two constatns store.MaxKey
and store.MinKey
for those exact cases?
db.Iterate([]byte("alice"), store.MaxKey) \\ iterate from alice to the end and
db.Iterate(store.MinKey, []byte("alice")) \\ iterate from beginning to alice.
Both variables (they cannot be const
:sob:) would be defined as
var (
MinKey = []byte{0}
MaxKey = []byte{256, 256, 256, 256, 256, 256, 256, 256, 256, 256, 256, ...}
)
How about we simplify this by having something like
db.Iterate()
db.IterateReverse()
I don't really like the MinKey, MaxKey approach as it's not user-friendly and only makes sense when you have just talked about it.
I agree with Roman here. Trying to remove an argument or two just obfuscates the API.
I think both
Iterate(start, end []byte)
IterateReverse(start, end []byte)
as well as
Iterate(start, end []byte, reverse bool)
are fine and equivalent. The version with two methods is a bit more work to implement, but much cleaner when reading and using ("what does that , true
parameter mean?").
Agree, both are valid options. The first example is what we have already just different naming, so I like it slightly more.
I was wondering if the end
parameter is the common or a special case of an iterator in our domain. If we do iterate mostly from a start value we can chain another iterator on top that checks if the end was reached and stops.
For example: it:= EndIterator(db.Iterate(), end)
or with a result limit it:= LimitedIterator(db.Iterate(), 10)
.
The benefit is that conditional functionality would go on top of the basic iterator (interface).
Can we remove NewBatch() from KVStore? What was my thinking there?
IMHO it would be better to think about this as a command that uses a KV store rather than having this as an interface method. I would prefer batch := NewNonAtomicBatch(r.KVStore)
or shorter as batch := Batch(r.KVStore)
.
Same argument with CacheableKVStore .CacheWrap()
can become CacheWrap(r.KVStore)
I am not sure if moving the commands to the store
package would create some unwanted dependencies but it would be a nicer read.
Off topic but CacheWrap could be used for atomic batches.
Same argument with CacheableKVStore .CacheWrap() can become CacheWrap(r.KVStore)
This is a great improvement!
For example: it:= EndIterator(db.Iterate(), end) or with a result limit it:= LimitedIterator(db.Iterate(), 10).
Interesting idea. Does this make it simpler? More complex? I had an idea with options to configure the Iterator, but @husio thought that made it harder to use.
I would say that Ascending vs. Descending choice needs to be made in the constructor. Ends, offsets, or limits can be added as wrappers.
batch := NewNonAtomicBatch(r.KVStore) or shorter as batch := Batch(r.KVStore).
I actually did that (including renaming to Batch and removing the Batch interface), and when I was 80% with updating the code, I discovered why I did this.
// NewBatch makes sure all writes go through this one
func (r *cacheableRecordingStore) NewBatch() Batch {
return &recorderBatch{
changes: r.changes,
b: r.CacheableKVStore.NewBatch(),
}
}
I wanted to use custom batch wrapper. I threw away the code (maybe 40 minutes work), but I would first look for a clean design where all implementations of .NewBatch()
are simply the NonAtomicBatch
, then remove that from the interface.
Off topic but CacheWrap could be used for atomic batches.
They are somehow atomic. But we crash in the midst of batch.Write()
, we have no guarantees of the state. The underlying store could commit on every Set operation. That is why I call them NonAtomic.
Also, if you look at the implementation:
func (b *NonAtomicBatch) Write(ctx context.Context) error {
for _, Op := range b.ops {
err := Op.Apply(ctx, b.out)
if err != nil {
return err
}
}
b.ops = nil
return nil
}
If I have 5 operations and the 3rd one errors, then the first two were executed on the underlying store and cannot be reverted. This is fine as used now, but since CacheWrap
uses this NonAtomicBatch
model, we don't gain more atomicity.
The only other approach would be that the backing store explicitly expose
WriteBatch(ops []Op)
next to Set()
and Delete()
and every implementation is able to do any internal checkpointing as needed to make it atomic. That is a huge chunk of work and overhead, so I removed that as I saw no actual need in our code for that, just a theoretical purity point.
We can gladly discuss this point deeper, but at least I explain the current design here.
Back to do-able changes...
Same argument with CacheableKVStore .CacheWrap() can become CacheWrap(r.KVStore)
There are two implementations we use:
func (b BTreeCacheable) CacheWrap() KVCacheWrap {
// TODO: reuse FreeList between multiple cache wraps....
// We create/destroy a lot per tx when processing a block
return NewBTreeCacheWrap(b.KVStore, b.NewBatch(), nil)
}
func (b BTreeCacheWrap) CacheWrap() KVCacheWrap {
// TODO: reuse FreeList between multiple cache wraps....
// We create/destroy a lot per tx when processing a block
return NewBTreeCacheWrap(b, b.NewBatch(), b.free)
}
They are basically the same algorithm, just different arguments, and we can definitely pull this out of the interface into a package-level helper function. Especially once we replace kv.NewBatch()
with NewBatch(kv)
However... currently the interfaces are defined in weave and all implementation in store
.
If we replace NewBatch() and CacheWrap() with concrete functions, they would be in store
, and then require consuming packages to import from store
directly, not just the top-level package.
I think this is no problem in practice, but the more we have packages importing sibling packages, the more risk we have of circular dependencies. (I am fine to do this as discussed, just want some reflection on the import impact first)