gkvlite icon indicating copy to clipboard operation
gkvlite copied to clipboard

Use Varints for persistent serialization format to unlimit the key/value lengths.

Open riannucci opened this issue 9 years ago • 4 comments

What would you think about using Varints (http://golang.org/pkg/encoding/binary/) to encode all numerical values in gkvlite's datafile?

Advantages:

  • would allow support for keys and values up to 2**63-2 without bloating databases which don't use this functionality (2**63-2 because the header needs to encode len(key)+len(value)+len(priority)).
    • [though if the priority were encoded as a fixed size, it's size could be implied].
  • would save a bunch of bytes / item for databases with many small records

Disadvantages:

  • would change the binary format of the database, so old DB files wouldn't be loadable.
  • would change the StoreCallbacks.ItemAlloc interface
  • could add up-to ~8 bytes per record for databases with large key or value records (e.g. records over 268,435,455 bytes would take more that 4 bytes to encode the length header), or with large Priority values (over the same quantity). From examining the code it looks like there are 4 of these integer quantities per item, and MaxVarintLen64 is 10 bytes (2 more than the fixed width encoding).
  • variable-size integers are a bit trickier to deal with in the data file format (though looking at the code I don't think it would be too hard).

I also realize that this project hasn't really changed much in 2 years, so maybe this question will be unseen :). However, would you be interested in a pull request that would do this? I think the biggest hurdle would be the datafile incompatibility. If that's a hard-stop for accepting the pull request, then I probably wouldn't work on it for quite a while.

For now, I've forked and simply bumped the key size up to 32 bits (luci/gkvlite@cf7fa95b097418138e80474e217d865e2eb853ac) (I needed up-to 2MB keys :/).

riannucci avatar May 11 '15 20:05 riannucci

Hi - an interesting idea.

Not generally opposed to this, but a few more questions/ideas to riff off yours....

Especially, there's also a persisted version number, so that one possibility is to just check that and keep the old read/write() methods around for backwards compat. Might not be worth the backwards compat effort, though.

One thing I'm unclear about is if you'd just want to bump the key/val length types up to uint32/uint64, or uint64/uint64, or actually use Varint/Varint for even the in-memory representations. I'd prefer the first, if your app needs could be met that way.

btw, if you don't mind me asking, wondering what kind of app has keys that might be up 2**63-ish long? That's a long key!

Finally, I don't think there's that many users of gkvlite, but probably throwing a big "major version number" increment, a'la semantic versioning, would also be needed if backwards compat wasn't supported.

steveyen avatar May 11 '15 20:05 steveyen

Ah, yeah after I posted this question, I noticed the VERSION const. I think bumping it would be definitely required, not sure about keeping both sets of read/write logic around. If I did that I'd basically have 2 sets of everything, which sounds like pretty high maintenance overhead (and since gkvlite seems to be pretty quiet feature-wise, it would probably be best for downstream users to just pin the version of the library they're using anyway :)).

I was only planning on changing the serialized format to use varints; in-memory objects would keep normal {,u}int{32,64} types. Keeping varints around in memory is a hassle (especially without the possibility of operator overloading).

Oh, yeah, I don't actually want 2**63 long keys (which would be completely insane) :D. I specifically need 2MB keys (which is also pretty huge), to conform to this (https://cloud.google.com/appengine/docs/go/datastore/#Go_Quotas_and_limits, see the bottom limit re: composite index entry size). I'm writing a go-appengine testing framework, and I'd like to use gkvlite to emulate the datastore's semantics (particularly; gkvlite is native-go and supports multiple goroutine-safe Snapshots and allows database mutations without invalidating the snapshots, unlike, say, boltdb which uses a reader/writer lock over the entire database, or leveldb which has the right semantics but isn't native-go).

I was also just thinking that varints would be a handy way to improve storage size for gkvlite (on-average) at the same time as effectively unlimiting the size of the key/vals that it contains :)

riannucci avatar May 11 '15 22:05 riannucci

Cool, makes sense.

In memory of (u)int{32,64} for key and val lengths makes sense to me instead of in-memory varints.

Varints for persistence is also ok by me, even if incompatible.

steveyen avatar May 11 '15 23:05 steveyen

(Just so you know, I do plan on actually doing this... I just got pulled into a couple other directions in the meantime :disappointed:)

riannucci avatar May 21 '15 03:05 riannucci