rkv
Support lmdb write flags (HOWTO)
LMDB supports a wide range of write flags to change the default behavior when issuing writes to the store. Currently, rkv::readwrite::Writer passes the default write flag to its put function, which simply overwrites the value if the key is already in the store.
One solution would be to simply expose all of LMDB's write flags and let developers decide which to use. The upside is that this offers a great deal of flexibility; the downside is that developers would need to know all the store types and their corresponding write flags in LMDB. Misusing them may cause undesired behavior, or worse, corrupt the store.
The other way to handle the write flags is to abstract them away by providing a few store types instead, with each store having its own semantics for put/get/delete/cursor. For example:
- Store, just a dumb k/v store as you'd expect in JS, Rust, or Python
- DupStore, which supports duplicate keys; aside from the APIs in Store, it might also have mput and mget to insert or get multiple values for the same key
The advantage is that developers would not need to know the underlying details of LMDB and could treat these as plain persistent k/v stores. Obviously, they would lose fine-grained control over the store, and there may be some performance loss.
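To make the two proposed interfaces concrete, here is a minimal sketch that models their semantics with in-memory maps. The names Store, DupStore, mput, and mget come from the list above, but the implementation is purely illustrative and is not rkv's actual API:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A plain k/v store: `put` overwrites any existing value,
/// matching LMDB's default write flag.
struct Store {
    map: BTreeMap<String, String>,
}

impl Store {
    fn new() -> Self {
        Store { map: BTreeMap::new() }
    }
    fn put(&mut self, key: &str, value: &str) {
        self.map.insert(key.to_string(), value.to_string());
    }
    fn get(&self, key: &str) -> Option<String> {
        self.map.get(key).cloned()
    }
    fn delete(&mut self, key: &str) {
        self.map.remove(key);
    }
}

/// A dup store: one key may map to multiple sorted, distinct values,
/// modeling LMDB's MDB_DUPSORT behavior.
struct DupStore {
    map: BTreeMap<String, BTreeSet<String>>,
}

impl DupStore {
    fn new() -> Self {
        DupStore { map: BTreeMap::new() }
    }
    /// Insert several values under one key (the proposed `mput`).
    fn mput(&mut self, key: &str, values: &[&str]) {
        let entry = self.map.entry(key.to_string()).or_insert_with(BTreeSet::new);
        for v in values {
            entry.insert(v.to_string());
        }
    }
    /// Fetch all values stored under a key (the proposed `mget`).
    fn mget(&self, key: &str) -> Vec<String> {
        self.map
            .get(key)
            .map(|s| s.iter().cloned().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut store = Store::new();
    store.put("foo", "1");
    store.put("foo", "2"); // default behavior: overwrite
    assert_eq!(store.get("foo"), Some("2".to_string()));
    store.delete("foo");
    assert_eq!(store.get("foo"), None);

    let mut dup = DupStore::new();
    dup.mput("foo", &["b", "a", "a"]); // duplicates collapse, values sort
    assert_eq!(dup.mget("foo"), vec!["a", "b"]);
    println!("ok");
}
```

The point of the split is that each type's write semantics are fixed up front, so consumers never touch a write flag directly.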
Some design decisions need to be made before taking action to implement this. Given that one of rkv's design goals is to smooth out LMDB's rough edges, I am more inclined toward the second plan.
@mykmelez thoughts?
Channeling @rnewman (project originator), I agree with you and would adopt the second approach of creating high-level interfaces to the common types of stores while abstracting away the flexibility (and complexity) of LMDB.
After all, a consumer that needs that flexibility can always use LMDB directly. As you note, one of rkv's design goals is to smooth out LMDB's rough edges, and it's reasonable to trade some functionality for that.
NB: the lmdb crate tries to have its cake and eat it too by exposing its raw pointers to underlying LMDB handles for use cases not satisfied by the crate's own interface. We haven't done that in rkv, and I'm not in a hurry to change that. But it's an option we could consider in the future, if there was functionality that an rkv consumer found essential but which we couldn't yet figure out how to expose in a way that is consistent with rkv's design goals.
Agreed. We used to take a similar approach, exposing both high-level interfaces for the common cases and low-level APIs for the best performance.
Specifically, the high-level interfaces were analogous to rkv's Reader/Writer; we also offered DupReader/DupWriter to abstract away the dupsort-related options. Another minor difference is that we chose to wrap the transaction into the put/read/delete APIs, i.e. each call to those APIs implies its own transaction. Obviously, performance (transaction-creation overhead) was traded for ease of use, but this also satisfied LMDB's keep-transactions-short requirement.
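The "each call implies a transaction" style described above can be sketched roughly as follows, using a mutex-guarded map to stand in for LMDB's write transactions; ImplicitTxnStore is a hypothetical name, and this is a model of the pattern, not any real API:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Models implicit per-call transactions: every put/get begins its own
/// short-lived "transaction" (here, a lock acquisition) and commits it
/// before returning, so transactions are always short by construction.
struct ImplicitTxnStore {
    inner: Mutex<HashMap<String, String>>,
}

impl ImplicitTxnStore {
    fn new() -> Self {
        ImplicitTxnStore { inner: Mutex::new(HashMap::new()) }
    }

    /// One write transaction per call: begin (lock), mutate, commit (drop).
    fn put(&self, key: &str, value: &str) {
        let mut txn = self.inner.lock().unwrap(); // begin
        txn.insert(key.to_string(), value.to_string());
        // "commit" happens implicitly when `txn` is dropped here
    }

    /// One read transaction per call.
    fn get(&self, key: &str) -> Option<String> {
        let txn = self.inner.lock().unwrap(); // begin
        txn.get(key).cloned()
        // read txn ends when `txn` drops
    }
}

fn main() {
    let store = ImplicitTxnStore::new();
    store.put("foo", "bar"); // caller never sees a transaction object
    assert_eq!(store.get("foo"), Some("bar".to_string()));
    println!("ok");
}
```

The trade-off is visible in the sketch: writing N values pays the begin/commit cost N times, which is exactly the overhead explicit transactions let you amortize.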
We also provided a set of low level APIs to achieve the best performance of lmdb, such as:
(env, db, txn) = get_mdb_handle(path, db_name, other_flags)
Consumers can use this triplet to interact with the store for operations like bulk loading (MDB_APPEND/MDB_APPENDDUP) or other custom manipulations.
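MDB_APPEND illustrates why exposing raw write flags can be hazardous: it skips the usual page search by requiring keys to arrive in sorted order, and LMDB reports an error on out-of-order input. The sketch below models that contract with an in-memory map; AppendWriter is a hypothetical name, not part of rkv or the lmdb crate:

```rust
use std::collections::BTreeMap;

/// Models LMDB's MDB_APPEND contract: keys must arrive in strictly
/// ascending order, which lets the real store append to the last page
/// instead of searching. Illustrative in-memory model only.
struct AppendWriter {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
    last_key: Option<Vec<u8>>,
}

impl AppendWriter {
    fn new() -> Self {
        AppendWriter { map: BTreeMap::new(), last_key: None }
    }

    /// Append a key/value pair; errors if the key is not greater than
    /// the previous one, mirroring how LMDB rejects misuse of the flag.
    fn append(&mut self, key: &[u8], value: &[u8]) -> Result<(), String> {
        if let Some(last) = &self.last_key {
            if key <= last.as_slice() {
                return Err(format!("key out of order: {:?}", key));
            }
        }
        self.last_key = Some(key.to_vec());
        self.map.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    fn len(&self) -> usize {
        self.map.len()
    }
}

fn main() {
    let mut w = AppendWriter::new();
    assert!(w.append(b"a", b"1").is_ok());
    assert!(w.append(b"b", b"2").is_ok());
    assert!(w.append(b"a", b"3").is_err()); // unsorted input is rejected
    assert_eq!(w.len(), 2);
    println!("ok");
}
```

A high-level bulk-load API could enforce this sorted-input invariant itself, which is the kind of rough edge the second plan above would smooth over.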
From our previous experience, the high-level APIs tended to be more approachable to developers, and they satisfied the performance requirements in most cases. The low-level APIs still served as a nice complement for mission-critical scenarios.
Regarding transactions, @rnewman and I chatted about them a while back, as I was thinking along similar lines of abstracting them away, and he convinced me to continue to explicitly expose them in rkv's otherwise-high-level API.
One reason was performance, which transactions significantly improve when reading/writing multiple values. Another was that they help consumers understand and reason about how multiple concurrent readers/writers interact and how LMDB maintains isolation between them (as demonstrated by the tests in env.rs).
And read transactions in particular add very little cognitive overhead, since you don't need to commit()/abort() the transaction; it is dropped automatically when the reader goes out of scope. When reading single values, you can also open a read transaction and read the value in a reasonably compact single line of code:
store.read(&env).unwrap().get("foo");
That being said, some consumers may desire (or require) a higher level of abstraction. For the XULStore PoC (attached to bug 1460811), which replaces a JSONFile-based implementation with an rkv-based one, I did exactly that, since the nsIXULStore API isn't transactional.
So I'm inclined to continue to expose transactions, let consumers abstract further as needed, and potentially integrate an additional layer of abstraction in the future only if it seems like consumers are repeatedly reinventing the wheel.
Sounds good to me. We can leave that as a potential optimization if transaction management becomes a burden to the consumers in practice.
That being said, let's focus on how to expose those write flags in a reasonable way for now. For the XULStore PoC, do you think those advanced store types (e.g. dup_sort) will be useful? If not, I believe that we can deprioritize this feature until we identify a use case for it.
The XULStore PoC itself doesn't need dup sort, as all its keys are unique, but I expect there to be other use cases for it (and integer key). I haven't done an exhaustive review, but other components in desktop Firefox that might migrate to LMDB include SessionStore, XPIProvider, ExtensionStorage, nsSearchService, and Blocklist.
And then there are consumers outside of desktop Firefox, including mobile consumers. Overall, it's reasonable to imagine consumers for those advanced store types, although I don't have any specific examples at the moment.