distill icon indicating copy to clipboard operation
distill copied to clipboard

Implement an LMDB-compatible in-memory KV store

Open kabergstrom opened this issue 5 years ago • 9 comments

In an effort to minimize the dependency footprint of the project, implementing an in-memory key-value store that implements the LMDB API will allow to make lmdb an optional dependency. It will need to support the transaction semantics of LMDB.

kabergstrom avatar Dec 08 '19 22:12 kabergstrom

We've been interested in this at Embark also, just a simple transactional one with good perf for small but also big (up to 100 MB ish) items and minimal dependencies

repi avatar Jan 31 '20 00:01 repi

Glad to hear! Sounds like we have similar goals here. I'm not working on an implementation for this right now, so if anyone wants to do this, go ahead!

kabergstrom avatar Feb 02 '20 14:02 kabergstrom

I am interested in working on this. My initial plan is to try building a transaction layer on top of HashMaps, then wrap that in an LMDB-like API. I will post here if anything comes up where I need clarification on the requirements.

allen-marshall avatar Feb 02 '20 23:02 allen-marshall

That's great @allen-marshall !

It would be great if we can ensure that certain characteristics are maintained:

  • Write transactions don't block read transactions or vice versa
  • Transactions read a consistent snapshot during the entire lifetime of the transaction
  • Multiple read transactions can be reading different snapshots concurrently
  • Only a single write transaction can be active at a time - writes acquire a mutex

It may be of note that LMDB is implemented as a "B+ tree on disk".

kabergstrom avatar Feb 03 '20 11:02 kabergstrom

Thanks for the clarification. I have started some preliminary work and was wondering if it would be acceptable to omit the ability to have multiple values for the same key in a single database (the DUP_SORT database flag in LMDB). While not infeasible to implement, that feature does add a bit of complexity and doesn't appear to be used anywhere in this project's code currently. I am leaning toward omitting the feature, but feel free to tell me otherwise if LMDB compatibility is more important.

allen-marshall avatar Feb 03 '20 21:02 allen-marshall

I'm OK with omitting DUP_SORT, I don't find it very useful personally. @repi ?

kabergstrom avatar Feb 03 '20 21:02 kabergstrom

Haven’t been using it either

repi avatar Feb 03 '20 22:02 repi

It might make sense to look at the sled. Also

1.0.0 release date will be on January 19, 2021. This is sled's 5th birthday.

lain-dono avatar Oct 22 '20 03:10 lain-dono

I apologize for my prolonged silence on this issue. I did some work on it for around a month after my last post, but after that I was derailed by an (unrelated) mental health crisis. I think I am starting to recover, but it is unclear whether I will be able to resume work on this issue any time soon. It is probably safest to assume I will not.

I believe my changes are here if anyone wants to build on them. The main task that I worked on was trying to translate the lmdb crate's API into a trait-based interface to allow for multiple implementations. It turns out that this is a task where generic associated types would have been really helpful. I think I eventually got the interface translated to a workable extent, but I had to use some very ugly workarounds involving procedural macros and even a little bit of unsafe code, solely to work around the lack of GATs in Rust.

After coming up with a trait-based interface that was as close to elegant as I could manage, I wrote an implementation that wrapped the lmdb crate as well as a very naive in-memory implementation. I started writing code to test and benchmark these implementations but didn't get very far on that task before my health issues got in the way of further progress.

Hopefully this information can give someone a good idea of the current status of my fork if they want to build on it. On the other hand, if someone wants to start from scratch, they can at least be aware that specifying the interface generically may prove harder than expected without GATs.

allen-marshall avatar Oct 23 '20 23:10 allen-marshall