heed
heed copied to clipboard
Support duplicate keys
It doesn't appear that the API supports databases with multiple values for a key (the MDB_DUPSORT flag)
Hey @JayPavlina,
Yeah, I didn’t implemented that because I wasn’t in needs of that feature.
The thing is if we want to support that we will need to create new Database
and PolyDatabase
types that will return an Iterator
on get
and a potentially different Iterator
on iter
(to help the user know when a value belongs to the same key, or not).
I'm thinking of migrating from my own lmdb wrapper to this, and this feature is interesting to me as well. How likely is this to be implemented in the near future?
Hey everyone,
I would be glad if someone could work on that, I can guide and help him/her in the process.
Here is the list of things that must be done to make it a good abstraction.
-
Create one new raw type named
PolyMultiDatabase
akin to thePolyDatabase
type that already exists. The only thing that change is theget
method that will return anRoRangeIter
withkey..=key
(and make sure it returns the duplicate entries for the same key). We will remove lots of methods likefirst
andlast
, we will add those after the first design works. -
Create a new
open_poly_multi_database
method that will open a database with theMDB_DUPSORT
flag and makes sure the db has not already been opened without this flag (is it dangerous this way?). -
BONUS: Once that's good we must create another Iterator type that could ignore duplicate keys, to make that correctly we will probably need to add a method to all of our current Iterators
.skip_dup(skip: bool)
, probably under a new trait, this way we would be able to only create a thinNoDupIter
wrapper that will call that method and continue callingnext
on the wrapped one. It can also call.skip_dup(false)
when the user wants to retrieve the inner Iterator and see duplicates again.
// This Iterator will yield duplicate key values.
let multi_iter: RoIter<_, _> = multi_db.iter()?;
// The previous Iterator is wrapped in another that calls `MDB_NEXT_NODUP` on it
// the wrapper constructor will makes sure to call `skip_dup(true)` beforehand.
let no_dup_iter: NoDupIter<_, _> = multi_iter.skip_duplicates();
// This call unwraps the inner Iterator and calls `skip_dup(false)` before
// returning it, this way the inner Iterator can yield duplicates again.
let multi_iter_again: RoIter<_, _> = no_dup_iter.into_inner();
I would like your thoughts about this API design if possible.
This is a blocker for me as well, though unfortunately I can't commit time to implement it.
e: Actually, I can work around this by just embedding the value data into the key.
this is how I did this. Really just exposed the db flags and added an iterator supporting dup data, simple and works. https://github.com/Kerollmops/heed/pull/59
I advise people who need this feature to use the PR of @stevemk14ebr #59, as I have no time to review it and I know that it can introduce issues when using the crate in a wrong way i.e. normal iterator on a duplicate database or vice-versa.
@Kerollmops To continue our discussion in #145. Below is how I'd approach supporting dupsort DBs. Sorry for the delayed reply. I currently mostly have time during weekends.
It's opinionated, and I'm perfectly fine if you reject it simply because you have a different vision.
I see it as follows:
- Implement dupsort db as a new type, as you suggested above (
PolyMultiDatabase
). Reasoning: There are differences even in basic operations, likeput
(an additional flag MDB_NODUPDATA) anddelete
(can accept value); there are differences in moving between keys/values; and different sets of flags supported (MDB_INTEGERDUP, MDB_REVERSEDUP, MDB_DUPFIXED). Within a single type, it'd be harder to assert all the restrictions or more confusing to a user if there were functions to support both scenarios. - Make different cursors for dup/non-dup DBs. For Multi database, methods that now return key+value (e.g.,
move_on_prev
) would return key+value+nested_cursor. The nested cursor (within-key cursor) would useMDB_NEXT_DUP
/MDB_FIRST_DUP
/etc. - Simplify
PolyDatabase
by removing the majority of helpers, which are mostly proxies to the cursor. E.g.,rev_iter
. Reasoning: The behavior of DBs with and withoutMDB_DUPSORT
is different enough to make the behavior of helpers unintuitive. Instead, add a way to get a cursor. - Add something like
.to_iter()
,.to_rev_iter()
to all cursor types, which would create an iterator starting at the current cursor position with chosen direction. To my mind, it's more flexible than DB-level helpers since you can decide where to start the iterator from. For example, if we have a long list of files, and the end-user scrolls up from the middle, a developer can position a cursor at the current-file-key and use.to_rev_iter()
from it. Another minor advantage, in my opinion - is it's easier to discover.to_iter()
via IDE autocompletion than to find out that you can useRoIter::new
/RoRevIter::new
to get an iterator.
What do you think? After we agree on the overall direction - I can try to bring something more detailed/closer to implementation.
Hello,
About this feature redb (rust lmdb strong inspiration) purpose a MultiTable with it's own types and cursors MultimapRange, along other dedicated types.
If it helps to find a good way that avoid foot-shooting let me know. :)
Have a good day! And thanks for the binding!
EDIT: I am not a lmdb expert, if there is any other problems related to MDB_DUPSORT
it's understandable to don't expose it. ;)
Hello, I'm also interested in this feature. I'll try to implement your suggestion. Have a nice day !
Support has been implemented in #200 🎉