heed Support duplicate keys

Support duplicate keys

Open JayPavlina opened this issue 4 years ago • 6 comments

It doesn't appear that the API supports databases with multiple values for a key (the MDB_DUPSORT flag)

Mar 23 '20 01:03 JayPavlina

Hey @JayPavlina,

Yeah, I didn’t implemented that because I wasn’t in needs of that feature.

The thing is if we want to support that we will need to create new Database and PolyDatabase types that will return an Iterator on get and a potentially different Iterator on iter (to help the user know when a value belongs to the same key, or not).

Mar 23 '20 08:03 Kerollmops

I'm thinking of migrating from my own lmdb wrapper to this, and this feature is interesting to me as well. How likely is this to be implemented in the near future?

Mar 24 '20 02:03 Karrq

Hey everyone,

I would be glad if someone could work on that, I can guide and help him/her in the process.

Here is the list of things that must be done to make it a good abstraction.

Create one new raw type named PolyMultiDatabase akin to the PolyDatabase type that already exists. The only thing that change is the get method that will return an RoRangeIter with key..=key (and make sure it returns the duplicate entries for the same key). We will remove lots of methods like first and last, we will add those after the first design works.
Create a new open_poly_multi_database method that will open a database with the MDB_DUPSORT flag and makes sure the db has not already been opened without this flag (is it dangerous this way?).
BONUS: Once that's good we must create another Iterator type that could ignore duplicate keys, to make that correctly we will probably need to add a method to all of our current Iterators .skip_dup(skip: bool), probably under a new trait, this way we would be able to only create a thin NoDupIter wrapper that will call that method and continue calling next on the wrapped one. It can also call .skip_dup(false) when the user wants to retrieve the inner Iterator and see duplicates again.

// This Iterator will yield duplicate key values.
let multi_iter: RoIter<_, _> = multi_db.iter()?;

// The previous Iterator is wrapped in another that calls `MDB_NEXT_NODUP` on it
// the wrapper constructor will makes sure to call `skip_dup(true)` beforehand.
let no_dup_iter: NoDupIter<_, _> = multi_iter.skip_duplicates();

// This call unwraps the inner Iterator and calls `skip_dup(false)` before
// returning it, this way the inner Iterator can yield duplicates again.
let multi_iter_again: RoIter<_, _> = no_dup_iter.into_inner();

I would like your thoughts about this API design if possible.

Mar 24 '20 09:03 Kerollmops

This is a blocker for me as well, though unfortunately I can't commit time to implement it.

e: Actually, I can work around this by just embedding the value data into the key.

Feb 01 '21 01:02 Ralith

this is how I did this. Really just exposed the db flags and added an iterator supporting dup data, simple and works. https://github.com/Kerollmops/heed/pull/59

Jan 19 '22 19:01 stevemk14ebr

I advise people who need this feature to use the PR of @stevemk14ebr #59, as I have no time to review it and I know that it can introduce issues when using the crate in a wrong way i.e. normal iterator on a duplicate database or vice-versa.

Jan 20 '22 09:01 Kerollmops

@Kerollmops To continue our discussion in #145. Below is how I'd approach supporting dupsort DBs. Sorry for the delayed reply. I currently mostly have time during weekends.

It's opinionated, and I'm perfectly fine if you reject it simply because you have a different vision.

I see it as follows:

Implement dupsort db as a new type, as you suggested above (PolyMultiDatabase). Reasoning: There are differences even in basic operations, like put (an additional flag MDB_NODUPDATA) and delete (can accept value); there are differences in moving between keys/values; and different sets of flags supported (MDB_INTEGERDUP, MDB_REVERSEDUP, MDB_DUPFIXED). Within a single type, it'd be harder to assert all the restrictions or more confusing to a user if there were functions to support both scenarios.
Make different cursors for dup/non-dup DBs. For Multi database, methods that now return key+value (e.g., move_on_prev) would return key+value+nested_cursor. The nested cursor (within-key cursor) would use MDB_NEXT_DUP/MDB_FIRST_DUP/etc.
Simplify PolyDatabase by removing the majority of helpers, which are mostly proxies to the cursor. E.g., rev_iter. Reasoning: The behavior of DBs with and without MDB_DUPSORT is different enough to make the behavior of helpers unintuitive. Instead, add a way to get a cursor.
Add something like .to_iter(), .to_rev_iter() to all cursor types, which would create an iterator starting at the current cursor position with chosen direction. To my mind, it's more flexible than DB-level helpers since you can decide where to start the iterator from. For example, if we have a long list of files, and the end-user scrolls up from the middle, a developer can position a cursor at the current-file-key and use .to_rev_iter() from it. Another minor advantage, in my opinion - is it's easier to discover .to_iter() via IDE autocompletion than to find out that you can use RoIter::new/RoRevIter::new to get an iterator.

What do you think? After we agree on the overall direction - I can try to bring something more detailed/closer to implementation.

Jan 22 '23 11:01 shedar

Hello,

About this feature redb (rust lmdb strong inspiration) purpose a MultiTable with it's own types and cursors MultimapRange, along other dedicated types.

If it helps to find a good way that avoid foot-shooting let me know. :)

Have a good day! And thanks for the binding!

EDIT: I am not a lmdb expert, if there is any other problems related to MDB_DUPSORT it's understandable to don't expose it. ;)

Jun 22 '23 10:06 darnuria

Hello, I'm also interested in this feature. I'll try to implement your suggestion. Have a nice day !

Jul 07 '23 15:07 AureliaDolo

Support has been implemented in #200 🎉

Nov 26 '23 13:11 Kerollmops

heed heed copied to clipboard

Support duplicate keys

heed
heed copied to clipboard