limdb icon indicating copy to clipboard operation
limdb copied to clipboard

Added optional reverse parameter to all iterators

Open greenersharp opened this issue 1 year ago • 5 comments

In my workflows I routinely need to iterate the database in reverse order.

This change adds a 'reverse' parameter to all iterators.

greenersharp avatar Dec 10 '24 18:12 greenersharp

Thank you for using LimDB, and for your contribution!!!

I like the feature. The API seems fine too. I am thinking a bit the big picture- the decision of whether or not to have iterators with flags in the library defines a new direction the API to grow in, and is therefore very consequential.

I haven't seen any other iterators with flags in stdlib or other libraries. I wonder if there is a good reason for that, or if we should just go forward and be among the first to do it.

I suspect it might just be a question of readability. If that is true, it might be better to have seperate iterators for reverse. On the other and, there might be more relevant flags, and then we would have to have yet more iterators.

What do you think about this? Would it be more readable to have separate iterators for all sorts of different iteration, or would it make more sense to just have a few, but with flags, like you did it?

Thanks again!

capocasa avatar Apr 04 '25 07:04 capocasa

Hi,

Thanks for the response.

I haven't seen any other iterators with flags in stdlib or other libraries.

The LMDB library in python has the ability to move to the last key in a database and then iterate backwards.

https://lmdb.readthedocs.io/en/release/#cursor-class

last()
Move to the last key in the database, returning True on success or False if the database is empty.
iterprev(keys=True, values=True)
Return a reverse iterator that yields the current element before calling [prev()](https://lmdb.readthedocs.io/en/release/#lmdb.Cursor.prev), until the start of the database is reached.

If the cursor is not yet positioned, it is moved to the last key in the database, otherwise iteration proceeds from the current position.

This is really handy when using LMDB to store real-time data logs with lexicographically sortable keys. That way, the last key in the database is always guaranteed to be the most recent entry.

So, let’s say you want to read the most recent 100 records—you simply open the LMDB database, jump to the last key, and iterate backwards.

For the key, I either use ulid: https://github.com/adelq/ulid

or I use epochtime with some formatting to generate lexicographically sortable keys:

import times, strformat, strutils

for x in 1..2000:
    let timestamp = epochTime()
    let formatted = formatFloat(timestamp, ffDecimal, 6)  
    let parts = formatted.split(".")
    let paddedInt = parts[0].align(15, '0')
    let final_key = paddedInt & "." & parts[1]
    echo final_key

Even in a tight loop, you still get lexicographically sortable keys.

I'm not sure if others are using LMDB databases this way, but I've found it be extememly useful.

Thanks

greenersharp avatar Apr 04 '25 09:04 greenersharp

I think you misunderstood! I want the feature, I'm just considering which API would be the best.

Your suggestion

iterator keys*[A, B](t: Transaction[A, B], reverse:bool = false): A

Or something like this

iterator keys*[A, B](t: Transaction[A, B]): A
iterator rkeys*[A, B](t: Transaction[A, B]): A

Or something completely different. I just wanted to take a little time to consider this, because APIs are hard to change. Do you have a suggestion?

Thanks for all the background information, it was very interesting!

capocasa avatar Apr 06 '25 15:04 capocasa

Understood !

Personally, I lean toward the reverse: bool = false approach because it's concise and keeps the surface area of the API smaller.

The internal implementations for forward and reverse are quite similar, so avoiding duplication makes sense to me. That said, if you ever decide to diverge the internal logic more significantly, then having separate iterators would definitely be the way to go.

greenersharp avatar Apr 10 '25 04:04 greenersharp

Okay so I just haven't found the feature in other stdlib data structures, which is what I'm trying to get close to for familiarity (and because I kinda dig them). So I have a strong tendency to go with your suggestion. One thing I'm thinking about for completeness is whether there should be an enum with options (reverse, ...) instead, but I think not- I can't think of an advantage in this case. Reverse is also probably a fine first option even if other options are added later. So if I can't think of anything else soon I think we're good to go.

Note that having a look the stdlib with features in mind I discovered another interesting one limdb should have #8, so thanks for the inspiration!

capocasa avatar Apr 12 '25 05:04 capocasa

Couldn't come up with a better API- there you go!

Thank you very much for your contribution!! ❤️

capocasa avatar Jun 02 '25 16:06 capocasa

I would love to hear about how you are using LimDB as well, if you would like to share!

capocasa avatar Jun 02 '25 16:06 capocasa

Awesome 👍.

Sorry for the late reply, I don't use GitHub very often.

I've been using lmdb with python for many years, and thanks to Limdb, using it in Nim is super easy.

I use lmdb whenever i can. I have seen huge performance benefits in avoiding traditional database reads and instead fetching the data from an lmdb store. This can't always be done of course, but I try to architect my projects to work this way.

Lmdb is in-process, so fetching data doesn't need to context switch and jump across process boundaries.

One of my projects is a recipe app. There was an API function that made a few SQL db calls and returned data about the recipe.

Even with proper indexing, I wasn't happy with the performance.

So I pre-computed and cached all the data and stored it on disk in Lmdb, and cut the SQL database out of the equation.

The performance boost was enormous, even with 100 concurrent reads.

I love lmdb :)

greenersharp avatar Jun 22 '25 04:06 greenersharp