reth icon indicating copy to clipboard operation
reth copied to clipboard

Use `cursor<T>.insert` instead of `put<T>` wherever possible

Open joshieDo opened this issue 2 years ago • 6 comments

Describe the feature

Looking at https://github.com/paradigmxyz/reth/pull/1130#issuecomment-1418642755 we can see that there's a speed advantage on using cursor<T>.insert to put<T> when dealing with many values.

We probably should keep that in mind and change all loops which use put to insert/upsert, where possible.

Additional context

No response

joshieDo avatar Feb 08 '23 05:02 joshieDo

Interesting!!

Logically, I thought put<T> might be faster than cursor<T>.insert because cursor<T>.insert has to conduct same thing with put<t> and then addtionally it should update the current position to the new item. The additional cost might be negligible. But because it's executed bunch of times, I think we need to optimize it most.

Practcally, with benches, you proved that put<T> is slower than cursor<T>.insert.

But I'm still thinking that put<T> might be faster than cursor<T>.insert in terms of mdbx. I suspect it's caused that our implementation of put<T> opens a db whenever it's executed. I think if put<T> could re-use a db instance which is already opened then put<T> could be slightly faster than cursor<T>.insert.

I'll test my idea and share the result. Please note that I might be totally wrong because I'm just a newbie for reth. It's just newbie's curious and thought. :)

https://github.com/paradigmxyz/reth/blob/92ff3f961dd381ccb3b3dda144e84036fd7de5d4/crates/storage/db/src/implementation/mdbx/tx.rs#L91-L100

jinsankim avatar Feb 09 '23 10:02 jinsankim

All good thoughts. I think we should use our benchmarks to guide any change we make in the low level db accessors.

gakonst avatar Feb 09 '23 21:02 gakonst

I thought put<T> might be faster than cursor<T>.insert because cursor<T>.insert has to conduct same thing with put and then addtionally it should update the current position to the new item.

HaHa, I was totally wrong! :)

After simple reviewing mdbx itself, I've realized it's totally vice versa. put<T> is working as initializing cursor first and then doing same thing with cursor<T>.insert. So, also logically, cursor<T>.insert might be faster than put<T> without bench.

But, I'm still thinking it's not good that put<T> opens db for every execution. I'll investigate and learn more for it.

jinsankim avatar Feb 13 '23 02:02 jinsankim

extract from mdbx doc:

A single transaction can open multiple databases. Generally databases should only be opened once, by the first transaction in the process.

ref: https://erthink.github.io/libmdbx/usage.html, Getting started section.

jinsankim avatar Feb 13 '23 06:02 jinsankim

@joshieDo is this still relevant?

mattsse avatar Jun 15 '23 15:06 mattsse

Yes it is, there are still some places where it can be used (eg. hashing storage stage)

joshieDo avatar Jun 15 '23 18:06 joshieDo

This issue is stale because it has been open for 21 days with no activity.

github-actions[bot] avatar Sep 14 '23 01:09 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Sep 21 '23 01:09 github-actions[bot]