reth
Use `cursor<T>.insert` instead of `put<T>` wherever possible
Describe the feature
Looking at https://github.com/paradigmxyz/reth/pull/1130#issuecomment-1418642755, we can see that `cursor<T>.insert` has a speed advantage over `put<T>` when dealing with many values.
We should probably keep that in mind and change all loops that use `put` to `insert`/`upsert`, where possible.
Interesting!!
Logically, I thought `put<T>` might be faster than `cursor<T>.insert`, because `cursor<T>.insert` has to do the same work as `put<T>` and then additionally update the current position to the new item. The additional cost might be negligible, but because it's executed many times, I think this is the path we most need to optimize.
In practice, your benches proved that `put<T>` is slower than `cursor<T>.insert`.
But I still think `put<T>` might be faster than `cursor<T>.insert` in terms of `mdbx`. I suspect the slowdown comes from our implementation of `put<T>` opening a `db` every time it's executed. I think if `put<T>` could reuse a `db` instance that is already open, then `put<T>` could be slightly faster than `cursor<T>.insert`.
I'll test my idea and share the result. Please note that I might be totally wrong, because I'm new to reth. It's just a newbie's curiosity. :)
https://github.com/paradigmxyz/reth/blob/92ff3f961dd381ccb3b3dda144e84036fd7de5d4/crates/storage/db/src/implementation/mdbx/tx.rs#L91-L100
All good thoughts. I think we should use our benchmarks to guide any change we make in the low level db accessors.
> I thought `put<T>` might be faster than `cursor<T>.insert` because `cursor<T>.insert` has to do the same work as `put<T>` and then additionally update the current position to the new item.
HaHa, I was totally wrong! :)
After briefly reviewing `mdbx` itself, I've realized it's exactly the other way around. `put<T>` works by initializing a `cursor` first and then doing the same thing as `cursor<T>.insert`. So, logically as well, `cursor<T>.insert` should be faster than `put<T>`, even without a bench.
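To make the reasoning above concrete, here is a minimal toy model in Rust. It is only a sketch under stated assumptions: `ToyTx`, `ToyCursor`, `write_with_put`, and `write_with_cursor` are hypothetical names, a `BTreeMap` stands in for the table, and none of this is reth's or libmdbx's actual API. It only counts how often a cursor is set up in each loop style.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a write transaction (hypothetical, not reth's API).
#[derive(Default)]
pub struct ToyTx {
    table: BTreeMap<u64, u64>,
    /// Number of cursors set up so far; this is the per-call overhead being modeled.
    pub cursor_opens: usize,
}

/// Toy cursor that borrows the table for a batch of writes.
pub struct ToyCursor<'a> {
    table: &'a mut BTreeMap<u64, u64>,
}

impl ToyTx {
    /// Models `put`: a fresh cursor is initialized for every single write.
    pub fn put(&mut self, key: u64, value: u64) {
        let mut cursor = self.cursor();
        cursor.upsert(key, value);
    }

    /// Models opening one cursor that the caller can reuse.
    pub fn cursor(&mut self) -> ToyCursor<'_> {
        self.cursor_opens += 1;
        ToyCursor { table: &mut self.table }
    }

    /// Number of entries currently stored.
    pub fn len(&self) -> usize {
        self.table.len()
    }
}

impl ToyCursor<'_> {
    /// Inserts the value, overwriting any existing entry for the key.
    pub fn upsert(&mut self, key: u64, value: u64) {
        self.table.insert(key, value);
    }
}

/// Loop style 1: one `put` (and therefore one cursor setup) per item.
pub fn write_with_put(tx: &mut ToyTx, items: &[(u64, u64)]) {
    for &(key, value) in items {
        tx.put(key, value);
    }
}

/// Loop style 2: set up the cursor once, then reuse it for every item.
pub fn write_with_cursor(tx: &mut ToyTx, items: &[(u64, u64)]) {
    let mut cursor = tx.cursor();
    for &(key, value) in items {
        cursor.upsert(key, value);
    }
}
```

In this model both loops store the same data, but the `put` loop pays the cursor setup once per item while the cursor loop pays it once in total. Whether that setup dominates in practice is something only the benchmarks can confirm.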
But I still think it's not good that `put<T>` opens a `db` on every execution. I'll investigate and learn more about it.
Extract from the `mdbx` doc:
> A single transaction can open multiple databases. Generally databases should only be opened once, by the first transaction in the process.

Ref: https://erthink.github.io/libmdbx/usage.html, "Getting started" section.
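As a rough illustration of the "open once, then reuse" advice quoted above, the following Rust sketch caches a database handle by table name. Everything here is hypothetical: `ToyEnv`, `ToyDbi`, and `HandleCache` are made-up names for illustration, and the code does not call libmdbx or reflect how reth manages its handles.

```rust
use std::collections::HashMap;

/// Toy stand-in for a database environment (hypothetical, not libmdbx's API).
#[derive(Default)]
pub struct ToyEnv {
    /// How many times a database was opened by name.
    pub opens: usize,
}

/// Toy database handle; cheap to copy once obtained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyDbi(pub u32);

impl ToyEnv {
    /// Models the "open database by name" call that we want to avoid repeating.
    pub fn open_db(&mut self, _name: &str) -> ToyDbi {
        self.opens += 1;
        ToyDbi(self.opens as u32)
    }
}

/// Remembers handles so that each table is opened at most once.
#[derive(Default)]
pub struct HandleCache {
    handles: HashMap<String, ToyDbi>,
}

impl HandleCache {
    /// Returns the cached handle, opening the database only on first use.
    pub fn get_or_open(&mut self, env: &mut ToyEnv, name: &str) -> ToyDbi {
        if let Some(dbi) = self.handles.get(name) {
            return *dbi;
        }
        let dbi = env.open_db(name);
        self.handles.insert(name.to_owned(), dbi);
        dbi
    }
}
```

With a cache like this, repeated writes to the same table reuse one handle instead of reopening it on every call. Whether that is worth doing in reth should still be decided by benchmarks.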
@joshieDo is this still relevant?
Yes, it is. There are still some places where it can be used (e.g. the hashing storage stage).
This issue is stale because it has been open for 21 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.