reth
reth copied to clipboard
Use `cursor<T>.insert` instead of `put<T>` wherever possible
Describe the feature
Looking at https://github.com/paradigmxyz/reth/pull/1130#issuecomment-1418642755 we can see that there's a speed advantage on using cursor<T>.insert to put<T> when dealing with many values.
We probably should keep that in mind and change all loops which use put to insert/upsert, where possible.
Additional context
No response
Interesting!!
Logically, I thought put<T> might be faster than cursor<T>.insert because cursor<T>.insert has to conduct same thing with put<t> and then addtionally it should update the current position to the new item. The additional cost might be negligible. But because it's executed bunch of times, I think we need to optimize it most.
Practcally, with benches, you proved that put<T> is slower than cursor<T>.insert.
But I'm still thinking that put<T> might be faster than cursor<T>.insert in terms of mdbx. I suspect it's caused that our implementation of put<T> opens a db whenever it's executed. I think if put<T> could re-use a db instance which is already opened then put<T> could be slightly faster than cursor<T>.insert.
I'll test my idea and share the result. Please note that I might be totally wrong because I'm just a newbie for reth. It's just newbie's curious and thought. :)
https://github.com/paradigmxyz/reth/blob/92ff3f961dd381ccb3b3dda144e84036fd7de5d4/crates/storage/db/src/implementation/mdbx/tx.rs#L91-L100
All good thoughts. I think we should use our benchmarks to guide any change we make in the low level db accessors.
I thought put<T> might be faster than cursor<T>.insert because cursor<T>.insert has to conduct same thing with put
and then addtionally it should update the current position to the new item.
HaHa, I was totally wrong! :)
After simple reviewing mdbx itself, I've realized it's totally vice versa. put<T> is working as initializing cursor first and then doing same thing with cursor<T>.insert. So, also logically, cursor<T>.insert might be faster than put<T> without bench.
But, I'm still thinking it's not good that put<T> opens db for every execution. I'll investigate and learn more for it.
extract from mdbx doc:
A single transaction can open multiple databases. Generally databases should only be opened once, by the first transaction in the process.
ref: https://erthink.github.io/libmdbx/usage.html, Getting started section.
@joshieDo is this still relevant?
Yes it is, there are still some places where it can be used (eg. hashing storage stage)
This issue is stale because it has been open for 21 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.