perf(storage): reduce calling async method with poll_next in iterator
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
In a previous benchmark, we observed that calling too many async methods may cause performance regression even if all types are static. Besides, we observed that most calls to `next` do not need to call an async method and await: in most cases it is just advancing a logical pointer, with no need to await anything. Therefore, in this PR we split the frequently called method `next` into two parts, `poll_next` and `await_next`. `poll_next` does all the non-async work, and is therefore not an async method; the remaining async work of `next` is done in `await_next`. The `poll_next` method returns a `core::task::Poll` enum, where `Poll::Ready` means that `next` can finish without calling `await_next`, and `Poll::Pending` means that we should further call `await_next` to finish the remaining async work.

`next` will become:
```rust
fn next(&mut self) -> Self::NextFuture<'_> {
    async {
        match self.poll_next() {
            Poll::Ready(result) => result,
            Poll::Pending => self.await_next().await,
        }
    }
}
```
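For illustration, the split roughly corresponds to a trait shape like the following. This is a simplified, hypothetical sketch: the result type is a placeholder, and the real trait in this PR uses a GAT (`NextFuture`) rather than `async fn` in traits.

```rust
use std::task::Poll;

/// Placeholder result type; the real iterator uses the storage crate's own result type.
type IterResult = std::io::Result<()>;

/// Hypothetical sketch of the split described above (not the actual trait definition).
trait SplitNextIterator {
    /// Synchronous part: advance the in-memory state if possible.
    /// `Poll::Ready(r)` means `next` already finished with result `r`;
    /// `Poll::Pending` means some async work (e.g. loading the next block) remains.
    fn poll_next(&mut self) -> Poll<IterResult>;

    /// Asynchronous part: finish the remaining work flagged by `poll_next`.
    async fn await_next(&mut self) -> IterResult;

    /// `next` composed from the two parts, mirroring the snippet above.
    async fn next(&mut self) -> IterResult {
        match self.poll_next() {
            Poll::Ready(result) => result,
            Poll::Pending => self.await_next().await,
        }
    }
}
```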
We ran the benchmark `bench_compactor` and observed a great improvement.

Main: (benchmark screenshot)

Current branch: (benchmark screenshot)

The merge iterator using the normal concat sstable iterator wrapped by union shows a 35% improvement; the merge iterator using the normal concat sstable iterator shows a 21% improvement; and the merge iterator using the sstable-prefetched iterator shows a 5% improvement.
Checklist
- [x] I have written necessary rustdoc comments
- [x] I have added necessary unit tests and integration tests
- [x] All checks passed in `./risedev check` (or alias, `./risedev c`)
Refer to a related PR or issue link (optional)
Also, the overhead of the compactor is not caused by async/await itself. It's caused by the `async_trait` macro, which converts all futures into `Box<dyn Future>`. If we use GAT everywhere, there should be little performance overhead.
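To make this concrete, here is a rough, hand-written illustration (not the exact macro expansion) of the difference: `#[async_trait]` turns the returned future into a boxed trait object, while a GAT-based trait keeps it a concrete, statically known type.

```rust
use std::future::Future;
use std::pin::Pin;

trait BoxedNext {
    // Roughly what `#[async_trait] async fn next(&mut self)` expands to:
    // one heap allocation plus dynamic dispatch on every call.
    fn next<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = ()> + Send + 'a>>;
}

trait GatNext {
    // GAT version: the future type is concrete, so the compiler can inline
    // and optimize across the `.await`.
    type NextFuture<'a>: Future<Output = ()> + Send + 'a
    where
        Self: 'a;

    fn next(&mut self) -> Self::NextFuture<'_>;
}
```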
The `poll_next` naming seems confusingly close to the real `Future::poll_next`. :(

By the way, if there's no `dyn Future` on the path, I think the compiler is able to realize that, and the generated state machine should be the same as calling a sync function.
Codecov Report
Merging #4501 (6cb113f) into main (cc4224d) will increase coverage by 0.00%. The diff coverage is 86.02%.
```diff
@@            Coverage Diff            @@
##             main    #4501     +/-   ##
=========================================
  Coverage   74.56%   74.56%
=========================================
  Files         848      848
  Lines      124164   124332    +168
=========================================
+ Hits        92578    92705    +127
- Misses      31586    31627     +41
```
| Flag | Coverage Δ | |
|---|---|---|
| rust | 74.56% <86.02%> (+<0.01%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/storage/src/lib.rs | 100.00% <ø> (ø) | |
| src/storage/src/hummock/iterator/mod.rs | 56.66% <57.14%> (-2.03%) | :arrow_down: |
| ...e/src/hummock/shared_buffer/shared_buffer_batch.rs | 92.95% <63.63%> (-0.92%) | :arrow_down: |
| ...ge/src/hummock/iterator/compact_concat_iterator.rs | 67.70% <71.42%> (-3.18%) | :arrow_down: |
| src/storage/src/hummock/iterator/concat_inner.rs | 90.51% <88.23%> (-1.44%) | :arrow_down: |
| src/storage/src/hummock/iterator/merge_inner.rs | 86.22% <94.28%> (-2.05%) | :arrow_down: |
| ...e/src/hummock/sstable/backward_sstable_iterator.rs | 96.77% <95.83%> (-0.85%) | :arrow_down: |
| ...ge/src/hummock/sstable/forward_sstable_iterator.rs | 93.54% <97.56%> (+0.48%) | :arrow_up: |
| ... and 4 more | | |
> The `poll_next` naming seems confusingly close to the real `Future::poll_next`. :(
>
> By the way, if there's no `dyn Future` on the path, I think the compiler is able to realize that, and the generated state machine should be the same as calling a sync function.

Yes, I just realized that. This PR does not implement `Future`. It just splits the non-async work out of the original function. I would prefer not to name it `poll_next`, and not to use `std::task::Poll` here, to avoid confusion.
> By the way, if there's no `dyn Future` on the path, I think the compiler is able to realize that, and the generated state machine should be the same as calling a sync function.

I used to think so. But some benchmark results are really confusing, and it seems that we can get a great improvement if we avoid using await. I suspect that async methods can hinder some compiler optimizations.
> I would prefer not to name it `poll_next`, and not to use `std::task::Poll` here, to avoid confusion.

Do you have any suggestion on the naming? I have been thinking about it since the first day I wrote this PR 🤣
Initially I guessed that the compiler somehow cannot do LTO without this PR, but I was wrong. I removed these two lines from `Cargo.toml`:

```toml
codegen-units = 1
lto = 'thin'
```
| Configuration | `bench_union_merge_iterator` time |
|---|---|
| After this PR, with LTO | [90.442 ms 90.546 ms 90.663 ms] |
| Before this PR, with LTO | [111.53 ms 111.67 ms 111.84 ms] |
| Before this PR, w/o LTO | [149.62 ms 149.88 ms 150.15 ms] |
| After this PR, w/o LTO | [127.23 ms 127.52 ms 127.81 ms] |
So even w/o LTO, we can get a performance improvement with this PR.

One thing to notice is that we currently don't enable LTO in release builds. With LTO we can get 149.88 ms -> 111.67 ms, which is a 25% improvement. So LTO can already do a lot, and we could enable LTO in release builds for better performance. For this PR, I would prefer keeping the code clear to read over the further 15% improvement.
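For reference, enabling LTO for release builds would just be a profile setting in the workspace `Cargo.toml`, along these lines (standard Cargo configuration shown for illustration, not a change made in this PR):

```toml
# Standard Cargo release-profile settings (not part of this PR):
# thin LTO and a single codegen unit trade compile time for runtime performance.
[profile.release]
lto = "thin"
codegen-units = 1
```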
> The `poll_next` naming seems confusingly close to the real `Future::poll_next`. :(
>
> By the way, if there's no `dyn Future` on the path, I think the compiler is able to realize that, and the generated state machine should be the same as calling a sync function.
>
> Yes, I just realized that. This PR does not implement `Future`. It just splits the non-async work out of the original function. I would prefer not to name it `poll_next`, and not to use `std::task::Poll` here, to avoid confusion.
+1. 🥵
How about using `poll_next` in `compact`? For example:

```rust
async fn compact(...) {
    while iter.is_valid() {
        iter.poll_next();
        if iter.is_pending() {
            iter.await_next().await;
        }
    }
}
```
> How about using `poll_next` in `compact`? For example:
>
> ```rust
> async fn compact(...) {
>     while iter.is_valid() {
>         iter.poll_next();
>         if iter.is_pending() {
>             iter.await_next().await;
>         }
>     }
> }
> ```
The `next` of the iterator in this PR is already similar to:

```rust
async fn next() {
    iter.poll_next();
    if iter.is_pending() {
        iter.await_next().await;
    }
}
```

So the mentioned

```rust
async fn compact(...) {
    while iter.is_valid() {
        iter.poll_next();
        if iter.is_pending() {
            iter.await_next().await;
        }
    }
}
```

will be almost the same as

```rust
async fn compact(...) {
    while iter.is_valid() {
        iter.next().await;
    }
}
```

except for one more layer of `await` called in `compact`.
But the previous approach would not generate a `NextFuture` structure... I think this `NextFuture` structure will be much heavier than `Poll`.
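As a rough illustration of that concern (a hypothetical comparison, not a measurement from this PR): `Poll<()>` is a tiny enum, while the future returned by an async `next` is a compiler-generated state machine that must at least capture a reference to the iterator plus any intermediate state.

```rust
use std::mem::{size_of, size_of_val};
use std::task::Poll;

fn main() {
    // `Poll<()>` is just a small discriminant with a unit payload.
    println!("Poll<()>       : {} bytes", size_of::<Poll<()>>());

    // An async block like the one in `next` becomes an anonymous future type
    // whose size depends on everything it captures and holds across awaits.
    let mut state = 0u64;
    let state_ref = &mut state;
    let next_future = async move {
        *state_ref += 1; // captures the &mut reference, like `&mut self` in `next`
    };
    println!("example future : {} bytes", size_of_val(&next_future));
}
```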