I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

In previous benchmark, we have observed that calling too many async method may cause performance regression even if all types are static. Besides, we observe that in most call on next, we don't need to call async method and await. In most cases it's just advancing a logical pointer without any necessity to await on anything. Therefore, in this PR, we split the frequently called method next into two parts, poll_next and await_next. The poll_next will do all the non-async stuffs, and therefore it's not an async method, and the remaining async stuffs in next will be done in await_next. The poll_next method will return a core::task::poll enum, where Poll::Ready means that the next can be finished without calling await_next, and Poll::Pending means that we should further call await_next to finish the remaining async stuff.

The next will become

fn next(&mut self) -> Self::NextFuture<'_> {
        async {
            match self.poll_next() {
                Poll::Ready(result) => result,
                Poll::Pending => self.await_next().await,
            }
        }
    }

We run the benchmark bench_compactor and observe great improvement. Main:

Current branch:

The merge iterator using the normal concat sstable iterator wrapped by union has 35% improvement. The merge iterator using the normal concat sstable iterator has 21% improvement. The merge iterator using the the sstable-prefetched iterator has 5% improvement.

Checklist

[x] I have written necessary rustdoc comments
[x] I have added necessary unit tests and integration tests
[x] All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

Aug 08 '22 12:08 wenym1

Also the overhead of compactor is not caused by async / await itself. It's caused by the async_trait macro, which converts all Future into Box dyn Future. If we are using GAT everywhere, there should be little performance overhead.

Aug 08 '22 12:08 skyzh

The poll_next naming seems confusing to the real Future::poll_next. :(

By the way, if there's no dyn Future on the path, I think the compiler is able to realize that and the generated state machine should be the same as calling a sync function.

Aug 08 '22 12:08 BugenZhao

Codecov Report

Merging #4501 (6cb113f) into main (cc4224d) will increase coverage by 0.00%. The diff coverage is 86.02%.

@@           Coverage Diff            @@
##             main    #4501    +/-   ##
========================================
  Coverage   74.56%   74.56%            
========================================
  Files         848      848            
  Lines      124164   124332   +168     
========================================
+ Hits        92578    92705   +127     
- Misses      31586    31627    +41

Flag	Coverage Δ
rust	`74.56% <86.02%> (+<0.01%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/storage/src/lib.rs	`100.00% <ø> (ø)`
src/storage/src/hummock/iterator/mod.rs	`56.66% <57.14%> (-2.03%)`	:arrow_down:
...e/src/hummock/shared_buffer/shared_buffer_batch.rs	`92.95% <63.63%> (-0.92%)`	:arrow_down:
...ge/src/hummock/iterator/compact_concat_iterator.rs	`67.70% <71.42%> (-3.18%)`	:arrow_down:
src/storage/src/hummock/iterator/concat_inner.rs	`90.51% <88.23%> (-1.44%)`	:arrow_down:
src/storage/src/hummock/iterator/merge_inner.rs	`86.22% <94.28%> (-2.05%)`	:arrow_down:
...e/src/hummock/sstable/backward_sstable_iterator.rs	`96.77% <95.83%> (-0.85%)`	:arrow_down:
...ge/src/hummock/sstable/forward_sstable_iterator.rs	`93.54% <97.56%> (+0.48%)`	:arrow_up:
... and 4 more

:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Aug 08 '22 12:08 codecov[bot]

The poll_next naming seems confusing to the real Future::poll_next. :(

By the way, if there's no dyn Future on the path, I think the compiler is able to realize that and the generated state machine should be the same as calling a sync function.

Yes, I just realized that. This PR is not implementing Future. It just splits the non-async thing out of the original function. I would prefer not naming it as poll_next, and don't use std::task::Poll here to avoid confusion.

Aug 08 '22 12:08 skyzh

By the way, if there's no dyn Future on the path, I think the compiler is able to realize that and the generated state machine should be the same as calling a sync function.

I used to think so. But some benchmark result is really confusing, and it seems that we can get great improvement if we avoid using await. I suspect that async method can cause some regression for compiler code optimization.

Aug 08 '22 12:08 wenym1

I would prefer not naming it as poll_next, and don't use std::task::Poll here to avoid confusion.

Do you have any suggestion on the naming? I have been thinking about it the first day I wrote this PR 🤣

Aug 08 '22 12:08 wenym1

Initially I guess the compiler somehow cannot do LTO without this PR, but I'm wrong. I removed these two lines from Cargo.toml.

codegen-units = 1
lto = 'thin'

After this PR, with LTO

bench_union_merge_iterator                                                                            
                        time:   [90.442 ms 90.546 ms 90.663 ms]

Before this PR, with LTO

bench_union_merge_iterator                                                                            
                        time:   [111.53 ms 111.67 ms 111.84 ms]

Before this PR, w/o LTO

bench_union_merge_iterator                                                                            
                        time:   [149.62 ms 149.88 ms 150.15 ms]

After this PR, w/o LTO

bench_union_merge_iterator                                                                            
                        time:   [127.23 ms 127.52 ms 127.81 ms]

So even w/o LTO, we can get performance improvement with this PR.

One thing to notice is that we currently don't enable LTO in release builds. With LTO we can get 149.88 ms -> 111.67 ms = 25% improvement. So LTO can already do a lot, we can enable LTO in release builds to have better performance. For this PR, I would prefer making the code clear to read, to the 15% further improvement.

Aug 08 '22 12:08 skyzh

The poll_next naming seems confusing to the real Future::poll_next. :( By the way, if there's no dyn Future on the path, I think the compiler is able to realize that and the generated state machine should be the same as calling a sync function.

Yes, I just realized that. This PR is not implementing Future. It just splits the non-async thing out of the original function. I would prefer not naming it as poll_next, and don't use std::task::Poll here to avoid confusion.

+1. 🥵

Aug 09 '22 06:08 BugenZhao

How about use poll_next in compact ? For example:

async fn compact(...) {
    while iter.is_valid() {
        iter.poll_next();
        if iter.is_pending() {
            iter.await_next().await;
        }
    }
}

Aug 10 '22 06:08 Little-Wallace

How about use poll_next in compact ? For example:

async fn compact(...) {
    while iter.is_valid() {
        iter.poll_next();
        if iter.is_pending() {
            iter.await_next().await;
        }
    }
}

The next of iter in this PR is already similar to

async fn next() {
    iter.poll_next();
    if iter.is_pending() {
        iter.await_next().await;
    }
}

So the mentioned

async fn compact(...) {
    while iter.is_valid() {
        iter.poll_next();
        if iter.is_pending() {
            iter.await_next().await;
        }
    }
}

will be almost the same as

async fn next() {
    while iter.is_valie() {
        iter.next().await
    }
}

except for one more layer of await called in compact.

Aug 10 '22 09:08 wenym1

But the previous method would not generate a structure NextFuture.... I think this structure NextFuture will be much heavy than Poll

Aug 10 '22 11:08 Little-Wallace

risingwave
risingwave copied to clipboard

perf(storage): reduce calling async method with poll_next in iterator

What's changed and what's your intention?

Checklist

Refer to a related PR or issue link (optional)

Codecov Report

risingwave risingwave copied to clipboard

perf(storage): reduce calling async method with poll_next in iterator

What's changed and what's your intention?

Checklist

Refer to a related PR or issue link (optional)

Codecov Report

risingwave
risingwave copied to clipboard