Performance Regression with async-std > 1.6.2
Hello Everyone,
First of all thanks very much for the great work on async-std. We are making heavy use of this framework in zenoh and have remarked a major performance drop when upgrading from 1.6.2. Whey I say major I mean that our throughput for in some cases is divided by two.
We have identified that the performance issue is introduced on the publishing side. To highlight the large difference in CPU time taken by async-std 1.6.5 versus 1.6.2, we have made some flame graphs collecting perf data while running our throughput performance test.
The exact command used to collect perf data is included below and the code was compiled in release mode:
$ perf record --call-graph dwarf,16384 -e cpu-clock -F 997 ./target/release/examples/zn_pub_thr 8
The resulting flame graphs are available here for 1.6.2 and here for 1.6.5.
The zenoh GitHub repository is https://github.com/eclipse-zenoh/zenoh/tree/rust-master
As you will see from the flame graphs the <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll takes very little time on 1.6.2 and almost 50% of the time on 1.6.5.
I know that there have been changes in the scheduler; maybe we need to change something on our side. In any case, any insight will be extremely welcome.
Thanks very much in advance!
Keep the Good Hacking!
Quick question: what kind of CPU configuration are you running this on? On SMP or Ryzen systems, 1.6.5 suffers from cache invalidation when moving tasks between different core clusters.
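For reference, a standard way to inspect the topology in question is util-linux's lscpu (assuming a Linux system); the socket, core, and NUMA layout determines how expensive cross-cluster task migration is:

```shell
# Show sockets, cores, threads, and NUMA layout; these determine the cost
# of the executor moving tasks between core clusters.
lscpu | grep -E 'Model name|Socket|NUMA|Core|Thread|CPU'
```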
@Licenser thanks very much for the prompt response. The flame graphs were made on an Intel Skylake running the latest Ubuntu Linux.
Let me know if you need other info or want us to run some other tests.
One change that comes to mind is that we're no longer inlining TcpStream futures because of https://github.com/async-rs/async-std/pull/889; this was required to fix a critical failure caused by a dependency issuing a breaking change in a minor version update. If you're testing TCP at all, this may be relevant.
I'm not sure what the right solution is here, but perhaps switching to async-net inside our network types may help resolve this.
@yoshuawuyts that could be the issue, as we are heavily using TcpStream. BTW, it would also seem that there is some higher overhead in ConcurrentQueue::pop. In any case, the flame graphs reveal that the TcpStream sending-side performance has indeed degraded.
Hello everyone, any updates on this issue? We would be happy to help test async-std performance systematically before each release. We could do that by running zenoh on our 10 Gbps testbed.
Looking at the flame graph, it seems that with v1.6.2 everything runs inside the main thread, while with v1.6.5 half of the program is in the main thread and the other half is on executor threads. Am I reading that right?
It would be worth trying to see what happens if the benchmark is run with ASYNC_STD_THREAD_COUNT=1 and if the body of the main function is wrapped in a spawn() like so:
#[async_std::main]
async fn main() {
    async_std::task::spawn(async {
        // code goes here...
    })
    .await
}
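For context on why the main-thread vs. executor-thread split matters: #[async_std::main] essentially runs the future with block_on on the main thread, while spawn() hands it to the executor's worker threads. Below is a minimal std-only sketch of what a block_on of that kind does; it is an illustration of the mechanism, not async-std's actual implementation:

```rust
use std::future::Future;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Signal used to park the calling thread until the future's waker fires.
struct Signal {
    cond: Condvar,
    woken: Mutex<bool>,
}

impl Wake for Signal {
    fn wake(self: Arc<Self>) {
        *self.woken.lock().unwrap() = true;
        self.cond.notify_one();
    }
}

// Polls the future on the *current* thread, sleeping between polls.
// This is why everything driven by the top-level runtime entry point
// shows up under the main thread in a flame graph.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let signal = Arc::new(Signal { cond: Condvar::new(), woken: Mutex::new(false) });
    let waker = Waker::from(signal.clone());
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        let mut woken = signal.woken.lock().unwrap();
        while !*woken {
            woken = signal.cond.wait(woken).unwrap();
        }
        *woken = false;
    }
}

fn main() {
    let v = block_on(async { 21 * 2 });
    println!("{v}"); // prints "42"
}
```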
Hello @stjepang, this is what we thought at first, but by looking carefully we spotted that the other thread is doing very little work. It seems to us that what used to be a marginal overhead in 1.6.2 now shows up as a much wider portion of the flame graph.
In any case we'll try running with a single thread and will let you know what that gives. Thanks for the suggestion!
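For reference, the single-threaded run suggested above would look like this (the binary path is taken from the earlier perf command; ASYNC_STD_THREAD_COUNT is the environment variable async-std reads for its executor thread count):

```shell
# Force async-std's executor to a single worker thread for the throughput test.
ASYNC_STD_THREAD_COUNT=1 ./target/release/examples/zn_pub_thr 8
```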