wg-async Avoiding async entirely

Status: Open · John-Nagle opened this issue 3 years ago · 16 comments

"Whatever they're using it for, we want all developers to love using Async Rust." (from the manifesto of this project)

That's a problem. This project is by async enthusiasts, who seem to think that all developers should want to use async. It's a short step from there to require all developers to use async.

Async is really needed only for a specific class of programs - those that are both I/O bound and need to maintain a large number of network connections. Outside of that niche, you don't really need it. We already have threads, after all. Not everyone is writing a web service.

In my case, I'm writing a viewer for a virtual world. It's talking to a GPU, talking to multiple servers, decompressing files, talking to a window, and is compute bound enough to keep 2 to 4 CPUs busy. It will have at most a dozen threads. For this class of problem, threads are essential and async has negative value.

Already, I've dropped the "hyper"/"reqwest" crate and switched to "ureq" because "reqwest" pulls in "tokio", and that, apparently, can no longer be turned off. I'm concerned about async contamination spreading to other crates.

I'm concerned that this project may break Rust as a systems language by over-optimizing it for the software-as-a-service case.

Thanks.

Mar 19 '21 02:03 John-Nagle

@John-Nagle I'm confused by your concerns.

> That's a problem. This project is by async enthusiasts, who seem to think that all developers should want to use async. It's a short step from there to require all developers to use async.

For requiring all developers to use async, Rust 2.0 would have to be released, and that's not part of the Rust team's plans.

> Async is really needed only for a specific class of programs - those that are both I/O bound and need to maintain a large number of network connections. Outside of that niche, you don't really need it. We already have threads, after all. Not everyone is writing a web service.

I think part of the problem with async right now is that it's targeted towards web services and nothing else. Async is great for all kinds of I/O even if it's a small amount, as shown by JavaScript.

> In my case, I'm writing a viewer for a virtual world. It's talking to a GPU, talking to multiple servers, decompressing files, talking to a window, and is compute bound enough to keep 2 to 4 CPUs busy. It will have at most a dozen threads. For this class of problem, threads are essential and async has negative value.

You can always use both threads and async. The only "negative value" I can see from async is the bloat of a large runtime; otherwise it shouldn't have any negative effects. Also, the way your computer talks to the GPU is asynchronous in nature, as is any hardware communication. I feel strongly that code should be written the way the hardware works (that's why I prefer systems languages), and not through some abstraction, like a blocking API. I honestly think an asynchronous model is the perfect model for your program.

> Already, I've dropped the "hyper"/"reqwest" crate and switched to "ureq" because "reqwest" pulls in "tokio", and that, apparently, can no longer be turned off. I'm concerned about async contamination spreading to other crates.

"hyper"/"reqwest" should not depend on tokio, I agree with that. Libraries should never depend on runtimes, and should always leave the runtime choice to the user. I think the "async contamination" is only a problem because of libraries depending on huge runtimes designed for web services. Otherwise, you can just wrap an asynchronous API with a simple general-purpose executor (like the one in "pasts" or this), and essentially avoid async. Even though async is being used under the hood, it doesn't matter because it's abstracted away in a zero-cost manner.
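To make the "wrap it in a simple executor" idea concrete, here is a minimal std-only sketch of such an executor. It is not the executor from "pasts" or from any other crate; `block_on`, `add_async` and `add_blocking` are hypothetical names. It parks the calling thread until the future's waker unparks it, so there is no thread pool and no I/O reactor. One assumption worth stating: this only works for futures that don't need a particular runtime's reactor (tokio's I/O types, for example, expect to run inside a tokio runtime).

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread currently blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical minimal executor: runs one future to completion on the
/// current thread. No reactor, no thread pool; the thread just parks.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Spurious wake-ups are harmless: we simply poll again.
            Poll::Pending => thread::park(),
        }
    }
}

// Stand-in for an async API exported by some library (hypothetical).
async fn add_async(a: u32, b: u32) -> u32 {
    a + b
}

// The same API exposed as an ordinary blocking function.
pub fn add_blocking(a: u32, b: u32) -> u32 {
    block_on(add_async(a, b))
}
```

Callers of `add_blocking` never see a future; whether that is "zero-cost" in practice depends on what the wrapped future does internally.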

> I'm concerned that this project may break Rust as a systems language by over-optimizing it for the software-as-a-service case.

I don't think Rust is ever going to stop being a systems language; there are a lot of use cases for async in embedded development, so I don't think any async advancements would push Rust in that direction.

Mar 19 '21 05:03 AldaronLau

random passerby comment: I am personally pretty far from being an "async" fan, I don't use it and recently started a project with mio as the backbone. however, despite my deep seated skepticism, over the past few years I've come to trust the rust leadership more on this subject, after watching them repeatedly ditch designs that would have encroached on the "what you don't use, you don't pay for" principle. it's true that popular libraries are increasingly using async and that becomes something to deal with for us "outsiders." but I don't think it follows that non-async rust pays any big costs from it. you still have full control over your own program -- you don't have to hand it over to someone else's runtime.

Mar 19 '21 06:03 jonathanstrong

This was heavily discussed on Hacker News yesterday as well.

Mar 19 '21 12:03 ibraheemdev

Firstly, I think this is a great issue. We should keep in mind that for some workloads, the asynchronous programming paradigm might be a bit of overkill. For instance, if I'm creating a CLI that needs to make one network call, spinning up an entire async runtime, rather than just opening a blocking socket provided by the operating system, is a bit of overkill.
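For comparison, a minimal sketch of the "just open a blocking socket" path using only `std::net`. The one-request, one-reply protocol and the `one_shot_request` name are assumptions for illustration; a real CLI would speak HTTP or similar on top of this.

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpStream};
use std::time::Duration;

/// Hypothetical helper: sends `request`, then reads the reply until the
/// server closes the connection. Plain blocking std sockets only:
/// no async runtime, no reactor, no extra threads.
pub fn one_shot_request(addr: &str, request: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(addr)?;
    // Don't hang forever if the server never answers.
    stream.set_read_timeout(Some(Duration::from_secs(10)))?;
    stream.write_all(request)?;
    // Tell the server we are done writing so it can reply and close.
    stream.shutdown(Shutdown::Write)?;
    let mut reply = Vec::new();
    stream.read_to_end(&mut reply)?;
    Ok(reply)
}
```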

That being said, I think there's a bit of conflation between two things:

- the futures & async/await programming model
- the use of an optimized-for-high-throughput runtime like tokio

Async Rust != Tokio (or async-std, smol, any particular runtime)

This repository is not an attempt to force all users of Rust to use a runtime like tokio. It's about improving the Rust asynchronous programming model for those who wish to use it. Tokio is a part of the picture, and surely an important part for some workloads, but it is not meant to be a requirement for the use of async/await nor for every workload that does any I/O.

The creator of hyper/reqwest is on the tokio core team so it's no surprise that it uses tokio under the hood, and while I would personally love to see parts of what are now the tokio ecosystem become more runtime independent (and we will surely have stories based on that in this repo), it's also a decision of the maintainers of hyper/reqwest how they want to implement their library. They certainly don't have to implement a blocking API on top of tokio.

As others have pointed out, your workload actually doesn't sound like a bad fit for an async model, but nothing does or should prevent you from using OS threads as the basic concurrency primitive. It's your choice.

The cost of async Rust

Second, futures and async/await by themselves are basically nothing. Futures are stack allocated by default and generally remain quite small. They don't do anything unless actively polled. So having futures and async/await as an implementation detail of a library does not necessarily mean that the library is more resource intensive than it needs to be. In fact, the reason that Rust has async/await (with the poll-based model) is that it is the option that assumes the least.
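A small sketch of the "futures don't do anything unless polled" point, using only the standard library. `laziness_demo` and `NoopWaker` are illustrative names, not part of any library. The async block's body runs only when the future is polled, not when it is created; the size printed is whatever the compiler chose and is not asserted here.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that ignores wake-ups; fine here because the future below
// never returns `Pending`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns (body ran before polling?, body ran after one poll?, output).
pub fn laziness_demo() -> (bool, bool, u32) {
    let ran = Cell::new(false);

    // Creating the future only builds a state-machine value on the
    // stack; its body has not executed yet.
    let fut = async {
        ran.set(true);
        7
    };
    let before = ran.get();
    println!("future occupies {} bytes", std::mem::size_of_val(&fut));

    // Polling it once runs the body to completion.
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let out = match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("this future never waits"),
    };
    (before, ran.get(), out)
}
```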

Async vs Threads

The async model assumes absolutely nothing about a threading model. There's no reason to not manually spin up threads and mix this with the use of futures. Sure, some runtimes have particular opinions on threading, but the Rust async model itself does not.
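A sketch of mixing plain OS threads with futures, assuming nothing beyond `std`. `block_on` is a minimal hand-written park/unpark executor (a hypothetical helper, not a library API), and `sum_range` stands in for real async work. Each thread drives its own future; no runtime decides the threading model.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Wakes the parked thread that is running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Hypothetical minimal executor: poll, park until woken, repeat.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        thread::park();
    }
}

// Stand-in for real async work.
async fn sum_range(lo: u64, hi: u64) -> u64 {
    (lo..hi).sum()
}

/// Sums 0..n by splitting the range across `workers` OS threads;
/// each thread drives its own future with `block_on`.
pub fn parallel_sum(n: u64, workers: u64) -> u64 {
    assert!(workers > 0, "need at least one worker");
    let chunk = n / workers;
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let lo = i * chunk;
            // The last worker also takes any remainder.
            let hi = if i == workers - 1 { n } else { lo + chunk };
            thread::spawn(move || block_on(sum_range(lo, hi)))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```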

Async Rust != Server Rust

We will certainly have stories about uses for async Rust for embedded devices with low memory and power footprints. These platforms may not even have threads! A poll based async model is essentially the only thing that can be supported by such platforms.
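To illustrate why a poll-based model suits thread-less targets, here is a hand-written future driven by a simple polling loop on a single thread, with no heap allocation. It is written against `std` so it can run on a desktop, but `Future`, `Context` and `RawWaker` all come from `core`. `Countdown` and `run_to_completion` are hypothetical names; a real embedded executor would sleep until an interrupt handler calls the waker instead of spinning.

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hand-written future: returns `Pending` `remaining` times, then `Ready`.
/// `async fn` bodies compile to state machines of this general shape.
pub struct Countdown {
    remaining: u32,
}

impl Countdown {
    pub fn new(remaining: u32) -> Self {
        Countdown { remaining }
    }
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            return Poll::Ready("done");
        }
        self.remaining -= 1;
        // A real driver would wake us from an interrupt; here we wake
        // immediately so the loop below polls again.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// A do-nothing waker built from a static vtable: no heap, no threads.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    noop_raw_waker()
}
unsafe fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

fn noop_raw_waker() -> RawWaker {
    RawWaker::new(ptr::null(), &NOOP_VTABLE)
}

/// "Super-loop" executor: poll until ready. Returns the output and poll count.
pub fn run_to_completion(mut fut: Countdown) -> (&'static str, u32) {
    // SAFETY: every vtable function ignores the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return (v, polls);
        }
        // On a microcontroller, this is where the CPU would sleep until
        // an interrupt fires.
    }
}
```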

Mar 19 '21 14:03 rylev

Hey @John-Nagle -- I think you're on to something here. I'd like to encourage you to submit a user story about this! I think having some stories that represent users who don't want to think about async is a great idea. On the other hand, if you can't come up with a good story to tell (because async isn't yet impeding on your experience), maybe we can just add some projects that don't use async? (e.g., describe your GPU use case). Then we can add a FAQ to the "shiny future" that says "How does this future impact the non-async projects?" so that we ensure we think about and address that question.

I will be up-front, I am thinking a lot about whether async Rust should be "the default" or not when it comes to I/O. I definitely don't think it should be the only option, but I do think that we want to be able to give people a recommended story about how to write code and to maximize interop and the value of the crates.io ecosystem. I think the story is way too confusing right now.

There are real costs to async. You highlighted some, but I'll add some more. Using async fn and .await implies some amount of extra complexity that will take more time to learn, even if we do a good job of sanding the edges off. You have to think about which functions really want to be async and propagate those annotations around. You may encounter system functions or other things that don't work in an async fashion. Your binary has to carry around more of a user-space runtime than it would otherwise.

At the same time, if I am going to make a crate to implement some protocol, or to develop byte-stream adapters for compression or what have you, I need to make a choice now. Sync or async. If we have people start with sync I/O but then they quickly hit limits because the async ecosystem is much larger, that's unfortunate, and the same is true in reverse.

This is precisely why I opened #54, and @BurntSushi opened #49.

In my ideal world, we will be able to tell a convincing shiny future story about how sync-vs-async developers are both well supported and have access to a wide range of interoperable crates. (I note that Zig has an interesting approach here)

Mar 19 '21 14:03 nikomatsakis

One of Rust's strengths is that, at last, after decades, we have a safe threading system for high performance code. That's a huge win on a hard problem. We finally have a good way to use all those CPUs you have today without bugs due to lack of proper locking.

Some problems really need a few CPUs working in coordination. This is standard in game development. PC AAA titles today use all the CPU power available. Usually in C++, with all the problems that implies. Everything in VR and AR needs massive CPU power, and single CPUs are not getting any faster. Rust looked like an exit strategy from the nightmare of multi-threaded C++.

Then came the push for "async" everywhere.

There are so many people now who came up from the Javascript world and know only single-thread "async". They are used to that model and want to use it for everything.

Javascript is pure single thread. (Yes, there are Javascript "web workers".) In Rust you can mix threading and "async". A mix is more complex than either pure threading or pure async, and may lead to hard-to-find stall bugs. See this painful real-world story. That's worth a read. The crates that developer was using slowly pressured him to convert his program to all-async. He didn't really need more than one CPU's worth of compute, so that worked for him. Most programs with concurrency will probably be all thread or all async.

There lies the problem. If crucial crates start to require async, the use of multiple CPUs is slowly choked off by the difficulties of mixing the two models.

Mar 19 '21 17:03 John-Nagle

@John-Nagle Thanks for the reference to the user story! Are there more stories like that you have to share?

Also, what do you think about adding a "non-async product" of a AAA game or something like that? It'd be great if you could talk about what it needs and what potential problems you foresee.

I will say that I don't immediately see the conflict between async and utilizing cores. Most of the various runtimes offer multithreaded runtimes with sophisticated schedulers, and I've thought about extending rayon (for example) to support async. This would permit rayon to support things like arbitrary DAGs of tasks, which it can't do now.

Mar 19 '21 18:03 nikomatsakis

@nikomatsakis As @John-Nagle points out in Hacker News comments, while sync APIs can be built on async APIs by blocking on the root future, the overhead of the underlying runtime (be it tokio or async-std) is non-negligible because the runtime needs to set up polling-based IO. Based on this, I'm thinking if we can come up with a dummy future executor and runtime compatible with the async APIs provided by tokio, but in practice run everything in a blocking fashion using a thread pool. This will solve the problem because this dummy runtime is just a thin wrapper around the standard library IO APIs, and it can effectively "syncify" async programs.

Mar 19 '21 21:03 lqf96

Of course this "dummy" future executor is not for scenarios where concurrency is crucial, but it could reduce the overhead of spinning up and tearing down an epoll/kqueue file descriptor just for a few IO operations within a synchronous function call. This may also in theory be more energy efficient, because by using blocking IO instead of polling when the IO load is low, the process can hibernate instead of meaninglessly looping for the next event.
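One possible reading of this "dummy" executor, sketched under stated assumptions: the async function performs ordinary blocking `std::io` calls, so its future is always ready on the first poll, and the executor polls exactly once on the calling thread. No epoll/kqueue instance or thread pool is created. `read_all` and `run_sync` are hypothetical names, and this variant does the IO on the calling thread rather than in a separate pool; a future that genuinely returns `Pending` would make `run_sync` panic, because nothing would ever wake it.

```rust
use std::future::Future;
use std::io::{self, Read};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// "Async on the surface, blocking underneath": the body calls ordinary
/// blocking std IO and never awaits, so it is Ready on the first poll.
pub async fn read_all<R: Read>(mut src: R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    src.read_to_end(&mut buf)?;
    Ok(buf)
}

/// Hypothetical "dummy" executor: polls exactly once, sets up no reactor.
pub fn run_sync<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => panic!("future needs a real executor: nothing will wake it"),
    }
}
```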

Mar 19 '21 21:03 lqf96

> At the same time, if I am going to make a crate to implement some protocol, or to develop byte-stream adapters for compression or what have you, I need to make a choice now. Sync or async.

I'm a big fan of the Sans IO approach championed by the Python community where we build protocol libraries without depending on the specific IO implementation, sync or async. They're implemented mostly as pure functions over state machines which make them easy to test - ideally even randomness and time are abstracted over to support deterministic simulation.
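A minimal Rust sketch of the Sans IO idea, assuming a trivial newline-delimited protocol. `LineDecoder` is a pure state machine that only sees byte slices, so it can be tested without sockets; `read_lines_blocking` is one thin driver over `std::io::Read`. Both names are hypothetical, and the sketch does not attempt the randomness/time abstraction described in the comment.

```rust
use std::io::{self, ErrorKind, Read};

/// Sans-IO core: splits incoming bytes into lines. It never touches
/// sockets or files; callers feed it whatever bytes they received.
#[derive(Default)]
pub struct LineDecoder {
    buf: Vec<u8>,
}

impl LineDecoder {
    /// Feed newly received bytes; returns every complete line (without '\n').
    /// A trailing partial line stays buffered until more bytes arrive.
    pub fn feed(&mut self, data: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(data);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            // Drop the trailing '\n' before decoding.
            lines.push(String::from_utf8_lossy(&line[..pos]).into_owned());
        }
        lines
    }
}

/// Thin blocking driver. The chunk size is tiny on purpose, so lines
/// get split across reads and the decoder has to reassemble them.
pub fn read_lines_blocking<R: Read>(mut src: R) -> io::Result<Vec<String>> {
    let mut decoder = LineDecoder::default();
    let mut lines = Vec::new();
    let mut chunk = [0u8; 4];
    loop {
        match src.read(&mut chunk) {
            Ok(0) => return Ok(lines),
            Ok(n) => lines.extend(decoder.feed(&chunk[..n])),
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

An async driver would be the same loop with the read awaited; `LineDecoder` itself would not change, which is the point of the approach.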

It is then possible to provide both a sync and an async implementation, but the entire endeavor means extra work with no direct language support currently, which leads developers to favour their own use case.

Tokio's Loom project and simulation future plans can also lift a lot of the work required by manually writing Sans IO protocols and testing them. The missing piece is being able to abstract over execution (blocking or not) at compile time, but that's a can of worms which feels out of scope for Rust.

Edit: just saw #49 discussing this

Mar 20 '21 11:03 magnet

> Based on this, I'm thinking if we can come up with a dummy future executor and runtime compatible with the async APIs provided by tokio, but in practice run everything in a blocking fashion using a thread pool. This will solve the problem because this dummy runtime is just a thin wrapper around the standard library IO APIs, and it can effectively "syncify" async programs.

The tokio current-thread runtime allows you to do this and will involve the minimum amount of overhead (because there are no context switches). Blocking on a thread-pool will have a higher amount of overhead and will negatively impact performance compared to "just doing blocking IO". And blocking on a threadpool where the threads there use an IO reactor ([e]poll instance) on yet another thread will cause even more overhead.

While the first version is pretty close to "just do a blocking system call" - especially if one would cache the runtime instance - the latter one is pretty far away and a lot less efficient. However, people might use it, because it's the approach that is made easy by async libraries which try to be compatible with everything by delegating operations to external threads. Depending on which environment is chosen, a write to a socket could involve anything from just staying on a single thread up to hopping between 3 threads.

That demonstrates that there is a spectrum of async usages between "highly efficient" and "rather inefficient", and it's probably inverse to the complexity of using things and compatibility between libraries.

For the more efficient ways I definitely think that exposing the async functions in synchronous manner for people who don't care about async is OK. It allows library authors to write their code only once.

But I generally agree with @John-Nagle that not everything should be async, and that for a lot of use-cases it might provide more pain than usefulness. The tricky part is to come up with a general recommendation for when it is useful. For all the projects which want to run an HTTP client as part of a desktop program there is no gain from using async. But if the same library is used inside a server which does 100k RPS it might be important to get the efficiency that is required there. And I don't think we would want to write 2 times the code to satisfy those scenarios. A lightweight blocking wrapper around async code seems ok.

Mar 25 '21 04:03 Matthias247

I think it'd also be worthwhile for the user story to dig a little bit into what is meant by the "overhead" of bringing in tokio/async-std/smol, etc. What is the concern, and why? Longer compile times? More dependencies? Do features help at all here — from what I understand reqwest doesn't bring in the tokio executor, only traits like AsyncRead and AsyncWrite and utilities like tokio::sync, unless you explicitly opt into the blocking feature? And if you do, how do you quantify the resulting overhead?

Mar 25 '21 19:03 jonhoo

When I used reqwest in blocking mode, and had the standard log module enabled, I got log entries such as these:

04:25:04 [TRACE] (1) reqwest::blocking::wait: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/wait.rs:43] (ThreadId(1)) park timeout 29.998282477s
04:25:04 [TRACE] (2) want: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/want-0.3.0/src/lib.rs:341] signal: Want
04:25:04 [TRACE] (2) hyper::proto::h1::conn: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.9/src/proto/h1/conn.rs:650] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
04:25:04 [TRACE] (2) want: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/want-0.3.0/src/lib.rs:200] poll_want: taker wants!
04:25:04 [TRACE] (2) hyper::client::pool: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.9/src/client/pool.rs:320] put; add idle connection for ("http", api.gridsurvey.com)
04:25:04 [DEBUG] (2) hyper::client::pool: pooling idle connection for ("http", api.gridsurvey.com)
04:25:04 [TRACE] (2) want: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/want-0.3.0/src/lib.rs:341] signal: Want
04:25:04 [TRACE] (2) hyper::proto::h1::conn: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.9/src/proto/h1/conn.rs:650] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
04:25:04 [TRACE] (1) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:749] closing runtime thread (ThreadId(2))
04:25:04 [TRACE] (1) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:751] signaled close for runtime thread (ThreadId(2))
04:25:04 [TRACE] (2) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:799] (ThreadId(2)) Receiver is shutdown
04:25:04 [TRACE] (2) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:804] (ThreadId(2)) end runtime::block_on
04:25:04 [TRACE] (2) mio::poll: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.22/src/poll.rs:907] deregistering handle with poller
04:25:04 [TRACE] (2) want: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/want-0.3.0/src/lib.rs:330] signal: Closed
04:25:04 [TRACE] (2) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:806] (ThreadId(2)) finished
04:25:04 [TRACE] (1) reqwest::blocking::client: [/home/john/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/blocking/client.rs:753] closed runtime thread (ThreadId(2))

Mar 25 '21 19:03 John-Nagle

> The tokio current-thread runtime allows you to do this and will involve the minimum amount of overhead (because there are no context switches). Blocking on a thread-pool will have a higher amount of overhead and will negatively impact performance compared to "just doing blocking IO". And blocking on a threadpool where the threads there use an IO reactor ([e]poll instance) on yet another thread will cause even more overhead.

Whoops, I didn't mean to use epoll/kqueue for such a hypothetical executor/runtime. It's more like the former case: running a single-thread executor on the main thread and providing a number of "on-the-surface" async APIs. However, these APIs will actually perform blocking IO in a separate thread pool. But I do realize that using a thread pool might not be efficient, and now I think maybe a better idea is to just switch to non-blocking IO (e.g. setting O_NONBLOCK) for file descriptors without also applying multiplexed / polling / completion-based IO to them.

> However, people might use it, because it's the approach that is made easy by async libraries which try to be compatible with everything by delegating operations to external threads.

> That demonstrates that there is a spectrum of async usages between "highly efficient" and "rather inefficient", and it's probably inverse to the complexity of using things and compatibility between libraries.

> For the more efficient ways I definitely think that exposing the async functions in synchronous manner for people who don't care about async is OK. It allows library authors to write their code only once.

I think the hypothetical executor / runtime I'm proposing is just for @John-Nagle's problem right above: the simplest way for an author to support both sync and async code is to first write async code, and then create sync wrappers which block_on the futures created by the async API in an executor. However, as you can see, setting up and tearing down such executors can be costly: we need to set up a thread pool, a multiplexing / polling-based IO mechanism, register file descriptors, and just after a few IO operations we undo everything again to tear down the executor. This is overkill when the user of the sync APIs just wants to do something simple and doesn't care about the performance. Having such a hypothetical "lightweight" executor / runtime means that we can set up and tear down executors instantly in the sync APIs, with minimal overhead, at the cost of lower IO performance, but this is exactly what users want in this use case.

> And I don't think we would want to write 2 times the code to satisfy those scenarios. A lightweight blocking wrapper around async code seems ok.

Yes, exactly the reason I'm imagining something like this.

Mar 25 '21 20:03 lqf96

> The tokio current-thread runtime allows you to do this and will involve the minimum amount of overhead (because there are no context switches).

Is that really true? There was an article on Hacker News recently where someone benchmarked this. The switching cost was better for async, unless the switch was because of an I/O completion. Then it was about the same. Actually, the big win for async was less stack space usage for tasks that don't do much, which matters when you have tens of thousands of threads, but not when you have tens or hundreds.

"Async" is just context switching in user space, after all.
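A toy illustration of "context switching in user space": a single-threaded round-robin scheduler in which each `Pending` hands control back to a loop that polls the next task. Everything here (`YieldNow`, `run_round_robin`, `interleaving_demo`) is a hypothetical sketch. It ignores the waker and simply requeues pending tasks, which real executors do not do, and it measures nothing about cost. It does show that a switch is just a return from `poll` plus a call into the next future, with each task's state kept inside its future rather than on a dedicated stack.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Returns `Pending` once, handing control back to the scheduler.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

fn yield_now() -> YieldNow {
    YieldNow(false)
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Round-robin on one thread. Simplification: pending tasks are requeued
/// unconditionally instead of waiting for their waker to fire.
pub fn run_round_robin(tasks: Vec<Task>) {
    let mut queue: VecDeque<Task> = tasks.into();
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while let Some(mut task) = queue.pop_front() {
        // Each poll runs the task up to its next yielding `.await`;
        // returning here and polling another task is the "switch".
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task);
        }
    }
}

/// Two tasks that each log two steps, yielding in between.
pub fn interleaving_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let tasks: Vec<Task> = ["A", "B"]
        .iter()
        .map(|&name| {
            let log = Rc::clone(&log);
            Box::pin(async move {
                for step in 0..2 {
                    log.borrow_mut().push(format!("{name}{step}"));
                    yield_now().await;
                }
            }) as Task
        })
        .collect();
    run_round_robin(tasks);
    let out = log.borrow().clone();
    out
}
```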

The use case I have is needing higher CPU utilization across multiple CPUs while maintaining reasonably good I/O performance. The async system is designed for the special case of heavy network I/O load coupled with light CPU load. That shouldn't dominate Rust's architecture, even though there are a lot of people making web services.

Mar 25 '21 20:03 John-Nagle

I think @John-Nagle is onto something here.

Consider my case:

- I wrote a proxy in DotNet.
  - It used 380% CPU to keep Node.js busy at 100% CPU.
- I rewrote that proxy in Rust, thinking it would fix the problem.
  - Rust/Hyper/Tokio still used 350% CPU to keep Node.js busy at 100% CPU.
- Both DotNet and Rust/Hyper/Tokio are fixed when their async threadpools are restricted to 1 thread only:
  - DotNet drops to ~120% CPU.
  - Rust drops to ~92% CPU.

I've documented my experience with the issue here:

- https://github.com/pwrdrvr/lambda-dispatch/issues/108
- YouTube Video Demonstrating CPU Usage Reduction by Restricting Tokio Runtime to 1 Thread
- YouTube Video Demonstrating DotNet and Rust Using Roughly Equivalent Amounts of CPU - 3.5x-3.8x more than Node.js

I think, perhaps naively given the amount of research there is into scheduling out there, that this is a fixable problem.

The problem is not limited to my code. The overuse of CPU happens in warp, oha, and has been reported to happen in the Apollo Router too.

Limiting to 1 thread is not a workable solution and Tokio does not currently support increasing and decreasing the worker thread count dynamically.

Is it possible to improve this so that async does not have a hidden penalty that only some will notice?

Feb 04 '24 23:02 huntharo

For what it's worth: I originally upvoted OP, but Let futures be futures successfully convinced me that futures are about more than performance.

Feb 05 '24 15:02 Kinrany
