tokio icon indicating copy to clipboard operation
tokio copied to clipboard

Make processing of entrypoints more efficient

Open udoprog opened this issue 2 years ago • 20 comments

This changes processing of entrypoints to operate over token streams directly instead of parsing them (through syn) and drop the dependency on both syn (since it's not used), and quote which can be replaced with a fairly simple internal abstraction (see ToTokens).

Motivation

We do a few things with the current solution which is somewhat "wasteful" in terms of macro processing time:

We feed the item being processed through syn by parsing it as an ItemFn. This has the unintended side effect of parsing the entire function body (including all the code in it) to produce an instance of ItemFn: https://github.com/tokio-rs/tokio/blob/8943e8a/tokio-macros/src/entry.rs#L387. We manipulate the function by producing a token stream which we parse again (this time as a block) causing the entire function body to be parsed another time: https://github.com/tokio-rs/tokio/blob/8943e8a/tokio-macros/src/entry.rs#L355.

This can be quite slow. I've measured a large entrypoint in the wild which takes 200ms to process with the old solution and ~200us with the new one. While this doesn't have a huge impact on full compiles, it appears to be noticeable when used interactively through rust-analyzer (and it obviously stacks up when compiling a lot of things).

Parsing with syn doesn't give us anything that passing the stream back to rustc once it's been manipulated does. rustc even tends to produce better diagnostics for malformed inputs.*

Compare this (error generated and reported with syn):

error: expected identifier, found `::`
  --> examples\chat.rs:71:21
   |
71 |     let addr = env::::args()
   |                     ^^ expected identifier

error: expected identifier
  --> examples\chat.rs:71:21
   |
71 |     let addr = env::::args()
   |                     ^^

error[E0752]: `main` function is not allowed to be `async`
  --> examples\chat.rs:43:1
   |
43 | async fn main() -> Result<(), Box<dyn Error>> {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `main` function is not allowed to be `async`

For more information about this error, try `rustc --explain E0752`.
warning: `examples` (example "chat") generated 2 warnings
error: could not compile `examples` due to 3 previous errors; 2 warnings emitted

With this (error generated by passing the tokens to rustc with the new solution):

error: expected identifier, found `::`
  --> examples\chat.rs:71:21
   |
71 |     let addr = env::::args()
   |                     ^^ expected identifier

Note the extra error: expected identifier in the first variant. This is diagnostics that is produced by syn failing to parse. And since we pass tokens through without processing on errors we get the diagnostics by rustc immediately before it anyways.

In the latter we don't care what syntax the body contains beyond some fairly simple markers we look for, we correctly process the main body and avoid producing the third noisy error stating "main function is not allowed to be async" despite it being so.

Solution

Do our own custom internal processing of macros instead of relying on syn that simply looks for known "markers". Like the function signature is expected to be before a braced ({}) group and perform minimal processing over it to produce the desired entrypoint.

I've also made all diagnostics use the following convention (instead of it being mixed) which causes some ui tests to need updates. Messages are single sentence and all lowercase with no dots An error happened. Supported syntax is: foo. then becomes an error happened, supported syntax is: foo (this follows rustc convention AFAICT).

I've also decided against pre-validating some options that are parsed in the entrypoint. As can be seen in the error we now get in the ui test if a flavor doesn't exist:

error: no such runtime flavor, the runtime flavors are: "current_thread", "multi_thread"
  --> tests/fail/macros_invalid_input.rs:21:24
   |
21 | #[tokio::test(flavor = 123)]
   |                        ^^^

This used to say:

error: Failed to parse value of `flavor` as string.
  --> $DIR/macros_invalid_input.rs:21:24
   |
21 | #[tokio::test(flavor = 123)]
   |                        ^^^

I personally think the new behavior is better, since it informs the user exactly what to change their illegal input to regardless of what it happens to be. So you avoid an extra step of making it into a string and then potentially correcting it.

open questions

What is the purpose of looking for additional #[test] attributes in #[tokio::test]? I didn't implement it yet, and the only test that fails is a UI test. But multiple #[test] attributes are legal from the perspective of rustc - the test will simply run multiple times. Should the old behavior be reproduced?

The code was ported from experimental entrypoints I tinkered with in selectme (which I'm the author of so I'm giving it to Tokio instead). Some rough edges might still be present in the implementation so a review is welcome!

udoprog avatar Feb 17 '22 16:02 udoprog

Nice! cc @taiki-e would love your review of this.

carllerche avatar Feb 17 '22 16:02 carllerche

minrust constraint I didn't foresee. Not a big deal so will fix: https://github.com/tokio-rs/tokio/runs/5235296039?check_suite_focus=true

udoprog avatar Feb 17 '22 16:02 udoprog

To resolve the lint failure in https://github.com/tokio-rs/tokio/runs/5236195058?check_suite_focus=true I believe we'd have to do something similar as in #4176 which requires a bit more effort! I'll take a look at later so marked as draft for now.

The -Z minimal-versions failure is a bit more allusive (https://github.com/tokio-rs/tokio/runs/5236195689?check_suite_focus=true), but I suspect it might have to do with the dependency towards syn / quote constraining some transitive dev-dependency towards proc-macro2 which is used while building since we see this in the log:

Updating proc-macro2 v1.0.36 -> v1.0.0

And this version doesn't support the required minimal. Not sure what to do about this right now!

udoprog avatar Feb 17 '22 17:02 udoprog

Hm. So after pondering the implementation for a bit I have a couple of thoughts.

What is desirable to me would be to perform the following transformation rather than what we're doing right now:

#[tokio::main]
async fn main() <ret> <block>

Into:

fn main() <ret> {
    async fn main() <ret> <block>
    <runtime>.block_on(main())
}

Now this is easy to do and it catches essentially all forms of diagnostics emitted by the rust compiler, it does however cause a regression which was initially documented by @taiki-e in https://github.com/tokio-rs/tokio/pull/3766#issuecomment-835508651 where this solution is also suggested and discarded because it isn't backwards compatible.

Next, we need to handle self in async fn real_f(...), but, we cannot get the actual type of Self from the function signature.

So there's a couple of thoughts I have about this:

  • #[tokio::main] and #[tokio::test] are clearly not intended for this use. The use they are intended for are toplevel main entrypoints and test functions respectively.
  • In order to get good diagnostics this requires us to perform syntactical heuristics, and they will never be able to catch every case (e.g. type aliases, custom types, etc ...).

If I could go back, I'd probably argue for not supporting this use and make any arguments specified on functions with the #[tokio::main] / #[tokio::test] attributes an error in the macro. As an experiment I've done this the prototype I built with selectme (https://github.com/udoprog/selectme/blob/main/selectme-macros/src/entry/output.rs#L156).

So my suggestion would be to consider introducing this as a breaking change. One which does produce a clear error message that other use constitutes a misuse of the attribute. One might even consider it a bug that is patched, and the scope of the breakage seems limited because frankly I've never seen this use outside of test cases in Tokio.

If I missed something important of why such an expansion above wouldn't be appropriate, please tell me!

That means, uses like these would not be permitted (because they have an argument):

trait Foo {
    #[tokio::main]
    async fn foo(self) {
    }
}

#[tokio::main]
async fn foo(v: u32) {
}

Examples of improved diagnostics

The following are a few examples where we currently do not provide great diagnostics with #[tokio::main]. I'll be showing the corresponding diagnostics I get out of selectme side by side which I modified to use the "inner function" expansion that was proposed above.

Non-trivial control flow in blocks

The following is not caught, because the return is inside of a nested expression. We only perform return-level diagnostics at the root level of the function: https://github.com/tokio-rs/tokio/blob/master/tokio-macros/src/entry.rs#L342

We can't catch many other use cases without processing the entirety of the function body, so most forms of control flow simply do not work as intended:

#[tokio::main]
async fn main() -> () {
    if true {
        return Ok(());
    }
}

Tokio today. Note that the tokio output can be improved by adding -> () so that a specific heuristic kicks in, but this is omitted in this example:

error[E0308]: mismatched types
 --> examples\error.rs:3:5
  |
3 | /     if true {
4 | |         return Ok(());
5 | |     }
  | |_____^ expected enum `Result`, found `()`
  |
  = note:   expected type `Result<(), _>`     
          found unit type `()`
note: return type inferred to be `Result<(), _>` here
 --> examples\error.rs:4:16
  |
4 |         return Ok(());
  |                ^^^^^^

error[E0308]: mismatched types
 --> examples\error.rs:3:5
  |
2 |   async fn main() {
  |                   - expected `()` because of default return type
3 | /     if true {
4 | |         return Ok(());
5 | |     }
  | |     ^- help: consider using a semicolon here: `;`
  | |_____|
  |       expected `()`, found enum `Result`
  |
  = note: expected unit type `()`
                  found enum `Result<(), _>`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `selectme` due to 2 previous errors

selectme today:

error[E0308]: mismatched types
 --> examples\error.rs:4:16
  |
4 |         return Ok(());
  |                ^^^^^^ expected `()`, found enum `Result`
  |
  = note: expected unit type `()`
                  found enum `Result<(), _>`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `selectme` due to previous error

Example breaking out of a loop:

#[tokio::main]
async fn main() {
    loop {
        break Ok(());
    }
}

Tokio today:

error[E0308]: mismatched types
 --> examples\error.rs:3:5
  |
2 |   async fn main() {
  |                   - expected `()` because of default return type
3 | /     loop {
4 | |         break Ok(());
5 | |     }
  | |     ^- help: consider using a semicolon here: `;`
  | |_____|
  |       expected `()`, found enum `Result`
  |
  = note: expected unit type `()`
                  found enum `Result<(), _>`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `selectme` due to previous error

selectme:

error[E0308]: mismatched types
 --> examples\error.rs:4:15
  |
4 |         break Ok(());
  |               ^^^^^^ expected `()`, found enum `Result`
  |
  = note: expected unit type `()`
                  found enum `Result<(), _>`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `selectme` due to previous error

Spurious "consider using a semicolon here" suggestions

#[tokio::main]
async fn main() {
    true
}

Tokio today:

error[E0308]: mismatched types
 --> examples\error.rs:3:5
  |
2 | async fn main() {
  |                 - expected `()` because of default return type
3 |     true
  |     ^^^^- help: consider using a semicolon here: `;`
  |     |
  |     expected `()`, found `bool`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `selectme` due to previous error

selectme:

error[E0308]: mismatched types
 --> examples\error.rs:3:5
  |
3 |     true
  |     ^^^^ expected `()`, found `bool`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `selectme` due to previous error

The reason this happens is that the compiler diagnostics considers the fully expanded expression, which probably makes sense to keep (e.g. it might have side effects):

use tokio::runtime::Builder;

fn main() {
    Builder::new().build().unwrap().block_on(async {
        true
    })
}

It's avoided with the other form of expansion, because the compiler gets to consider a normal full function:

use tokio::runtime::{Runtime, Builder};

fn main() {
    async fn main() {
        true
    }

    Builder::new().build().unwrap().block_on(main())
}

udoprog avatar Feb 18 '22 01:02 udoprog

Argument support is a feature that was added intentionally. (see https://github.com/tokio-rs/tokio/issues/1564 and https://github.com/tokio-rs/tokio/pull/1594)

My comment in https://github.com/tokio-rs/tokio/pull/3766 is about the self argument. (And since #3766 I've actually started using self argument in tokio::main)

taiki-e avatar Feb 18 '22 02:02 taiki-e

Thanks for context!

So arguments could be supported with a bit of mangling (the tricky aspect is patterns). self however could not (as you say), so the proposed breakage could be limited to only that.

udoprog avatar Feb 18 '22 02:02 udoprog

Also worth noting that there are some more insidious examples of poor diagnostics today:

Thanks to a fairly confused instance of type inference we've convinced rustc to say both of these things about the span representing the true expression*:

expected `bool`, found enum `Result`
expected enum `Result`, found `bool`

*: This is due to 1) type inference kicks in on the async block which expects its return value to be a Result, and 2) the runtime.block_on expression gets the span of the last expression (in an attempt to improve diagnostics) which evaluates to Result when the function expects a bool.

#[tokio::main]
async fn foo() -> bool {
    if true {
        return Ok(());
    }

    true
}

fn main() {
}
error[E0308]: mismatched types
 --> examples\error.rs:7:5
  |
7 |     true
  |     ^^^^ expected enum `Result`, found `bool`
  |
  = note: expected type `Result<(), _>`
             found type `bool`
note: return type inferred to be `Result<(), _>` here
 --> examples\error.rs:4:16
  |
4 |         return Ok(());
  |                ^^^^^^

error[E0308]: mismatched types
 --> examples\error.rs:7:5
  |
2 | async fn foo() -> bool {
  |                   ---- expected `bool` because of return type
...
7 |     true
  |     ^^^^ expected `bool`, found enum `Result`
  |
  = note: expected type `bool`
             found enum `Result<(), _>`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `selectme` due to 2 previous errors

udoprog avatar Feb 18 '22 02:02 udoprog

  • There also seem to be problems with the function signature handling. (generics, where clause, etc.)

It looks like this will be handled well if None-delimited groups are handled properly, but I think I'm likely still missing an edge case.

taiki-e avatar Feb 18 '22 03:02 taiki-e

Several users are already using the self argument in tokio::main, so I'm not going to accept/approve removing support for self argument.

taiki-e avatar Feb 18 '22 04:02 taiki-e

Several users are already using the self argument in tokio::main, so I'm not going to accept/approve removing support for self argument.

I think a full evaluation demands that it be weighed against the downside of things like poor diagnostics for everyone who is using the attributes a self arguments. And to me that simply isn't as clear cut.

The behavior today is quite poor and has caused me to struggle to understand what went wrong on a few occasions. The only reason I knew to get better diagnostics by moving my function body into my own secondary function is because I was vaguely aware of how the attribute was implemented.

But if this break is not desired it will simply have to wait until Tokio 2.0!

Edit: I think I'll punt that for now anyways into a separate issue. There might be things with typed closures we can use to improve diagnostics while retaining the full ability of today. I haven't investigated it fully yet.

udoprog avatar Feb 18 '22 04:02 udoprog

439841d fixes minimal-versions check by constraining the async-stream dependency to 0.3.2, which uses the correct minimal version of proc-macro2 and has removed use of the API which wasn't supported until proc-macro2 1.0.36 (thanks taiki-e for the release!).

udoprog avatar Feb 18 '22 21:02 udoprog

That should be that. Return type heuristics added in dcc8a7a and the preliminary comments should have been addressed so I'm marking this as ready for review. Note that this does not include a breaking transformation. Or put in other terms, any breakage is unintended (and isn't covered by a test!).

I'll convert https://github.com/tokio-rs/tokio/pull/4513#issuecomment-1043706526 into a separate issue once I get a second to reformat it.

udoprog avatar Feb 18 '22 23:02 udoprog

I have looked this through and it looks overall reasonable.

Darksonn avatar Feb 25 '22 11:02 Darksonn

The minimal versions check has a weird failure in a dependency:

error[E0599]: no method named `base10_parse` found for reference `&LitInt` in the current scope
   --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-attributes-0.1.13/src/lib.rs:704:23
    |
704 |             match lit.base10_parse::<u64>() {
    |                       ^^^^^^^^^^^^ method not found in `&LitInt`

Darksonn avatar Mar 02 '22 12:03 Darksonn

@Darksonn that should be tokio-rs/tracing#1960 - something could probably be done solely in Tokio but I'm not sure what.

udoprog avatar Mar 02 '22 16:03 udoprog

Any updates on this?

Darksonn avatar Mar 23 '22 19:03 Darksonn

No, we just needed to bump tracing to see if it builds and I've been busy. Did it now with a merge so we'll see if it works!

udoprog avatar Mar 24 '22 04:03 udoprog

Hm, another one:

tokio-util v0.7.0 (D:\Repo\tokio\tokio-util)
└── tracing v0.1.32
    ├── cfg-if v1.0.0
    ├── pin-project-lite v0.2.5
    ├── tracing-attributes v0.1.20 (proc-macro)
    │   ├── proc-macro2 v1.0.23 (*)
    │   ├── quote v1.0.0 (*)
    │   └── syn v1.0.43 (*)
    └── tracing-core v0.1.22
        └── lazy_static v1.0.0

See https://github.com/rustwasm/wasm-bindgen/pull/2378

Minimal version of lazy_static needs to be 1.0.2. Somehow I messed up my local cargo hack test last time and missed this.

udoprog avatar Mar 24 '22 04:03 udoprog

What is the status?

Darksonn avatar Apr 06 '22 13:04 Darksonn

PR is in an OK state. It's the minimal versions stuff that is both very time consuming when I sit down with it and I have little motivation to work on.

For now I'm blocked on figuring out how to get all individual crates of tracing to build with minimal-versions (and that sometimes lead me down a tree of broken deps). And lack of tooling is making it difficult. See the associated PR above.

Alternatives or help is welcome!

udoprog avatar Apr 06 '22 22:04 udoprog

Closing in favor of https://github.com/tokio-rs/tokio/pull/5621 (which adopts the approach I mentioned in https://github.com/tokio-rs/tokio/pull/4513#pullrequestreview-886789943) for now. It is more robust and easy to maintain.

taiki-e avatar Apr 15 '23 06:04 taiki-e