Macro-expanding large `vec!` with numbers is very slow (`georust/geo`)
https://github.com/georust/geo/, tested at d606fc5f01a6927f5e6033b09df7d404155b5637
```
[ERROR project_model::workspace] cyclic deps: jts_test_runner(CrateId(281)) -> geo(CrateId(209)), alternative path: geo(CrateId(209)) -> jts_test_runner(CrateId(281))
Database loaded: 863.07ms, 219minstr (metadata 405.64ms, 15minstr; build 388.92ms, 5863kinstr)
crates: 20, mods: 175, decls: 1971, fns: 1470
Item Collection: 5.87s, 57ginstr
exprs: 480353, ??ty: 93 (0%), ?ty: 3 (0%), !ty: 0
Inference: 248.09s, 2694ginstr
Total: 253.96s, 2751ginstr
```
Possibly related to the large test fixtures that are `include!`d (`geo/src/algorithm/test_fixtures/norway_main.rs`).
EDIT: I think it was indeed related to the `include!`. Make sure to test on that commit; the code was removed in https://github.com/georust/geo/pull/703.
Repro: hello.zip
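The contents of hello.zip aren't reproduced here, but an input of the shape that triggers this (a single `vec!` with tens of thousands of literal elements) can be produced with a sketch like the following; the element count of 50_000 is an assumption, not taken from the attachment:

```rust
// Sketch of a generator for a hello.zip-style repro: writes a
// src/main.rs containing one huge `vec!` literal.
fn main() {
    let mut src = String::from("fn main() { let v = vec![");
    for _ in 0..50_000 {
        src.push_str("1, ");
    }
    src.push_str("]; assert_eq!(v.len(), 50_000); }");
    std::fs::write("src/main.rs", src).unwrap();
}
```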
`cargo check` finishes in 650 ms; rust-analyzer takes 60 seconds.
```
17.66% rust-analyzer rust-analyzer [.] mbe::to_parser_input::to_parser_input
11.57% rust-analyzer rust-analyzer [.] tt::buffer::Cursor::eof
 7.10% rust-analyzer rust-analyzer [.] rustc_ap_rustc_lexer::<impl rustc_ap_rustc_lexer::cursor::Cursor>::eat_decimal_digits
 6.09% rust-analyzer rust-analyzer [.] rustc_ap_rustc_lexer::<impl rustc_ap_rustc_lexer::cursor::Cursor>::advance_token
 4.57% rust-analyzer rust-analyzer [.] tt::buffer::Cursor::bump
 4.51% rust-analyzer rust-analyzer [.] <&[tt::TokenTree] as tt::buffer::TokenList>::entries
 4.06% rust-analyzer libc.so.6     [.] malloc
 3.98% rust-analyzer rust-analyzer [.] tt::buffer::TokenBuffer::new_inner
 3.40% rust-analyzer rust-analyzer [.] parser::lexed_str::Converter::push
 3.36% rust-analyzer rust-analyzer [.] tt::buffer::Cursor::token_tree
```
Replacing every number with `11` speeds it up slightly (52 s); replacing the resulting `[11, 11]` pairs with `[1]` yields 37 s (expected, since there are half as many tokens); replacing `[1]` with a bare `1` brings it down to 19 s.
EDIT: adding some random `#[inline]` annotations around the `TokenBuffer` and `Entry` methods speeds it up by 23%, but only makes a small (2 s) difference when running on rust-analyzer itself.
I tried to figure out why rust-analyzer is slow on this input, and I think I found the issue, but I don't know how to fix it without rewriting a lot of code or resorting to ugly hacks. So I'll describe the problem here, and maybe somebody can suggest a good fix.
Currently, parsing a macro call like `vec![1,1,1,1,1,...,1]` takes O(n^2) time, which is too slow when there are tens of thousands of elements.
We can start from `Matcher::match_loop`, which receives a `tt::Subtree` whose `token_trees` has size n. On our input, this function calls `match_loop_inner` n times; each call eats one element from the vector and does O(n) work. If `match_loop_inner` worked in time proportional to the number of tokens it consumes, that would solve the problem, but in fact it is slower.
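Schematically, the shape of the problem looks like this (a toy sketch with hypothetical types, not rust-analyzer's actual code):

```rust
// Toy model of the quadratic behavior: each iteration does work
// proportional to the remaining input but consumes only one token,
// so the total is n + (n-1) + ... + 1 = O(n^2).
struct Token;

fn match_loop_inner(rest: &[Token]) {
    // Stand-in for the real function, which (as described below)
    // rebuilds a TokenBuffer and parser input over all of `rest`.
    for _token in rest {}
}

fn match_loop(tokens: &[Token]) {
    let mut rest = tokens;
    while !rest.is_empty() {
        match_loop_inner(rest); // O(rest.len()) work...
        rest = &rest[1..];      // ...for one consumed token
    }
}
```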
`match_loop_inner` calls `match_meta_var`, which in turn calls `TtIter::expect_fragment`, and that is the main problem.
`expect_fragment` converts the whole input (which has n tokens) into a `TokenBuffer`, and then into a `parser::Input`. Both steps take O(n). Then `entry_point.parse` parses a very small prefix of the generated input and returns. If `TokenBuffer::from_tokens` and `to_parser_input` could work lazily, taking only time proportional to what `.parse` later consumes, that would solve the problem, but I don't see an easy way of doing that.
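In pseudocode, the eager flow looks roughly like this (all types and signatures here are stand-ins for illustration, not the real API):

```rust
// Paraphrase of the eager conversion inside expect_fragment;
// everything here is a hypothetical stand-in.
struct Token;
struct TokenBuffer;
struct ParserInput;
struct Fragment;

fn token_buffer_from(_input: &[Token]) -> TokenBuffer { TokenBuffer } // O(n)
fn to_parser_input(_buf: &TokenBuffer) -> ParserInput { ParserInput } // O(n)
fn parse(_input: &ParserInput) -> Fragment { Fragment }

fn expect_fragment(input: &[Token]) -> Fragment {
    let buffer = token_buffer_from(input);       // walks all n tokens
    let parser_input = to_parser_input(&buffer); // walks all n again
    parse(&parser_input) // typically consumes only a tiny prefix
}
```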
Another option is to construct the `TokenBuffer` and `parser::Input` once from the `TtIter` and then pass them to every `match_loop_inner` call, but that seems like too big a change from how the code is currently written (and we would need to remember the current position in the `parser::Input` between `match_loop_inner` calls, which seems complicated).
Another way, which doesn't require a lot of changes in the code, but which I don't like, looks like this: inside `expect_fragment`, instead of converting the whole sequence into a `TokenBuffer` and `parser::Input`, we convert a short prefix of it and try to parse that. If the parsing function finishes successfully without consuming the whole converted input, we return the result. Otherwise, we try again with a larger prefix (e.g. 2x larger than before), and keep growing the prefix until either we pass the whole sequence to the parser or some prefix parses successfully. Because the prefix grows geometrically, the total work across all attempts is dominated by the last one, so the whole algorithm runs in time proportional to the number of tokens actually consumed by the parse function, which is fast enough.
There are some tricky moments with this last solution. For example, if we pass a prefix of p tokens to the parser and it finishes successfully having consumed p-1 of them, does that mean it would do the same if we passed it the whole sequence? As I understand it, the answer is no, because the parser could try to look at token p+1, see nothing there, and behave differently than it would if it saw the real next token. But, again as I understand it, if it finishes having consumed p-5 tokens, we can be sure the result is the same as for the whole sequence, because the parser never looks more than 3 tokens ahead.
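A minimal sketch of that doubling approach, incorporating the lookahead-margin check from the previous paragraph (the names here, such as `parse_prefix`, `ParseResult`, and the `LOOKAHEAD` bound, are hypothetical stand-ins, not rust-analyzer's real API):

```rust
struct Token;

struct ParseResult {
    consumed: usize, // how many tokens the parser ate
}

// Stand-in for building a TokenBuffer + parser input over the prefix
// and running the fragment parser on it.
fn parse_prefix(_tokens: &[Token]) -> ParseResult {
    ParseResult { consumed: 0 }
}

const LOOKAHEAD: usize = 3; // assumed upper bound on parser lookahead

fn expect_fragment_fast(tokens: &[Token]) -> ParseResult {
    let mut len = 16usize.min(tokens.len());
    loop {
        let result = parse_prefix(&tokens[..len]);
        // Accept only when tokens beyond the prefix could not have
        // influenced the parse: either we already passed everything,
        // or the parser stopped LOOKAHEAD tokens short of the end.
        if len == tokens.len() || result.consumed + LOOKAHEAD < len {
            return result;
        }
        len = (len * 2).min(tokens.len()); // double the prefix, retry
    }
}
```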
I don't know the code very well, but constructing the `TokenBuffer` once sounds the most reasonable option to me, though I can see why it's hard to implement.
But we should probably look into what rustc does, since it's clearly faster in this case.
Is there any way to disable `include!` expansion? There is a setting that disables proc macro expansion, but I would like to disable `include!` to avoid the rust-analyzer server crashing. I am using VS Code.
@antimora I don't think so, but what are you including? Is it Rust code or some data?