Improve lexer performance by 5-10% overall and string lexing performance by ~15%
Hi, this PR improves lexer performance by ~5-10% when lexing the entire standard library. It specifically targets the string lexer, comment lexer, and frontmatter lexer.
- For strings and comments, it replaces the previous logic with a new `eat_past2` function that leverages `memchr2`.
- For frontmatter, I eliminated the heap allocation from `format!` and rewrote the lexer using `memchr`-based scanning, which is roughly 4× faster (a sketch of the idea follows below).
I also applied a few minor optimizations in other areas.
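To make the frontmatter change concrete: instead of building the closing-fence needle with `format!` and searching for it as a string, the scan can hop from newline to newline with `memchr` and compare the candidate line against the fence in place. The following is only an illustrative sketch with hypothetical names, not the exact code in this PR:

```rust
// Illustrative sketch, not the PR's actual implementation: locate a closing
// frontmatter fence (a line starting with at least `fence_len` dashes)
// without allocating a needle string.
use memchr::memchr;

fn find_closing_fence(rest: &[u8], fence_len: usize) -> Option<usize> {
    let mut line_start = 0;
    loop {
        let line = &rest[line_start..];
        // Count the candidate line's leading dashes in place.
        let dashes = line.iter().take_while(|&&b| b == b'-').count();
        if dashes >= fence_len {
            return Some(line_start);
        }
        // Jump to the next line with a SIMD-accelerated newline search.
        match memchr(b'\n', line) {
            Some(nl) => line_start += nl + 1,
            None => return None,
        }
    }
}

fn main() {
    // `rest` starts just past the opening `---\n` fence.
    let rest = b"[dependencies]\nserde = \"1\"\n---\nfn main() {}\n";
    assert_eq!(find_closing_fence(rest, 3), Some(27));
}
```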
I’ll send the benchmark repo in the next message. Here are the results on my x86_64 laptop (AMD 6650U):
tokenize_real_world/stdlib_all_files
time: [74.193 ms 74.224 ms 74.256 ms]
thrpt: [423.74 MiB/s 423.92 MiB/s 424.10 MiB/s]
change:
time: [−5.4046% −5.3465% −5.2907%] (p = 0.00 < 0.05)
thrpt: [+5.5862% +5.6484% +5.7134%]
Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
2 (2.00%) high mild
19 (19.00%) high severe
strip_shebang/valid_shebang
time: [11.391 ns 11.401 ns 11.412 ns]
thrpt: [1.7954 GiB/s 1.7971 GiB/s 1.7987 GiB/s]
change:
time: [−8.1076% −7.8921% −7.6485%] (p = 0.00 < 0.05)
thrpt: [+8.2820% +8.5683% +8.8229%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
strip_shebang/no_shebang
time: [4.8656 ns 4.8680 ns 4.8711 ns]
thrpt: [4.2062 GiB/s 4.2089 GiB/s 4.2110 GiB/s]
change:
time: [−0.1156% −0.0139% +0.0821%] (p = 0.78 > 0.05)
thrpt: [−0.0821% +0.0139% +0.1157%]
No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
1 (1.00%) high mild
19 (19.00%) high severe
tokenize/simple_function
time: [288.86 ns 293.20 ns 297.41 ns]
thrpt: [173.16 MiB/s 175.64 MiB/s 178.28 MiB/s]
change:
time: [−2.2198% −0.8716% +0.3321%] (p = 0.20 > 0.05)
thrpt: [−0.3310% +0.8793% +2.2702%]
No change in performance detected.
tokenize/strings
time: [1.1175 µs 1.1379 µs 1.1573 µs]
thrpt: [44.497 MiB/s 45.258 MiB/s 46.083 MiB/s]
change:
time: [−14.860% −13.620% −12.359%] (p = 0.00 < 0.05)
thrpt: [+14.101% +15.767% +17.454%]
Performance has improved.
tokenize/single_line_comments
time: [159.67 ns 161.52 ns 163.29 ns]
thrpt: [315.39 MiB/s 318.84 MiB/s 322.53 MiB/s]
change:
time: [+0.4110% +1.4523% +2.4709%] (p = 0.01 < 0.05)
thrpt: [−2.4113% −1.4315% −0.4093%]
Change within noise threshold.
tokenize/multi_line_comments
time: [220.54 ns 223.33 ns 225.99 ns]
thrpt: [227.88 MiB/s 230.60 MiB/s 233.51 MiB/s]
change:
time: [−7.7271% −6.7443% −5.7976%] (p = 0.00 < 0.05)
thrpt: [+6.1544% +7.2320% +8.3742%]
Performance has improved.
tokenize/literals
time: [399.63 ns 405.42 ns 410.94 ns]
thrpt: [125.32 MiB/s 127.02 MiB/s 128.86 MiB/s]
change:
time: [−1.4649% −0.3653% +0.7608%] (p = 0.54 > 0.05)
thrpt: [−0.7550% +0.3666% +1.4867%]
No change in performance detected.
frontmatter/frontmatter_allowed
time: [188.37 ns 189.51 ns 190.85 ns]
thrpt: [264.85 MiB/s 266.71 MiB/s 268.33 MiB/s]
change:
time: [−26.032% −25.300% −24.590%] (p = 0.00 < 0.05)
thrpt: [+32.609% +33.869% +35.194%]
Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
17 (17.00%) high severe
cursor_first/first
time: [886.05 ps 886.23 ps 886.43 ps]
thrpt: [42.026 GiB/s 42.035 GiB/s 42.044 GiB/s]
change:
time: [−1.7088% −1.6398% −1.5732%] (p = 0.00 < 0.05)
thrpt: [+1.5984% +1.6671% +1.7385%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
cursor_iteration/bump_all
time: [891.48 ns 892.06 ns 892.78 ns]
thrpt: [4.1727 GiB/s 4.1760 GiB/s 4.1788 GiB/s]
change:
time: [−50.335% −50.211% −50.037%] (p = 0.00 < 0.05)
thrpt: [+100.15% +100.85% +101.35%]
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
cursor_eat_while/eat_while_alpha
time: [34.992 ns 34.999 ns 35.007 ns]
thrpt: [1.7292 GiB/s 1.7297 GiB/s 1.7300 GiB/s]
change:
time: [−1.0098% −0.8721% −0.7699%] (p = 0.00 < 0.05)
thrpt: [+0.7759% +0.8798% +1.0201%]
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
cursor_eat_until/eat_until_newline
time: [3.1314 ns 3.1323 ns 3.1332 ns]
thrpt: [15.754 GiB/s 15.759 GiB/s 15.763 GiB/s]
change:
time: [−0.4774% −0.3069% −0.1459%] (p = 0.00 < 0.05)
thrpt: [+0.1461% +0.3078% +0.4797%]
Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
14 (14.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
r? @nnethercote
rustbot has assigned @nnethercote. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r? to explicitly pick a reviewer
The job tidy failed! Check out the build log: (web) (plain enhanced) (plain)
Click to see the possible cause of the failure (guessed by this bot)
[TIMING:end] tool::ToolBuild { build_compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu, tool: "tidy", path: "src/tools/tidy", mode: ToolBootstrap, source_type: InTree, extra_features: [], allow_features: "", cargo_args: [], artifact_kind: Binary } -- 11.933
[TIMING:end] tool::Tidy { compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu } -- 0.000
fmt check
Diff in /checkout/compiler/rustc_lexer/src/cursor.rs:127:
    pub(crate) fn bump_if2(&mut self, expected1: char, expected2: char) -> bool {
        let mut chars = self.chars.clone();
        if let Some(c) = chars.next()
-            && (c == expected1 || c == expected2) {
-            self.chars = chars;
-            return true;
-        }
+            && (c == expected1 || c == expected2)
+        {
+            self.chars = chars;
+            return true;
+        }
        false
This is the benchmark suite I use to track performance changes: https://github.com/fereidani/rustc_lexer_benchmark
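For anyone reproducing locally, these are ordinary criterion throughput benchmarks. A minimal sketch of the setup, with a hypothetical `count_tokens` standing in for the real lexer entry point (the actual harness in the repo above will differ):

```rust
// Sketch of a criterion throughput benchmark in the spirit of the numbers
// above. `count_tokens` is a hypothetical stand-in for the real lexer
// entry point (e.g. driving rustc_lexer's Cursor to EOF and counting tokens).
use criterion::{criterion_group, criterion_main, Criterion, Throughput};
use std::hint::black_box;

fn count_tokens(src: &str) -> usize {
    // Placeholder so the sketch compiles; the real benchmark lexes `src`.
    src.split_whitespace().count()
}

fn bench_strings(c: &mut Criterion) {
    let src = r#"let s = "a fairly long string literal with \"escapes\"";"#;
    let mut group = c.benchmark_group("tokenize");
    // Report MiB/s throughput alongside wall-clock time, as in the output above.
    group.throughput(Throughput::Bytes(src.len() as u64));
    group.bench_function("strings", |b| b.iter(|| count_tokens(black_box(src))));
    group.finish();
}

criterion_group!(benches, bench_strings);
criterion_main!(benches);
```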
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit ef2fa3b4ca1fb712cd4287928879787b5ff850d5 with merge e0cf684abe69de9dd471c12c65d8cf3e198875e5…
To cancel the try build, run the command @bors try cancel.
Workflow: https://github.com/rust-lang/rust/actions/runs/19974165280
:sunny: Try build successful (CI)
Build commit: e0cf684abe69de9dd471c12c65d8cf3e198875e5 (e0cf684abe69de9dd471c12c65d8cf3e198875e5, parent: 66428d92bec337ed4785d695d0127276a482278c)
Queued e0cf684abe69de9dd471c12c65d8cf3e198875e5 with parent 66428d92bec337ed4785d695d0127276a482278c, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.4 hours until the benchmark run finishes.
Finished benchmarking commit (e0cf684abe69de9dd471c12c65d8cf3e198875e5): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.
@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.7% | [0.0%, 1.7%] | 18 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.1% | [-0.2%, -0.1%] | 2 |
| All ❌✅ (primary) | - | - | 0 |
Max RSS (memory usage)
Results (secondary 2.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.0% | [1.5%, 6.9%] | 9 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.3% | [-2.3%, -0.8%] | 5 |
| All ❌✅ (primary) | - | - | 0 |
Cycles
Results (primary 3.1%, secondary 1.2%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.1% | [2.3%, 4.9%] | 4 |
| Regressions ❌ (secondary) | 3.6% | [2.0%, 6.4%] | 12 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.7% | [-6.2%, -1.8%] | 6 |
| All ❌✅ (primary) | 3.1% | [2.3%, 4.9%] | 4 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 470.249s -> 469.703s (-0.12%) Artifact size: 386.85 MiB -> 388.89 MiB (0.53%)
Thank you for reviewing this PR. I’m exploring the rustc codebase in my spare time, and the lexer was the first part I dove into. I’m just trying to contribute what I can to help improve the compiler’s performance.
I’m happy to drop the #[inline] hints and apply the other suggestions you mentioned. Let me know what you’d like to do with the PR from here.
Let's do another perf run just for completeness:
@bors try @rust-timer queue
What to do will depend on the result there. Overall this does add more code without particularly improving readability, IMO, so if there's no perf improvement the impetus to merge is low.
It is cool that you are looking at performance, though. If you have a Linux machine then I would recommend trying out rustc-perf and using that as a starting point for investigating compiler performance.
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 8ae8758d556e5a1a1cf5f12c2d3483a23567cece with merge 1566a6038fa84a4dce380350a79f1b260724cbf4…
To cancel the try build, run the command @bors try cancel.
Workflow: https://github.com/rust-lang/rust/actions/runs/20051598190
:sunny: Try build successful (CI)
Build commit: 1566a6038fa84a4dce380350a79f1b260724cbf4 (1566a6038fa84a4dce380350a79f1b260724cbf4, parent: 0b96731cd10757f695e99ba675ac26840ff85a79)
Queued 1566a6038fa84a4dce380350a79f1b260724cbf4 with parent 0b96731cd10757f695e99ba675ac26840ff85a79, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.0 hours until the benchmark run finishes.
Finished benchmarking commit (1566a6038fa84a4dce380350a79f1b260724cbf4): comparison URL.
Overall result: ❌ regressions - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.
@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.9% | [0.5%, 1.6%] | 15 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |
Max RSS (memory usage)
Results (secondary 3.2%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.2% | [3.2%, 3.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |
Cycles
Results (primary -2.7%, secondary -0.4%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.0% | [2.0%, 2.0%] | 1 |
| Improvements ✅ (primary) | -2.7% | [-2.7%, -2.7%] | 1 |
| Improvements ✅ (secondary) | -2.7% | [-2.7%, -2.7%] | 1 |
| All ❌✅ (primary) | -2.7% | [-2.7%, -2.7%] | 1 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 471.709s -> 470.591s (-0.24%) Artifact size: 389.01 MiB -> 389.02 MiB (0.00%)
Thanks for taking the time to review this PR! I completely understand that you're the maintainer of this code and that consistency with your preferred style is important. That said, I believe the version I proposed improves readability:
fn double_quoted_string(&mut self) -> bool {
    debug_assert!(self.prev() == '"');
    while let Some(c) = self.eat_past_either(b'"', b'\\') {
        match c {
            b'"' => {
                return true;
            }
            b'\\' => _ = self.bump_if_either('\\', '"'),
            _ => unreachable!(),
        }
    }
    false
}
vs
fn double_quoted_string(&mut self) -> bool {
    debug_assert!(self.prev() == '"');
    while let Some(c) = self.bump() {
        match c {
            '"' => {
                return true;
            }
            '\\' if self.first() == '\\' || self.first() == '"' => {
                // Bump again to skip escaped character.
                self.bump();
            }
            _ => (),
        }
    }
    // End of file reached.
    false
}
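(For context, `eat_past_either` and `bump_if_either` are small cursor helpers. A simplified sketch of what they could look like is below; the PR's actual implementations may differ in detail.)

```rust
// Simplified, illustrative versions of the two helpers used above; not
// necessarily the exact code in the PR.
use memchr::memchr2;
use std::str::Chars;

struct Cursor<'a> {
    chars: Chars<'a>,
}

impl<'a> Cursor<'a> {
    /// Skip past the first occurrence of `b1` or `b2` (both must be ASCII)
    /// and return the byte that was found, or `None` at end of input.
    fn eat_past_either(&mut self, b1: u8, b2: u8) -> Option<u8> {
        let rest = self.chars.as_str();
        let pos = memchr2(b1, b2, rest.as_bytes())?;
        let found = rest.as_bytes()[pos];
        // ASCII needles never land inside a multi-byte char, so `pos + 1`
        // is a valid char boundary.
        self.chars = rest[pos + 1..].chars();
        Some(found)
    }

    /// Consume the next char if it is `c1` or `c2`; report whether it was.
    fn bump_if_either(&mut self, c1: char, c2: char) -> bool {
        let mut lookahead = self.chars.clone();
        if let Some(c) = lookahead.next()
            && (c == c1 || c == c2)
        {
            self.chars = lookahead;
            return true;
        }
        false
    }
}

fn main() {
    let mut c = Cursor { chars: r#"abc\"def" tail"#.chars() };
    assert_eq!(c.eat_past_either(b'"', b'\\'), Some(b'\\'));
    assert!(c.bump_if_either('\\', '"')); // consumes the escaped quote
}
```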
For the quoted example, the new code is slightly shorter, but it requires the separate helper function, and it's clunkier in a way because it compares the char twice -- first inside eat_past_either and then again in the match. Also, there's still the slight performance regression, possibly because of those extra comparisons, which makes it unsuitable to merge right now.
If you want to morph this PR into a readability-oriented one instead of a performance-oriented one, that's fine, but at the moment it feels a bit like an attempt to do both and it's not quite working on either front.
The job aarch64-gnu-llvm-20-1 failed! Check out the build log: (web) (plain enhanced) (plain)
Click to see the possible cause of the failure (guessed by this bot)
Compiling core v0.0.0 (/checkout/library/core)
Compiling libc v0.2.178
Compiling object v0.37.3
thread 'rustc' (12735) panicked at compiler/rustc_lexer/src/cursor.rs:140:42:
byte index 5 is not a char boundary; it is inside 'π' (bytes 4..6) of `/ 1/π
#[unstable(feature = "f128", issue = "116909")]
pub const FRAC_1_PI: f128 = 0.318309886183790671537767526745028724068919291480912897495335_f128;
/// 1/sqrt(π)
#[unstable(feature = "f128", issue = "116909")]
// Also, #[unstable(`[...]
stack backtrace:
0: __rustc::rust_begin_unwind
at /rustc/3b4dd9bf1410f8da6329baa36ce5e37673cbbd1f/library/std/src/panicking.rs:698:5
1: core::panicking::panic_fmt
at /rustc/3b4dd9bf1410f8da6329baa36ce5e37673cbbd1f/library/core/src/panicking.rs:80:14
2: core::str::slice_error_fail_rt
3: core::str::slice_error_fail
at /rustc/3b4dd9bf1410f8da6329baa36ce5e37673cbbd1f/library/core/src/str/mod.rs:69:5
4: <rustc_lexer::cursor::Cursor>::bump_bytes
5: <rustc_lexer::cursor::Cursor>::eat_until
6: <rustc_lexer::cursor::Cursor>::line_comment
7: <rustc_lexer::cursor::Cursor>::advance_token
8: <rustc_parse::lexer::Lexer>::next_token_from_cursor
9: <rustc_parse::lexer::Lexer>::lex_token_trees
10: <rustc_parse::lexer::Lexer>::lex_token_trees
11: rustc_parse::lexer::lex_token_trees
12: rustc_parse::source_file_to_stream
13: rustc_parse::new_parser_from_source_file
14: rustc_parse::new_parser_from_file
15: rustc_expand::module::parse_external_mod
16: <alloc::boxed::Box<rustc_ast::ast::Item> as rustc_expand::expand::InvocationCollectorNode>::wrap_flat_map_node_walk_flat_map::<<rustc_expand::expand::InvocationCollector>::flat_map_node<alloc::boxed::Box<rustc_ast::ast::Item>>::{closure#1}>
17: <rustc_expand::expand::InvocationCollector as rustc_ast::mut_visit::MutVisitor>::flat_map_item
18: <thin_vec::ThinVec<alloc::boxed::Box<rustc_ast::ast::Item>> as rustc_data_structures::flat_map_in_place::FlatMapInPlace<alloc::boxed::Box<rustc_ast::ast::Item>>>::flat_map_in_place::<rustc_ast::mut_visit::visit_items<rustc_expand::expand::InvocationCollector>::{closure#0}, smallvec::SmallVec<[alloc::boxed::Box<rustc_ast::ast::Item>; 1]>>
19: <rustc_expand::expand::InvocationCollector as rustc_ast::mut_visit::MutVisitor>::visit_crate
20: <rustc_expand::expand::MacroExpander>::collect_invocations
21: <rustc_expand::expand::MacroExpander>::fully_expand_fragment
22: <rustc_expand::expand::MacroExpander>::expand_crate
23: <rustc_session::session::Session>::time::<rustc_ast::ast::Crate, rustc_interface::passes::configure_and_expand::{closure#1}>
24: rustc_interface::passes::resolver_for_lowering_raw
[... omitted 3 frames ...]
25: <rustc_middle::ty::context::TyCtxt>::resolver_for_lowering
26: <std::thread::local::LocalKey<core::cell::Cell<*const ()>>>::with::<rustc_middle::ty::context::tls::enter_context<<rustc_middle::ty::context::GlobalCtxt>::enter<rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2}::{closure#0}, core::option::Option<rustc_interface::queries::Linker>>::{closure#1}, core::option::Option<rustc_interface::queries::Linker>>::{closure#0}, core::option::Option<rustc_interface::queries::Linker>>
27: <rustc_middle::ty::context::TyCtxt>::create_global_ctxt::<core::option::Option<rustc_interface::queries::Linker>, rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2}::{closure#0}>
28: <rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2} as core::ops::function::FnOnce<(&rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2})>>::call_once::{shim:vtable#0}
29: <alloc::boxed::Box<dyn for<'a> core::ops::function::FnOnce<(&'a rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &'a std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt<'a>>, &'a rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena<'a>>, &'a rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena<'a>>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}), Output = core::option::Option<rustc_interface::queries::Linker>>> as core::ops::function::FnOnce<(&rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2})>>::call_once
30: rustc_interface::passes::create_and_enter_global_ctxt::<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>
31: <scoped_tls::ScopedKey<rustc_span::SessionGlobals>>::set::<rustc_interface::util::run_in_thread_with_globals<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}, ()>::{closure#0}, ()>::{closure#0}::{closure#0}::{closure#0}, ()>
32: rustc_span::create_session_globals_then::<(), rustc_interface::util::run_in_thread_with_globals<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}, ()>::{closure#0}, ()>::{closure#0}::{closure#0}::{closure#0}>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---
warning: the ICE couldn't be written to `/checkout/rustc-ice-2025-12-09T23_33_31-12722.txt`: Read-only file system (os error 30)
note: rustc 1.94.0-nightly (c2f952884 2025-12-09) running on aarch64-unknown-linux-gnu
note: compiler flags: --crate-type lib -C opt-level=3 -C embed-bitcode=no -C codegen-units=1 -C debug-assertions=on -C symbol-mangling-version=v0 -Z annotate-moves -Z randomize-layout -Z unstable-options -Z macro-backtrace -C split-debuginfo=off -C prefer-dynamic -C llvm-args=-import-instr-limit=10 -Z inline-mir -Z inline-mir-preserve-debug -Z mir_strip_debuginfo=locals-in-tiny-functions -C link-args=-Wl,-z,origin -C link-args=-Wl,-rpath,$ORIGIN/../lib -C embed-bitcode=yes -Z unstable-options -C force-frame-pointers=non-leaf -Z crate-attr=doc(html_root_url="https://doc.rust-lang.org/nightly/") -Z binary-dep-depinfo -Z force-unstable-if-unmarked
note: some of the compiler flags provided by cargo are hidden
query stack during panic:
#0 [resolver_for_lowering_raw] getting the resolver for lowering
end of query stack
[RUSTC-TIMING] core test:false 0.074
error: could not compile `core` (lib)
Caused by:
process didn't exit successfully: `/checkout/obj/build/bootstrap/debug/rustc /checkout/obj/build/bootstrap/debug/rustc --crate-name core --edition=2024 library/core/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no -C codegen-units=1 --warn=unexpected_cfgs --check-cfg 'cfg(no_fp_fmt_parse)' --check-cfg 'cfg(feature, values(any()))' --check-cfg 'cfg(target_has_reliable_f16)' --check-cfg 'cfg(target_has_reliable_f16_math)' --check-cfg 'cfg(target_has_reliable_f128)' --check-cfg 'cfg(target_has_reliable_f128_math)' --check-cfg 'cfg(llvm_enzyme)' -C debug-assertions=on --check-cfg 'cfg(docsrs,test)' --check-cfg 'cfg(feature, values("debug_refcell", "llvm_enzyme", "optimize_for_size", "panic_immediate_abort"))' -C metadata=5217e21a04983440 -C extra-filename=-20c07df90e53b4bc --out-dir /checkout/obj/build/aarch64-unknown-linux-gnu/stage1-std/aarch64-unknown-linux-gnu/release/deps --target aarch64-unknown-linux-gnu -L dependency=/checkout/obj/build/aarch64-unknown-linux-gnu/stage1-std/aarch64-unknown-linux-gnu/release/deps -L dependency=/checkout/obj/build/aarch64-unknown-linux-gnu/stage1-std/release/deps -Csymbol-mangling-version=v0 -Zannotate-moves -Zrandomize-layout '--check-cfg=cfg(feature,values(any()))' -Zunstable-options -Zmacro-backtrace -Csplit-debuginfo=off -Cprefer-dynamic -Cllvm-args=-import-instr-limit=10 --cfg=randomized_layouts -Zinline-mir -Zinline-mir-preserve-debug -Zmir_strip_debuginfo=locals-in-tiny-functions -Clink-args=-Wl,-z,origin '-Clink-args=-Wl,-rpath,$ORIGIN/../lib' -Alinker-messages -Cembed-bitcode=yes -Zunstable-options -Cforce-frame-pointers=non-leaf '-Zcrate-attr=doc(html_root_url="https://doc.rust-lang.org/nightly/")' -Z binary-dep-depinfo` (exit status: 101)
warning: build failed, waiting for other jobs to finish...
[RUSTC-TIMING] shlex test:false 0.093
[RUSTC-TIMING] build_script_build test:false 0.143
[RUSTC-TIMING] build_script_build test:false 0.201
Bootstrap failed while executing `--stage 2 test --skip compiler --skip src`
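The panic above is the classic `&str` byte-index pitfall: a byte offset computed over the raw bytes (for example from a `memchr`-style search or a manual byte count) is only valid for slicing if it lands on a UTF-8 char boundary. A minimal standalone reproduction of the failure mode, using the same input as the backtrace:

```rust
// Minimal reproduction of the failure mode in the backtrace above:
// slicing a &str at a byte index inside a multi-byte char panics.
fn main() {
    let s = "/ 1/π"; // 'π' occupies bytes 4..6
    let idx = 5; // falls inside 'π'
    assert!(!s.is_char_boundary(idx));
    // This would panic with "byte index 5 is not a char boundary":
    // let _prefix = &s[..idx];
    // Guarding with `is_char_boundary`, or only advancing by offsets that
    // point at ASCII bytes (which are always boundaries), avoids the panic.
}
```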
Thank you for your response.
Since eat_past_either uses memchr::memchr2 and is an inline function, build sizes will be slightly larger. In exchange, lexing longer strings is 200-300% faster, because memchr2 is SIMD-accelerated, unlike the old byte-by-byte approach.
I've removed the unreachable! arms and simplified the branches, which should give slightly better performance and a smaller binary in case the compiler was failing to prove that unreachable! really was unreachable.
As I said before, the decision is entirely yours and I fully respect it. I’ve invested a lot of time into this PR, and naturally I’d love to see it merged, but my only real motivation is to help make Rust faster. If you feel it doesn’t belong here, it’s better not to merge it at all.
I'm currently reading the compiler source code and will follow your guidance by using rustc-perf and also measureme to identify bottlenecks.
P.S. Now I think it is actually much cleaner:
fn double_quoted_string(&mut self) -> bool {
    debug_assert!(self.prev() == '"');
    while let Some(c) = self.eat_past_either(b'"', b'\\') {
        if c == b'"' {
            return true;
        }
        // Current is '\\', bump again if next is an escaped character.
        self.bump_if_either('\\', '"');
    }
    // End of file reached.
    false
}
@matthiaskrgr apologies for the ping, would you please rerun the bors rust-timer queue?
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 0359fd46f61b7c6981ea52fba90226659af68305 with merge ebcec0ca2149d940a145c9cf3403cdc2c09da563…
To cancel the try build, run the command @bors try cancel.
Workflow: https://github.com/rust-lang/rust/actions/runs/20242345084
you can also ask in this zulip thread btw :)
https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp/topic/perf.20run/near/541356531
Thank you! Good to know!
:sunny: Try build successful (CI)
Build commit: ebcec0ca2149d940a145c9cf3403cdc2c09da563 (ebcec0ca2149d940a145c9cf3403cdc2c09da563, parent: ee447067e18f07aa6ee67dcf0ddc7b07eb675672)
Queued ebcec0ca2149d940a145c9cf3403cdc2c09da563 with parent ee447067e18f07aa6ee67dcf0ddc7b07eb675672, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.0 hours until the benchmark run finishes.
Finished benchmarking commit (ebcec0ca2149d940a145c9cf3403cdc2c09da563): comparison URL.
Overall result: ❌ regressions - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.
@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.9% | [0.5%, 1.6%] | 15 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |
Max RSS (memory usage)
This benchmark run did not return any relevant results for this metric.
Cycles
Results (secondary 0.9%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.9% | [1.6%, 2.2%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.3% | [-3.3%, -3.3%] | 1 |
| All ❌✅ (primary) | - | - | 0 |
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 478.788s -> 479.083s (0.06%) Artifact size: 390.22 MiB -> 390.24 MiB (0.01%)
The lexing code has been micro-optimized heavily in the past and it's genuinely difficult to improve it. I don't think using memchr is going to help, because most tokens are very short, and truly long tokens are extremely rare. The benchmarks are the ultimate test, of course, and there are still some regressions. I don't think this PR is likely to be fruitful; perhaps looking into parsing might be worthwhile because that accounts for a larger fraction (though still fairly small) of execution time.