Improve lexer performance by 5-10% overall, improve string lexer performance by 15%

Open fereidani opened this issue 1 week ago • 10 comments

Hi, this PR improves lexer performance by ~5-10% when lexing the entire standard library. It specifically targets the string lexer, comment lexer, and frontmatter lexer.

  • For strings and comments, it replaces the previous logic with a new eat_past2 function that leverages memchr2.
  • For frontmatter, I eliminated the heap allocation from format! and rewrote the lexer using memchr-based scanning, which is roughly 4× faster.

I also applied a few minor optimizations in other areas.
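
For illustration, here is a minimal, hypothetical sketch of the memchr2-based "eat past either of two bytes" idea, written against a plain byte slice rather than the PR's Cursor (the names and signature here are mine, not necessarily the exact ones in the PR):

    // Hypothetical sketch only; requires the `memchr` crate.
    // Returns the matched byte and advances `pos` one past it.
    fn eat_past_either(src: &[u8], pos: &mut usize, b1: u8, b2: u8) -> Option<u8> {
        // memchr2 scans for the first occurrence of either byte using SIMD,
        // instead of checking one byte per loop iteration.
        let offset = memchr::memchr2(b1, b2, &src[*pos..])?;
        let found = src[*pos + offset];
        // Step one past the matched byte, mirroring a "bump past" cursor step.
        *pos += offset + 1;
        Some(found)
    }

    fn main() {
        let src = br#"hello \"world\" end" trailing"#;
        let mut pos = 0;
        // Jumps from one interesting byte ('"' or '\') to the next.
        while let Some(b) = eat_past_either(src, &mut pos, b'"', b'\\') {
            println!("found {:?} at byte {}", b as char, pos - 1);
        }
    }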

I’ll send the benchmark repo in the next message. Here are the results on my x86_64 laptop (AMD 6650U):

tokenize_real_world/stdlib_all_files
                        time:   [74.193 ms 74.224 ms 74.256 ms]
                        thrpt:  [423.74 MiB/s 423.92 MiB/s 424.10 MiB/s]
                 change:
                        time:   [−5.4046% −5.3465% −5.2907%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5862% +5.6484% +5.7134%]
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  2 (2.00%) high mild
  19 (19.00%) high severe

strip_shebang/valid_shebang
                        time:   [11.391 ns 11.401 ns 11.412 ns]
                        thrpt:  [1.7954 GiB/s 1.7971 GiB/s 1.7987 GiB/s]
                 change:
                        time:   [−8.1076% −7.8921% −7.6485%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2820% +8.5683% +8.8229%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
strip_shebang/no_shebang
                        time:   [4.8656 ns 4.8680 ns 4.8711 ns]
                        thrpt:  [4.2062 GiB/s 4.2089 GiB/s 4.2110 GiB/s]
                 change:
                        time:   [−0.1156% −0.0139% +0.0821%] (p = 0.78 > 0.05)
                        thrpt:  [−0.0821% +0.0139% +0.1157%]
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) high mild
  19 (19.00%) high severe

tokenize/simple_function
                        time:   [288.86 ns 293.20 ns 297.41 ns]
                        thrpt:  [173.16 MiB/s 175.64 MiB/s 178.28 MiB/s]
                 change:
                        time:   [−2.2198% −0.8716% +0.3321%] (p = 0.20 > 0.05)
                        thrpt:  [−0.3310% +0.8793% +2.2702%]
                        No change in performance detected.
tokenize/strings        time:   [1.1175 µs 1.1379 µs 1.1573 µs]
                        thrpt:  [44.497 MiB/s 45.258 MiB/s 46.083 MiB/s]
                 change:
                        time:   [−14.860% −13.620% −12.359%] (p = 0.00 < 0.05)
                        thrpt:  [+14.101% +15.767% +17.454%]
                        Performance has improved.
tokenize/single_line_comments
                        time:   [159.67 ns 161.52 ns 163.29 ns]
                        thrpt:  [315.39 MiB/s 318.84 MiB/s 322.53 MiB/s]
                 change:
                        time:   [+0.4110% +1.4523% +2.4709%] (p = 0.01 < 0.05)
                        thrpt:  [−2.4113% −1.4315% −0.4093%]
                        Change within noise threshold.
tokenize/multi_line_comments
                        time:   [220.54 ns 223.33 ns 225.99 ns]
                        thrpt:  [227.88 MiB/s 230.60 MiB/s 233.51 MiB/s]
                 change:
                        time:   [−7.7271% −6.7443% −5.7976%] (p = 0.00 < 0.05)
                        thrpt:  [+6.1544% +7.2320% +8.3742%]
                        Performance has improved.
tokenize/literals       time:   [399.63 ns 405.42 ns 410.94 ns]
                        thrpt:  [125.32 MiB/s 127.02 MiB/s 128.86 MiB/s]
                 change:
                        time:   [−1.4649% −0.3653% +0.7608%] (p = 0.54 > 0.05)
                        thrpt:  [−0.7550% +0.3666% +1.4867%]
                        No change in performance detected.

frontmatter/frontmatter_allowed
                        time:   [188.37 ns 189.51 ns 190.85 ns]
                        thrpt:  [264.85 MiB/s 266.71 MiB/s 268.33 MiB/s]
                 change:
                        time:   [−26.032% −25.300% −24.590%] (p = 0.00 < 0.05)
                        thrpt:  [+32.609% +33.869% +35.194%]
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  17 (17.00%) high severe

cursor_first/first      time:   [886.05 ps 886.23 ps 886.43 ps]
                        thrpt:  [42.026 GiB/s 42.035 GiB/s 42.044 GiB/s]
                 change:
                        time:   [−1.7088% −1.6398% −1.5732%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5984% +1.6671% +1.7385%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

cursor_iteration/bump_all
                        time:   [891.48 ns 892.06 ns 892.78 ns]
                        thrpt:  [4.1727 GiB/s 4.1760 GiB/s 4.1788 GiB/s]
                 change:
                        time:   [−50.335% −50.211% −50.037%] (p = 0.00 < 0.05)
                        thrpt:  [+100.15% +100.85% +101.35%]
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe

cursor_eat_while/eat_while_alpha
                        time:   [34.992 ns 34.999 ns 35.007 ns]
                        thrpt:  [1.7292 GiB/s 1.7297 GiB/s 1.7300 GiB/s]
                 change:
                        time:   [−1.0098% −0.8721% −0.7699%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7759% +0.8798% +1.0201%]
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

cursor_eat_until/eat_until_newline
                        time:   [3.1314 ns 3.1323 ns 3.1332 ns]
                        thrpt:  [15.754 GiB/s 15.759 GiB/s 15.763 GiB/s]
                 change:
                        time:   [−0.4774% −0.3069% −0.1459%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1461% +0.3078% +0.4797%]
                        Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
  14 (14.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

fereidani avatar Dec 05 '25 19:12 fereidani

r? @nnethercote

rustbot has assigned @nnethercote. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot avatar Dec 05 '25 19:12 rustbot

The job tidy failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[TIMING:end] tool::ToolBuild { build_compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu, tool: "tidy", path: "src/tools/tidy", mode: ToolBootstrap, source_type: InTree, extra_features: [], allow_features: "", cargo_args: [], artifact_kind: Binary } -- 11.933
[TIMING:end] tool::Tidy { compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu } -- 0.000
fmt check
Diff in /checkout/compiler/rustc_lexer/src/cursor.rs:127:
     pub(crate) fn bump_if2(&mut self, expected1: char, expected2: char) -> bool {
         let mut chars = self.chars.clone();
         if let Some(c) = chars.next()
-            && (c == expected1 || c == expected2) {
-                self.chars = chars;
-                return true;
-            }
+            && (c == expected1 || c == expected2)
+        {
+            self.chars = chars;
+            return true;
+        }
         false

rust-log-analyzer avatar Dec 05 '25 19:12 rust-log-analyzer

this is the benchmark library to track performance changes: https://github.com/fereidani/rustc_lexer_benchmark

fereidani avatar Dec 05 '25 19:12 fereidani

@bors try @rust-timer queue

matthiaskrgr avatar Dec 05 '25 19:12 matthiaskrgr

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer avatar Dec 05 '25 19:12 rust-timer

:hourglass: Trying commit ef2fa3b4ca1fb712cd4287928879787b5ff850d5 with merge e0cf684abe69de9dd471c12c65d8cf3e198875e5…

To cancel the try build, run the command @bors try cancel.

Workflow: https://github.com/rust-lang/rust/actions/runs/19974165280

rust-bors[bot] avatar Dec 05 '25 19:12 rust-bors[bot]

:sunny: Try build successful (CI) Build commit: e0cf684abe69de9dd471c12c65d8cf3e198875e5 (e0cf684abe69de9dd471c12c65d8cf3e198875e5, parent: 66428d92bec337ed4785d695d0127276a482278c)

rust-bors[bot] avatar Dec 05 '25 22:12 rust-bors[bot]

Queued e0cf684abe69de9dd471c12c65d8cf3e198875e5 with parent 66428d92bec337ed4785d695d0127276a482278c, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.4 hours until the benchmark run finishes.

rust-timer avatar Dec 05 '25 22:12 rust-timer

Finished benchmarking commit (e0cf684abe69de9dd471c12c65d8cf3e198875e5): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

                              mean    range             count
Regressions ❌ (primary)       -       -                 0
Regressions ❌ (secondary)     0.7%    [0.0%, 1.7%]      18
Improvements ✅ (primary)      -       -                 0
Improvements ✅ (secondary)    -0.1%   [-0.2%, -0.1%]    2
All ❌✅ (primary)              -       -                 0

Max RSS (memory usage)

Results (secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                              mean    range             count
Regressions ❌ (primary)       -       -                 0
Regressions ❌ (secondary)     4.0%    [1.5%, 6.9%]      9
Improvements ✅ (primary)      -       -                 0
Improvements ✅ (secondary)    -1.3%   [-2.3%, -0.8%]    5
All ❌✅ (primary)              -       -                 0

Cycles

Results (primary 3.1%, secondary 1.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                              mean    range             count
Regressions ❌ (primary)       3.1%    [2.3%, 4.9%]      4
Regressions ❌ (secondary)     3.6%    [2.0%, 6.4%]      12
Improvements ✅ (primary)      -       -                 0
Improvements ✅ (secondary)    -3.7%   [-6.2%, -1.8%]    6
All ❌✅ (primary)              3.1%    [2.3%, 4.9%]      4

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 470.249s -> 469.703s (-0.12%) Artifact size: 386.85 MiB -> 388.89 MiB (0.53%)

rust-timer avatar Dec 05 '25 23:12 rust-timer

Thank you for reviewing this PR. I’m exploring the rustc codebase in my spare time, and the lexer was the first part I dove into. I’m just trying to contribute what I can to help improve the compiler’s performance.

I’m happy to drop the #[inline] hints and apply the other suggestions you mentioned. Let me know what you’d like to do with the PR from here.

fereidani avatar Dec 07 '25 12:12 fereidani

Let's do another perf run just for completeness:

@bors try @rust-timer queue

What to do will depend on the result there. Overall this does add more code without particularly improving readability, IMO, so if there's no perf improvement the impetus to merge is low.

It is cool that you are looking at performance, though. If you have a Linux machine then I would recommend trying out rustc-perf and using that as a starting point for investigating compiler performance.

nnethercote avatar Dec 09 '25 04:12 nnethercote

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer avatar Dec 09 '25 04:12 rust-timer

:hourglass: Trying commit 8ae8758d556e5a1a1cf5f12c2d3483a23567cece with merge 1566a6038fa84a4dce380350a79f1b260724cbf4…

To cancel the try build, run the command @bors try cancel.

Workflow: https://github.com/rust-lang/rust/actions/runs/20051598190

rust-bors[bot] avatar Dec 09 '25 04:12 rust-bors[bot]

:sunny: Try build successful (CI) Build commit: 1566a6038fa84a4dce380350a79f1b260724cbf4 (1566a6038fa84a4dce380350a79f1b260724cbf4, parent: 0b96731cd10757f695e99ba675ac26840ff85a79)

rust-bors[bot] avatar Dec 09 '25 06:12 rust-bors[bot]

Queued 1566a6038fa84a4dce380350a79f1b260724cbf4 with parent 0b96731cd10757f695e99ba675ac26840ff85a79, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.0 hours until the benchmark run finishes.

rust-timer avatar Dec 09 '25 06:12 rust-timer

Finished benchmarking commit (1566a6038fa84a4dce380350a79f1b260724cbf4): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

                              mean    range             count
Regressions ❌ (primary)       -       -                 0
Regressions ❌ (secondary)     0.9%    [0.5%, 1.6%]      15
Improvements ✅ (primary)      -       -                 0
Improvements ✅ (secondary)    -       -                 0
All ❌✅ (primary)              -       -                 0

Max RSS (memory usage)

Results (secondary 3.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                              mean    range             count
Regressions ❌ (primary)       -       -                 0
Regressions ❌ (secondary)     3.2%    [3.2%, 3.2%]      1
Improvements ✅ (primary)      -       -                 0
Improvements ✅ (secondary)    -       -                 0
All ❌✅ (primary)              -       -                 0

Cycles

Results (primary -2.7%, secondary -0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                              mean    range             count
Regressions ❌ (primary)       -       -                 0
Regressions ❌ (secondary)     2.0%    [2.0%, 2.0%]      1
Improvements ✅ (primary)      -2.7%   [-2.7%, -2.7%]    1
Improvements ✅ (secondary)    -2.7%   [-2.7%, -2.7%]    1
All ❌✅ (primary)              -2.7%   [-2.7%, -2.7%]    1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 471.709s -> 470.591s (-0.24%) Artifact size: 389.01 MiB -> 389.02 MiB (0.00%)

rust-timer avatar Dec 09 '25 07:12 rust-timer

Thanks for taking the time to review this PR! I completely understand that you're the maintainer of this code and that consistency with your preferred style is important. That said, I believe the version I proposed improves readability:

    fn double_quoted_string(&mut self) -> bool {
        debug_assert!(self.prev() == '"');
        while let Some(c) = self.eat_past_either(b'"', b'\\') {
            match c {
                b'"' => {
                    return true;
                }
                b'\\' => _ = self.bump_if_either('\\', '"'),
                _ => unreachable!(),
            }
        }
        false
    }

vs

    fn double_quoted_string(&mut self) -> bool {
        debug_assert!(self.prev() == '"');
        while let Some(c) = self.bump() {
            match c {
                '"' => {
                    return true;
                }
                '\\' if self.first() == '\\' || self.first() == '"' => {
                    // Bump again to skip escaped character.
                    self.bump();
                }
                _ => (),
            }
        }
        // End of file reached.
        false
    }

fereidani avatar Dec 09 '25 12:12 fereidani

For the quoted example, the new code is slightly shorter but it requires the separate function, and it's clunkier in a way because it does double comparison of the char -- first in eat_past_either, and then again in the match. Also, there's still the slight performance regression, possibly because of the extra comparisons, which makes it unsuitable to merge right now.

If you want to morph this PR into a readability-oriented one instead of a performance-oriented one, that's fine, but at the moment it feels a bit like an attempt to do both and it's not quite working on either front.

nnethercote avatar Dec 09 '25 20:12 nnethercote

The job aarch64-gnu-llvm-20-1 failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
   Compiling core v0.0.0 (/checkout/library/core)
   Compiling libc v0.2.178
   Compiling object v0.37.3

thread 'rustc' (12735) panicked at compiler/rustc_lexer/src/cursor.rs:140:42:
byte index 5 is not a char boundary; it is inside 'π' (bytes 4..6) of `/ 1/π
    #[unstable(feature = "f128", issue = "116909")]
    pub const FRAC_1_PI: f128 = 0.318309886183790671537767526745028724068919291480912897495335_f128;

    /// 1/sqrt(π)
    #[unstable(feature = "f128", issue = "116909")]
    // Also, #[unstable(`[...]
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/3b4dd9bf1410f8da6329baa36ce5e37673cbbd1f/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/3b4dd9bf1410f8da6329baa36ce5e37673cbbd1f/library/core/src/panicking.rs:80:14
   2: core::str::slice_error_fail_rt
   3: core::str::slice_error_fail
             at /rustc/3b4dd9bf1410f8da6329baa36ce5e37673cbbd1f/library/core/src/str/mod.rs:69:5
   4: <rustc_lexer::cursor::Cursor>::bump_bytes
   5: <rustc_lexer::cursor::Cursor>::eat_until
   6: <rustc_lexer::cursor::Cursor>::line_comment
   7: <rustc_lexer::cursor::Cursor>::advance_token
   8: <rustc_parse::lexer::Lexer>::next_token_from_cursor
   9: <rustc_parse::lexer::Lexer>::lex_token_trees
  10: <rustc_parse::lexer::Lexer>::lex_token_trees
  11: rustc_parse::lexer::lex_token_trees
  12: rustc_parse::source_file_to_stream
  13: rustc_parse::new_parser_from_source_file
  14: rustc_parse::new_parser_from_file
  15: rustc_expand::module::parse_external_mod
  16: <alloc::boxed::Box<rustc_ast::ast::Item> as rustc_expand::expand::InvocationCollectorNode>::wrap_flat_map_node_walk_flat_map::<<rustc_expand::expand::InvocationCollector>::flat_map_node<alloc::boxed::Box<rustc_ast::ast::Item>>::{closure#1}>
  17: <rustc_expand::expand::InvocationCollector as rustc_ast::mut_visit::MutVisitor>::flat_map_item
  18: <thin_vec::ThinVec<alloc::boxed::Box<rustc_ast::ast::Item>> as rustc_data_structures::flat_map_in_place::FlatMapInPlace<alloc::boxed::Box<rustc_ast::ast::Item>>>::flat_map_in_place::<rustc_ast::mut_visit::visit_items<rustc_expand::expand::InvocationCollector>::{closure#0}, smallvec::SmallVec<[alloc::boxed::Box<rustc_ast::ast::Item>; 1]>>
  19: <rustc_expand::expand::InvocationCollector as rustc_ast::mut_visit::MutVisitor>::visit_crate
  20: <rustc_expand::expand::MacroExpander>::collect_invocations
  21: <rustc_expand::expand::MacroExpander>::fully_expand_fragment
  22: <rustc_expand::expand::MacroExpander>::expand_crate
  23: <rustc_session::session::Session>::time::<rustc_ast::ast::Crate, rustc_interface::passes::configure_and_expand::{closure#1}>
  24: rustc_interface::passes::resolver_for_lowering_raw
      [... omitted 3 frames ...]
  25: <rustc_middle::ty::context::TyCtxt>::resolver_for_lowering
  26: <std::thread::local::LocalKey<core::cell::Cell<*const ()>>>::with::<rustc_middle::ty::context::tls::enter_context<<rustc_middle::ty::context::GlobalCtxt>::enter<rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2}::{closure#0}, core::option::Option<rustc_interface::queries::Linker>>::{closure#1}, core::option::Option<rustc_interface::queries::Linker>>::{closure#0}, core::option::Option<rustc_interface::queries::Linker>>
  27: <rustc_middle::ty::context::TyCtxt>::create_global_ctxt::<core::option::Option<rustc_interface::queries::Linker>, rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2}::{closure#0}>
  28: <rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2} as core::ops::function::FnOnce<(&rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2})>>::call_once::{shim:vtable#0}
  29: <alloc::boxed::Box<dyn for<'a> core::ops::function::FnOnce<(&'a rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &'a std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt<'a>>, &'a rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena<'a>>, &'a rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena<'a>>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}), Output = core::option::Option<rustc_interface::queries::Linker>>> as core::ops::function::FnOnce<(&rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2})>>::call_once
  30: rustc_interface::passes::create_and_enter_global_ctxt::<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>
  31: <scoped_tls::ScopedKey<rustc_span::SessionGlobals>>::set::<rustc_interface::util::run_in_thread_with_globals<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}, ()>::{closure#0}, ()>::{closure#0}::{closure#0}::{closure#0}, ()>
  32: rustc_span::create_session_globals_then::<(), rustc_interface::util::run_in_thread_with_globals<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}, ()>::{closure#0}, ()>::{closure#0}::{closure#0}::{closure#0}>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---
warning: the ICE couldn't be written to `/checkout/rustc-ice-2025-12-09T23_33_31-12722.txt`: Read-only file system (os error 30)

note: rustc 1.94.0-nightly (c2f952884 2025-12-09) running on aarch64-unknown-linux-gnu

note: compiler flags: --crate-type lib -C opt-level=3 -C embed-bitcode=no -C codegen-units=1 -C debug-assertions=on -C symbol-mangling-version=v0 -Z annotate-moves -Z randomize-layout -Z unstable-options -Z macro-backtrace -C split-debuginfo=off -C prefer-dynamic -C llvm-args=-import-instr-limit=10 -Z inline-mir -Z inline-mir-preserve-debug -Z mir_strip_debuginfo=locals-in-tiny-functions -C link-args=-Wl,-z,origin -C link-args=-Wl,-rpath,$ORIGIN/../lib -C embed-bitcode=yes -Z unstable-options -C force-frame-pointers=non-leaf -Z crate-attr=doc(html_root_url="https://doc.rust-lang.org/nightly/") -Z binary-dep-depinfo -Z force-unstable-if-unmarked

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
#0 [resolver_for_lowering_raw] getting the resolver for lowering
end of query stack
[RUSTC-TIMING] core test:false 0.074
error: could not compile `core` (lib)

Caused by:
  process didn't exit successfully: `/checkout/obj/build/bootstrap/debug/rustc /checkout/obj/build/bootstrap/debug/rustc --crate-name core --edition=2024 library/core/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no -C codegen-units=1 --warn=unexpected_cfgs --check-cfg 'cfg(no_fp_fmt_parse)' --check-cfg 'cfg(feature, values(any()))' --check-cfg 'cfg(target_has_reliable_f16)' --check-cfg 'cfg(target_has_reliable_f16_math)' --check-cfg 'cfg(target_has_reliable_f128)' --check-cfg 'cfg(target_has_reliable_f128_math)' --check-cfg 'cfg(llvm_enzyme)' -C debug-assertions=on --check-cfg 'cfg(docsrs,test)' --check-cfg 'cfg(feature, values("debug_refcell", "llvm_enzyme", "optimize_for_size", "panic_immediate_abort"))' -C metadata=5217e21a04983440 -C extra-filename=-20c07df90e53b4bc --out-dir /checkout/obj/build/aarch64-unknown-linux-gnu/stage1-std/aarch64-unknown-linux-gnu/release/deps --target aarch64-unknown-linux-gnu -L dependency=/checkout/obj/build/aarch64-unknown-linux-gnu/stage1-std/aarch64-unknown-linux-gnu/release/deps -L dependency=/checkout/obj/build/aarch64-unknown-linux-gnu/stage1-std/release/deps -Csymbol-mangling-version=v0 -Zannotate-moves -Zrandomize-layout '--check-cfg=cfg(feature,values(any()))' -Zunstable-options -Zmacro-backtrace -Csplit-debuginfo=off -Cprefer-dynamic -Cllvm-args=-import-instr-limit=10 --cfg=randomized_layouts -Zinline-mir -Zinline-mir-preserve-debug -Zmir_strip_debuginfo=locals-in-tiny-functions -Clink-args=-Wl,-z,origin '-Clink-args=-Wl,-rpath,$ORIGIN/../lib' -Alinker-messages -Cembed-bitcode=yes -Zunstable-options -Cforce-frame-pointers=non-leaf '-Zcrate-attr=doc(html_root_url="https://doc.rust-lang.org/nightly/")' -Z binary-dep-depinfo` (exit status: 101)
warning: build failed, waiting for other jobs to finish...
[RUSTC-TIMING] shlex test:false 0.093
[RUSTC-TIMING] build_script_build test:false 0.143
[RUSTC-TIMING] build_script_build test:false 0.201
Bootstrap failed while executing `--stage 2 test --skip compiler --skip src`

rust-log-analyzer avatar Dec 09 '25 23:12 rust-log-analyzer
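
The panic above is Rust's standard slice-at-non-char-boundary error: a helper that advances by raw byte counts and then indexes back into a &str can land inside a multi-byte character such as 'π'. A minimal, hypothetical reproduction (not code from the PR):

    fn main() {
        // 'π' occupies bytes 2..4 here, so byte index 3 is not a char boundary.
        let s = "1/π";
        // Panics with "byte index 3 is not a char boundary; it is inside 'π' (bytes 2..4)".
        let _ = &s[..3];
    }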

Thank you for your response. Since eat_past_either uses memchr::memchr2 and is marked inline, binary sizes will be slightly larger. However, string parsing will be 200-300% faster for longer strings, because memchr2 is SIMD-optimized, unlike the old byte-by-byte approach.
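
As a rough, hypothetical illustration of that byte-by-byte vs. SIMD difference (the harness below is mine, not a benchmark from the PR, and requires the `memchr` crate):

    // One byte checked per iteration.
    fn scan_byte_by_byte(src: &[u8]) -> Option<usize> {
        src.iter().position(|&b| b == b'"' || b == b'\\')
    }

    // memchr2 examines many bytes per step with SIMD, which is what makes
    // long string literals cheaper to scan.
    fn scan_simd(src: &[u8]) -> Option<usize> {
        memchr::memchr2(b'"', b'\\', src)
    }

    fn main() {
        let body = "a".repeat(10_000) + "\"";
        // Both find the closing quote at the same offset; timing them on long
        // inputs is where the SIMD version pulls ahead.
        assert_eq!(scan_byte_by_byte(body.as_bytes()), scan_simd(body.as_bytes()));
    }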

I've removed the unreachable! parts and simplified the branches, which should hopefully give better performance and a smaller binary, in case the compiler was failing to prove that the unreachable! arm really is unreachable.

As I said before, the decision is entirely yours and I fully respect it. I’ve invested a lot of time into this PR, and naturally I’d love to see it merged, but my only real motivation is to help make Rust faster. If you feel it doesn’t belong here, it’s better not to merge it at all.

I'm currently reading the compiler source code and will follow your guidance by using rustc-perf and also measureme to identify bottlenecks.

P.S. Now I think it is actually much cleaner:

    fn double_quoted_string(&mut self) -> bool {
        debug_assert!(self.prev() == '"');
        while let Some(c) = self.eat_past_either(b'"', b'\\') {
            if c == b'"' {
                return true;
            }
            // Current is '\\', bump again if next is an escaped character.
            self.bump_if_either('\\', '"');
        }
        // End of file reached.
        false
    }

fereidani avatar Dec 09 '25 23:12 fereidani

@matthiaskrgr apologies for the ping, would you please rerun the bors rust-timer queue?

fereidani avatar Dec 15 '25 17:12 fereidani

@bors try @rust-timer queue

matthiaskrgr avatar Dec 15 '25 18:12 matthiaskrgr

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer avatar Dec 15 '25 18:12 rust-timer

:hourglass: Trying commit 0359fd46f61b7c6981ea52fba90226659af68305 with merge ebcec0ca2149d940a145c9cf3403cdc2c09da563…

To cancel the try build, run the command @bors try cancel.

Workflow: https://github.com/rust-lang/rust/actions/runs/20242345084

rust-bors[bot] avatar Dec 15 '25 18:12 rust-bors[bot]

you can also ask in this zulip thread btw :)

https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp/topic/perf.20run/near/541356531

matthiaskrgr avatar Dec 15 '25 18:12 matthiaskrgr

Thank you! Good to know!

fereidani avatar Dec 15 '25 18:12 fereidani

:sunny: Try build successful (CI) Build commit: ebcec0ca2149d940a145c9cf3403cdc2c09da563 (ebcec0ca2149d940a145c9cf3403cdc2c09da563, parent: ee447067e18f07aa6ee67dcf0ddc7b07eb675672)

rust-bors[bot] avatar Dec 15 '25 20:12 rust-bors[bot]

Queued ebcec0ca2149d940a145c9cf3403cdc2c09da563 with parent ee447067e18f07aa6ee67dcf0ddc7b07eb675672, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.0 hours until the benchmark run finishes.

rust-timer avatar Dec 15 '25 20:12 rust-timer

Finished benchmarking commit (ebcec0ca2149d940a145c9cf3403cdc2c09da563): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

                              mean    range             count
Regressions ❌ (primary)       -       -                 0
Regressions ❌ (secondary)     0.9%    [0.5%, 1.6%]      15
Improvements ✅ (primary)      -       -                 0
Improvements ✅ (secondary)    -       -                 0
All ❌✅ (primary)              -       -                 0

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (secondary 0.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

                              mean    range             count
Regressions ❌ (primary)       -       -                 0
Regressions ❌ (secondary)     1.9%    [1.6%, 2.2%]      4
Improvements ✅ (primary)      -       -                 0
Improvements ✅ (secondary)    -3.3%   [-3.3%, -3.3%]    1
All ❌✅ (primary)              -       -                 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 478.788s -> 479.083s (0.06%) Artifact size: 390.22 MiB -> 390.24 MiB (0.01%)

rust-timer avatar Dec 15 '25 20:12 rust-timer

The lexing code has been micro-optimized heavily in the past and it's genuinely difficult to improve it. I don't think using memchr is going to help, because most tokens are very short, and truly long tokens are extremely rare. The benchmarks are the ultimate test, of course, and there are still some regressions. I don't think this PR is likely to be fruitful; perhaps looking into parsing might be worthwhile because that accounts for a larger fraction (though still fairly small) of execution time.

nnethercote avatar Dec 17 '25 02:12 nnethercote