Reduce `std` library startup parsing overhead
Related problem
I keep hearing about the performance concerns of std, but I don't think there are any issues or enhancements tracking the problem. This is to track that issue/improvement, along with possible solutions.
As noted in @fdncred's timings below, the use of the standard library increases startup time. While the current amount is fairly small (<50ms in all cases below, at least), the problem is that the delay is due to parsing, which means adding features (or even fixes) to std causes additional delay. This negates the potential usefulness of having a standard library.
Note that, currently, all std code is actually parsed at startup for some reason. This can be seen in view files:
╭────┬─────────────────────────────────────┬────────┬────────┬───────╮
│ # │ filename │ start │ end │ size │
├────┼─────────────────────────────────────┼────────┼────────┼───────┤
│ 0 │ source │ 0 │ 10 │ 10 │
│ 1 │ Host Environment Variables │ 10 │ 5151 │ 5141 │
│ 2 │ NU_STDLIB_VIRTUAL_DIR/std/mod.nu │ 5151 │ 12622 │ 7471 │
│ 3 │ NU_STDLIB_VIRTUAL_DIR/std/dirs.nu │ 12622 │ 16759 │ 4137 │
│ 4 │ NU_STDLIB_VIRTUAL_DIR/std/dt.nu │ 16759 │ 23631 │ 6872 │
│ 5 │ NU_STDLIB_VIRTUAL_DIR/std/help.nu │ 23631 │ 53971 │ 30340 │
│ 6 │ NU_STDLIB_VIRTUAL_DIR/std/iter.nu │ 53971 │ 59316 │ 5345 │
│ 7 │ NU_STDLIB_VIRTUAL_DIR/std/log.nu │ 59316 │ 67703 │ 8387 │
│ 8 │ NU_STDLIB_VIRTUAL_DIR/std/assert.nu │ 67703 │ 75601 │ 7898 │
│ 9 │ NU_STDLIB_VIRTUAL_DIR/std/xml.nu │ 75601 │ 83667 │ 8066 │
│ 10 │ ... │ ... │ ... │ ... │
╰────┴─────────────────────────────────────┴────────┴────────┴───────╯
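To put a number on how much std source is loaded at startup, the `view files` table above can be filtered and summed. This is only a rough sketch: it assumes the stdlib entries keep the `NU_STDLIB_VIRTUAL_DIR` prefix shown in the table and that the shell was started with the standard library enabled. The variable names are just for illustration.

```nu
# Select the entries that `view files` reports for the stdlib virtual directory.
let std_files = (view files | where filename starts-with "NU_STDLIB_VIRTUAL_DIR")

# Add up the `size` column (length of each file's source span).
let std_bytes = ($std_files | get size | math sum)

print $"($std_files | length) std files, ($std_bytes) bytes of source loaded"
```

Running this before and after a stdlib change gives a quick idea of how much extra source the parser has to handle at launch.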
Describe the solution you'd like
Is it possible to place the module files in the virtual directory without them being parsed at that time? It seems to me that parsing should be held until the modules are actually used, right?
Yes, some std files are use'd at startup, as can be seen in (for me):
> view files | select 13
╭───┬────────────────┬───────┬───────┬──────╮
│ # │ filename │ start │ end │ size │
├───┼────────────────┼───────┼───────┼──────┤
│ 0 │ loading stdlib │ 86390 │ 86517 │ 127 │
╰───┴────────────────┴───────┴───────┴──────╯
> view files | get 13 | view span $in.start $in.end
# Define the `std` module
module std
# Prelude
use std dirs [
enter
shells
g
n
p
dexit
]
use std pwd
But this shouldn't result in the parsing of everything else, right? I mean, I see why it does by looking at the way virtual_dirs work in the engine, but it would be nice if using a virtual directory to "hold" the std files didn't mean parsing them first.
Just to track it further. Here's what I posted in discord.
This is my WSL
> nu -n
> $nu.startup-time
32ms 480µs 673ns
> exit
> nu -n --no-std-lib
> $nu.startup-time
2ms 998µs 419ns
(difference 29ms 482µs 254ns)
This is my Windows 11 23H2 VM
> nu -n
> $nu.startup-time
28ms 795µs 900ns
> exit
> nu -n --no-std-lib
> $nu.startup-time
7ms 82µs 100ns
(difference 21ms 713µs 800ns)
And my Windows 10 22H2 Desktop
> nu -n
> $nu.startup-time
61ms 399µs 900ns
> exit
> nu -n --no-std-lib
> $nu.startup-time
13ms 14µs 100ns
(difference 48ms 385µs 800ns)
And my MBP M1
> nu -n
> $nu.startup-time
17ms 603µs 750ns
> exit
> nu -n --no-std-lib
> $nu.startup-time
3ms 463µs 500ns
(difference 14ms 140µs 250ns)
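The differences quoted above can be reproduced with Nushell's own duration arithmetic. A minimal sketch using the WSL numbers; the variable names are only for illustration:

```nu
# Startup time with and without the standard library (WSL run above).
let with_std = 32ms + 480us + 673ns
let without_std = 2ms + 998us + 419ns

# The std penalty is simply the difference of the two durations.
let penalty = $with_std - $without_std
print $penalty
```

The same subtraction applies to the other three machines.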
Thanks - As mentioned in Discord, mine were slightly better/faster, but that's going to be system dependent, and we need to plan for "worst case" anyway.
While I'm personally not too concerned until my startup times get near 100-250ms (I think), it's clear here that std has a pretty significant bump on the load time that ultimately needs to get rectified. This issue/enhancement is just one (as mentioned, admittedly XY) thought.
Closed by #13842
I thought it might be good to run the tests I did again for fun, but when I started I noticed that nu -n is printing the startup banner. I guess that's right???? because banner is a stdlib thing? It was just confusing because now we're comparing apples to oranges.
But wherever I run it, -n vs --no-std-lib shows that -n is about 20ms slower based on the new PRs for stdlib.
but when I started i noticed that nu -n is printing the startup banner. I guess that's right???? because banner is a stdlib thing? was just confusing now because we're comparing apples to oranges.
I might be misunderstanding your point, because nu -n prints the banner in all releases that I tested back to 0.89.0. That's done in the show_banner block in view files.
There's a slight change in the code here, but it shouldn't result in a user-facing change.
- Before: show_banner imported standard-lib then displayed the banner.
- Now: std/core is imported in the prelude so that the pwd command is available. All show_banner does is call the banner command.
In retrospect, I think I should change this back. It will be ever-so-slightly more performant when using nu -c or nu --no-std-lib. I think it's around 200µs, but I'll see. Not a big deal, but I'll check.
but wherever i run it, -n vs --no-std-lib shows that -n is about 20ms slower based on the new PRs for stdlib.
How are you testing? --no-std-lib would, of course, still have the penalty of running the config files, so it's natural that it's going to take longer. How much longer depends on the config files. The best way I know of to test that is to use the defaults in a mktemp -d for $env.XDG_CONFIG_HOME.
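A minimal sketch of that setup, assuming it is run from an existing Nushell session that is used only for the test (the variable name is hypothetical):

```nu
# Create a throwaway directory and point XDG_CONFIG_HOME at it, so shells
# launched from here do not read the usual config.nu / env.nu.
let tmp_config = (mktemp -d)
$env.XDG_CONFIG_HOME = $tmp_config

# From this session, launch `nu` and `nu --no-std-lib` in turn and compare
# `$nu.startup-time` inside each child shell.
```

Because the variable is set in the current session, it stays in effect until that session exits.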
Here are my comparison runs in asciinema.
- Nightly/main nu: 14-15ms
- 0.98.0 nu: 26-27ms
- Nightly/main nu -n: 4-5ms
- 0.98.0 nu -n: 18-19ms
- Nightly/main nu --no-std-lib: 9-11ms
- 0.98.0 nu --no-std-lib: 9-11ms
- --no-std-lib -n (both releases): < 2ms
@fdncred Thinking about this some more, if you haven't updated your startup to use the std/<submodule> format, keep in mind that you have the full-startup penalty (and then some).
It's definitely currently slower to load a full use std * or even use std log (instead of use std/log) with the PR in place. This is primarily because we are loading dirs twice - Once as the "real" version, another as the "deprecated warning" version.
The difference is about 7-8ms on my system.
Two points:
- Most, if not all, of this will go away in the following release (0.100.0) when we remove the deprecated_dirs. Nushell startup times will also increase again. I'll make the change and run the timings.
- We want people to use use std/<submodule> to avoid the performance penalty anyway.
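For reference, this is roughly what the import forms look like in a config or script, using `log` as the example submodule. The sketch assumes a build that includes the PR discussed here.

```nu
# Forms described above as slower, because they go through the full `std` module:
#   use std *
#   use std log

# Preferred form: import only the `log` submodule.
use std/log

log info "std/log loaded"
```

The commands keep the same names either way (`log info`, `log debug`, and so on), so switching is mostly a matter of changing the `use` line.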
bench { ~/nu-0.98.0-x86_64-unknown-linux-gnu/nu --no-config-file -c "use std *; exit" }
╭───────┬───────────────────────────╮
│ mean │ 22ms 340µs 316ns │
│ min │ 21ms 635µs 603ns │
│ max │ 24ms 69µs 953ns │
│ std │ 541µs 416ns │
│ │ ╭────┬──────────────────╮ │
│ times │ │ 0 │ 24ms 69µs 953ns │ │
│ │ │ 1 │ 23ms 969µs 712ns │ │
│ │ │ 2 │ 22ms 481µs 392ns │ │
│ │ │ 3 │ 22ms 937µs 10ns │ │
│ │ │ 4 │ 22ms 446µs 315ns │ │
│ │ │ 5 │ 22ms 444µs 712ns │ │
│ │ │ 6 │ 23ms 230µs 629ns │ │
│ │ │ 7 │ 22ms 41µs 866ns │ │
│ │ │ 8 │ 22ms 237µs 288ns │ │
│ │ │ 9 │ 21ms 926µs 168ns │ │
│ │ │ 10 │ 22ms 239µs 532ns │ │
│ │ │ 11 │ 21ms 635µs 603ns │ │
│ │ │ 12 │ 22ms 161µs 696ns │ │
│ │ │ 13 │ 22ms 310µs 657ns │ │
│ │ │ 14 │ 22ms 46µs 465ns │ │
│ │ │ 15 │ 21ms 828µs 522ns │ │
│ │ │ 16 │ 22ms 25µs 646ns │ │
│ │ │ 17 │ 21ms 649µs 770ns │ │
│ │ │ 18 │ 21ms 910µs 849ns │ │
│ │ │ 19 │ 21ms 856µs 674ns │ │
│ │ │ 20 │ 22ms 217µs 721ns │ │
│ │ │ 21 │ 21ms 708µs 824ns │ │
│ │ │ 22 │ 22ms 486µs 372ns │ │
│ │ │ 23 │ 21ms 806µs 268ns │ │
│ │ │ 24 │ 22ms 51µs 907ns │ │
│ │ │ 25 │ 22ms 90µs 458ns │ │
│ │ │ 26 │ 21ms 860µs 731ns │ │
│ │ │ 27 │ 21ms 975µs 411ns │ │
│ │ │ 28 │ 21ms 936µs 497ns │ │
│ │ │ 29 │ 22ms 687µs 483ns │ │
│ │ │ 30 │ 21ms 908µs 624ns │ │
│ │ │ 31 │ 22ms 183µs 976ns │ │
│ │ │ 32 │ 21ms 917µs 309ns │ │
│ │ │ 33 │ 21ms 920µs 327ns │ │
│ │ │ 34 │ 21ms 925µs 285ns │ │
│ │ │ 35 │ 21ms 979µs 658ns │ │
│ │ │ 36 │ 22ms 480µs 100ns │ │
│ │ │ 37 │ 23ms 91µs 924ns │ │
│ │ │ 38 │ 23ms 17µs 60ns │ │
│ │ │ 39 │ 23ms 364µs 312ns │ │
│ │ │ 40 │ 22ms 849µs 574ns │ │
│ │ │ 41 │ 22ms 199µs 946ns │ │
│ │ │ 42 │ 22ms 431µs 499ns │ │
│ │ │ 43 │ 22ms 660µs 753ns │ │
│ │ │ 44 │ 22ms 841µs 206ns │ │
│ │ │ 45 │ 22ms 469µs 140ns │ │
│ │ │ 46 │ 21ms 937µs 107ns │ │
│ │ │ 47 │ 22ms 100µs 708ns │ │
│ │ │ 48 │ 22ms 798µs 105ns │ │
│ │ │ 49 │ 22ms 667µs 78ns │ │
│ │ ╰────┴──────────────────╯ │
╰───────┴───────────────────────────╯
bench { nu --no-config-file -c "use std *; exit" }
╭───────┬───────────────────────────╮
│ mean │ 28ms 869µs 74ns │
│ min │ 26ms 918µs 945ns │
│ max │ 46ms 553µs 483ns │
│ std │ 3ms 342µs 413ns │
│ │ ╭────┬──────────────────╮ │
│ times │ │ 0 │ 29ms 874µs 967ns │ │
│ │ │ 1 │ 28ms 75µs 134ns │ │
│ │ │ 2 │ 27ms 564µs 504ns │ │
│ │ │ 3 │ 27ms 591µs 565ns │ │
│ │ │ 4 │ 27ms 324µs 697ns │ │
│ │ │ 5 │ 27ms 498µs 347ns │ │
│ │ │ 6 │ 27ms 55µs 756ns │ │
│ │ │ 7 │ 27ms 695µs 623ns │ │
│ │ │ 8 │ 27ms 738µs 355ns │ │
│ │ │ 9 │ 28ms 485µs 994ns │ │
│ │ │ 10 │ 27ms 833µs 354ns │ │
│ │ │ 11 │ 27ms 254µs 244ns │ │
│ │ │ 12 │ 26ms 918µs 945ns │ │
│ │ │ 13 │ 27ms 970µs 666ns │ │
│ │ │ 14 │ 27ms 621µs 731ns │ │
│ │ │ 15 │ 27ms 834µs 35ns │ │
│ │ │ 16 │ 27ms 327µs 232ns │ │
│ │ │ 17 │ 28ms 776µs 306ns │ │
│ │ │ 18 │ 28ms 988µs 336ns │ │
│ │ │ 19 │ 28ms 999µs 215ns │ │
│ │ │ 20 │ 30ms 120µs 492ns │ │
│ │ │ 21 │ 41ms 510µs 857ns │ │
│ │ │ 22 │ 46ms 553µs 483ns │ │
│ │ │ 23 │ 33ms 714µs 296ns │ │
│ │ │ 24 │ 30ms 836µs 963ns │ │
│ │ │ 25 │ 28ms 962µs 541ns │ │
│ │ │ 26 │ 27ms 731µs 916ns │ │
│ │ │ 27 │ 28ms 198µs 384ns │ │
│ │ │ 28 │ 27ms 863µs 972ns │ │
│ │ │ 29 │ 27ms 880µs 212ns │ │
│ │ │ 30 │ 27ms 377µs 868ns │ │
│ │ │ 31 │ 27ms 256µs 659ns │ │
│ │ │ 32 │ 27ms 469µs 20ns │ │
│ │ │ 33 │ 28ms 123µs 5ns │ │
│ │ │ 34 │ 27ms 590µs 642ns │ │
│ │ │ 35 │ 27ms 596µs 293ns │ │
│ │ │ 36 │ 27ms 556µs 439ns │ │
│ │ │ 37 │ 27ms 861µs 5ns │ │
│ │ │ 38 │ 28ms 121µs 482ns │ │
│ │ │ 39 │ 27ms 435µs 968ns │ │
│ │ │ 40 │ 27ms 901µs 853ns │ │
│ │ │ 41 │ 27ms 985µs 634ns │ │
│ │ │ 42 │ 28ms 13µs 146ns │ │
│ │ │ 43 │ 28ms 248µs 362ns │ │
│ │ │ 44 │ 29ms 22µs 925ns │ │
│ │ │ 45 │ 27ms 438µs 643ns │ │
│ │ │ 46 │ 27ms 929µs 638ns │ │
│ │ │ 47 │ 29ms 719µs 289ns │ │
│ │ │ 48 │ 30ms 355µs 398ns │ │
│ │ │ 49 │ 28ms 648µs 345ns │ │
│ │ ╰────┴──────────────────╯ │
╰───────┴───────────────────────────╯
After removing deprecated_dirs we're still about 2.5ms slower on a use std *. I'll poke around and see if I can optimize further, because I do want it to be as fast as it can. On the flip side, again, use std * is the suboptimal way of doing it in the first place.
bench { ./target/release/nu --no-config-file -c "use std *; exit" }
╭───────┬───────────────────────────╮
│ mean │ 24ms 902µs 701ns │
│ min │ 23ms 982µs 69ns │
│ max │ 27ms 307µs 566ns │
│ std │ 647µs 94ns │
│ │ ╭────┬──────────────────╮ │
│ times │ │ 0 │ 27ms 307µs 566ns │ │
│ │ │ 1 │ 24ms 700µs 78ns │ │
│ │ │ 2 │ 24ms 425µs 171ns │ │
│ │ │ 3 │ 25ms 398µs 487ns │ │
│ │ │ 4 │ 25ms 778µs 700ns │ │
│ │ │ 5 │ 25ms 692µs 159ns │ │
│ │ │ 6 │ 25ms 528µs 358ns │ │
│ │ │ 7 │ 24ms 924µs 807ns │ │
│ │ │ 8 │ 24ms 356µs 906ns │ │
│ │ │ 9 │ 24ms 823µs 909ns │ │
│ │ │ 10 │ 24ms 643µs 626ns │ │
│ │ │ 11 │ 24ms 570µs 298ns │ │
│ │ │ 12 │ 24ms 457µs 955ns │ │
│ │ │ 13 │ 23ms 982µs 69ns │ │
│ │ │ 14 │ 24ms 866µs 736ns │ │
│ │ │ 15 │ 24ms 810µs 229ns │ │
│ │ │ 16 │ 24ms 917µs 614ns │ │
│ │ │ 17 │ 25ms 304µs 82ns │ │
│ │ │ 18 │ 24ms 948µs 305ns │ │
│ │ │ 19 │ 23ms 994µs 618ns │ │
│ │ │ 20 │ 24ms 676µs 926ns │ │
│ │ │ 21 │ 24ms 199µs 934ns │ │
│ │ │ 22 │ 25ms 676µs 933ns │ │
│ │ │ 23 │ 24ms 816µs 485ns │ │
│ │ │ 24 │ 24ms 915µs 499ns │ │
│ │ │ 25 │ 24ms 643µs 506ns │ │
│ │ │ 26 │ 24ms 715µs 116ns │ │
│ │ │ 27 │ 25ms 295µs 371ns │ │
│ │ │ 28 │ 24ms 299µs 373ns │ │
│ │ │ 29 │ 24ms 498µs 913ns │ │
│ │ │ 30 │ 24ms 477µs 895ns │ │
│ │ │ 31 │ 24ms 181µs 879ns │ │
│ │ │ 32 │ 25ms 947µs 882ns │ │
│ │ │ 33 │ 24ms 303µs 942ns │ │
│ │ │ 34 │ 24ms 39µs │ │
│ │ │ 35 │ 24ms 779µs 395ns │ │
│ │ │ 36 │ 24ms 956µs 788ns │ │
│ │ │ 37 │ 24ms 262µs 568ns │ │
│ │ │ 38 │ 24ms 610µs 759ns │ │
│ │ │ 39 │ 25ms 65µs 113ns │ │
│ │ │ 40 │ 26ms 70µs 745ns │ │
│ │ │ 41 │ 24ms 172µs 195ns │ │
│ │ │ 42 │ 25ms 807µs 978ns │ │
│ │ │ 43 │ 25ms 772µs 603ns │ │
│ │ │ 44 │ 24ms 498µs 307ns │ │
│ │ │ 45 │ 24ms 575µs 48ns │ │
│ │ │ 46 │ 25ms 467µs 892ns │ │
│ │ │ 47 │ 25ms 86µs 365ns │ │
│ │ │ 48 │ 25ms 599µs 131ns │ │
│ │ │ 49 │ 24ms 289µs 872ns │ │
│ │ ╰────┴──────────────────╯ │
╰───────┴───────────────────────────╯
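To compare the full import against a single submodule on the same build, the same `bench` approach can be applied to both forms. This is a sketch only: it assumes `nu` on the PATH is the build under test and that `bench` returns a record with a `mean` field, as in the output above. On older releases the import would be `use std bench` instead.

```nu
use std/bench

# Full import of the standard library.
let full = (bench { nu --no-config-file -c "use std *; exit" })

# Single-submodule import.
let sub = (bench { nu --no-config-file -c "use std/log; exit" })

print $"use std *:   ($full.mean)"
print $"use std/log: ($sub.mean)"
```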
I might be misunderstanding your point, because nu -n prints the banner in all releases...
oops, sorry for the misdirection. i'm easily confused. 😆
In retrospect, I think I should change this back. ...
whatever you think is best. you're doing a great job here!
Thinking about this some more, if you haven't updated your startup to use the std/<submodule> format, ...
I have updated
my thoughts were just to test the latest main and see how they compared to the previous posted results.
This is what I'm seeing on Windows 11.
Previous Run
Windows 11
> nu -n
> $nu.startup-time
28ms 795µs 900ns
> nu -n --no-std-lib
> $nu.startup-time
7ms 82µs 100ns
(difference 21ms 713µs 800ns)
MBP
> nu -n
> $nu.startup-time
17ms 603µs 750ns
> nu -n --no-std-lib
> $nu.startup-time
3ms 463µs 500ns
(difference 14ms 140µs 250ns)
Latest Main
Windows 11
> nu -n
> $nu.startup-time
16ms 114µs 500ns
> nu -n --no-std-lib
> $nu.startup-time
10ms 593µs 200ns
(difference 5ms 521µs 300ns)
MBP
> nu -n
> $nu.startup-time
7ms 972µs 333ns
> nu -n --no-std-lib
> $nu.startup-time
3ms 916µs 708ns
(difference 4ms 55µs 625ns)
i'm easily confused. 😆
Hey! Me too!
This is what I'm seeing on Windows 11.
Thanks - That gives me something to zero in on. Let me try on Windows. My Linux/WSL numbers are so low (above) that there really doesn't look like there's a difference.
Ok, after getting all the other kinks with my Windows Nushell ironed out ...
My results seem roughly in line with yours. It also looks like a fairly significant performance gain. What you are saying seems to be that:
- The "before" #13842 launch penalty from std was 21ms 713µs 800ns
- After #13842, the launch penalty from std is now down to just 5ms 521µs 300ns
- You are seeing a net reduction in launch time due to std of 16ms 192µs 500ns, right?
Here are my numbers:
0.98.0
> nu -n
> $nu.startup-time
24ms 645µs
> exit
> nu -n --no-std-lib
> $nu.startup-time
5ms 787µs 800ns
(difference 18ms 857µs 200ns)
Latest Nightly
> ~\.local\bin\nu-0.98.1-x86_64-pc-windows-msvc-nightly-2024-10-07\nu -n
> $nu.startup-time
11ms 255µs 200ns
> ~\.local\bin\nu-0.98.1-x86_64-pc-windows-msvc-nightly-2024-10-07\nu -n --no-std-lib
> $nu.startup-time
6ms 234µs 300ns
(difference 5ms 20µs 900ns)