Slow first start/initialization
ble version: 0.4.0-devel4+f16c0d8 Bash version: 5.2.15(1)-release (aarch64-apple-darwin22.1.0)
My login shell is zsh but I've been playing with BLE by starting bash from zsh. On first starting bash with BLE it takes quite a long time to load the first time, like over a second. Then if I exit bash and start again it is quicker (due to caching?) but still noticeably slower than I would expect. When I start bash without my user config (without BLE) it is instantaneous.
I looked around and didn't find any documentation or discussion about initialization performance so I'm not sure if this is expected behavior. I'm also not exactly sure how to benchmark the startup time. Any ideas for benchmarking and improving the slow startup time? Might look at https://github.com/romkatv/zsh-bench for inspiration.
The startup time depends on the device and also on the operating system. It is typically about 100-400 msec in a Linux environment. Based on the reports I have received so far, the performance in macOS seems to be slower than Linux, but I haven't actually tested it in macOS as I don't have it.
so I'm not sure if this is expected behavior.
This is expected. As you suspected, the startup for the first time is slower due to caching, specifically for the caching of the terminal sequences and keymaps. After the first-time initialization, it shouldn't take so much time, but it is still slower than plain Bash. If you request zero cost from ble.sh, it's impossible.
I'm also not exactly sure how to benchmark the startup time.
There are two phases in the initialization. The first one is the loading phase (source ble.sh), and the other is the attaching phase (ble-attach).
If you would like to see the initialization time measurements for the first phase, you can change the following line of the source file ble.pp and rebuild ble.sh by running make:
diff --git a/ble.pp b/ble.pp
index 00e85a32..b05b8d6d 100644
--- a/ble.pp
+++ b/ble.pp
@@ -1,7 +1,7 @@
#!/bin/bash
#%$> out/ble.sh
#%[release = 0]
-#%[measure_load_time = 0]
+#%[measure_load_time = 1]
#%[debug_keylogger = 1]
#%[leakvar = ""]
#%#----------------------------------------------------------------------------
If you would like to profile the second phase, you can modify ble-attach in bashrc in the following way:
# at the end of bashrc
if [[ ${BLE_VERSION-} ]]; then
ble/debug/profiler/start a
ble-attach
ble/debug/profiler/stop
fi
Then, two files a.func.txt and a.line.txt will be created after starting a ble.sh session. For example, to see the most time-consuming functions, you can sort a.func.txt by column 6:
$ sort -rk 6 a.func.txt | head -20
It should be noted that the time measurements and profiler make the startup even slower because the processing of time measurements and profiling also takes time.
To measure the time of the first-time startup, you can clear the cache before loading ble.sh using the following command
$ bash /path/to/ble.sh --clear-cache
Perfect, thank you. Here are my results.
Loading phase
Without cache.
❯ bash
ble.sh: 1688351168.895969 load start
ble.sh: 1688351168.932548 parsed
[Elapsed 0.007s; CPU U:0.003s S:0.004s (96.76%)] ble.pp/prologue (2 forks)
[Elapsed 0.002s; CPU U:0.001s S:0.001s (97.74%)] src/def.sh (0 forks)
ble/term.sh: updating tput cache for TERM=xterm-256color... done
[Elapsed 0.173s; CPU U:0.045s S:0.107s (87.92%)] src/util.sh (83 forks)
[Elapsed 0.007s; CPU U:0.005s S:0.002s (98.92%)] src/decode.sh (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (97.50%)] src/color.sh (0 forks)
[Elapsed 0.012s; CPU U:0.010s S:0.002s (99.21%)] src/canvas.sh (0 forks)
[Elapsed 0.004s; CPU U:0.002s S:0.002s (98.45%)] src/history.sh (0 forks)
[Elapsed 0.012s; CPU U:0.009s S:0.003s (98.67%)] src/edit.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (95.09%)] lib/core-cmdspec-def.sh (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (96.47%)] lib/core-syntax-def.sh (0 forks)
[Elapsed 0.006s; CPU U:0.003s S:0.002s (97.63%)] lib/core-complete-def.sh (0 forks)
[Elapsed 0.001s; CPU U:0.000s S:0.001s (95.98%)] lib/core-debug-def.sh (0 forks)
[Elapsed 0.002s; CPU U:0.001s S:0.001s (96.05%)] contrib/integration/bash-preexec-def.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (95.25%)] ble.pp/epilogue (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (96.79%)] blerc: '/Users/me/.blerc' (0 forks)
[Elapsed 0.256s; CPU U:0.097s S:0.136s (91.17%)] Total
ble.sh: 1688351169.191618 load end
With cache.
❯ bash
ble.sh: 1688351445.087609 load start
ble.sh: 1688351445.121589 parsed
[Elapsed 0.007s; CPU U:0.003s S:0.004s (97.49%)] ble.pp/prologue (2 forks)
[Elapsed 0.001s; CPU U:0.001s S:0.001s (98.80%)] src/def.sh (0 forks)
[Elapsed 0.020s; CPU U:0.011s S:0.008s (93.65%)] src/util.sh (2 forks)
[Elapsed 0.007s; CPU U:0.005s S:0.002s (98.89%)] src/decode.sh (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (97.28%)] src/color.sh (0 forks)
[Elapsed 0.012s; CPU U:0.010s S:0.002s (99.21%)] src/canvas.sh (0 forks)
[Elapsed 0.004s; CPU U:0.002s S:0.001s (98.20%)] src/history.sh (0 forks)
[Elapsed 0.012s; CPU U:0.008s S:0.003s (99.23%)] src/edit.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (96.25%)] lib/core-cmdspec-def.sh (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (97.15%)] lib/core-syntax-def.sh (0 forks)
[Elapsed 0.005s; CPU U:0.003s S:0.002s (98.23%)] lib/core-complete-def.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (95.41%)] lib/core-debug-def.sh (0 forks)
[Elapsed 0.003s; CPU U:0.001s S:0.001s (96.66%)] contrib/integration/bash-preexec-def.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (96.13%)] ble.pp/epilogue (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (96.86%)] blerc: '/Users/me/.blerc' (0 forks)
[Elapsed 0.103s; CPU U:0.063s S:0.037s (97.27%)] Total
ble.sh: 1688351445.227942 load end
In this phase it takes ~0.15 sec longer to load while building the cache.
Attaching phase
Without cache.
❯ sort -rk 6 a.func.txt | head -20
# count subcall allcall total_msec TOTAL% self_msec SELF% child_msec max_msec max_self max_child FUNCNAME (SOURCE)
16781 32282 50683 575.116 36.87% 136.847 8.774% 438.269 13.479 0.034 13.445 ble-bind (/Users/me/.local/share/blesh/ble.sh)
17092 7281 11047 226.731 14.54% 129.166 8.282% 97.565 0.238 0.028 0.210 ble-decode-kbd (/Users/me/.local/share/blesh/ble.sh)
9295 1350 1350 85.353 5.472% 74.375 4.769% 10.978 0.099 0.009 0.090 ble-decode-char/bind (/Users/me/.local/share/blesh/ble.sh)
8092 8100 13648 199.840 12.81% 73.895 4.738% 125.945 0.633 0.013 0.620 ble/unicode/GraphemeCluster/match (/Users/me/.local/share/blesh/ble.sh)
7196 4450 5410 117.787 7.552% 68.370 4.384% 49.417 0.259 0.011 0.248 ble/unicode/GraphemeCluster/s2break-right (/Users/me/.local/share/blesh/ble.sh)
7836 0 0 66.983 4.295% 66.983 4.295% 0.000 1.978 1.978 0.000 ble/string#split-words (/Users/me/.local/share/blesh/ble.sh)
143 108 108 82.233 5.272% 62.260 3.992% 19.973 8.705 8.705 0.000 ble/builtin/bind/read-user-settings/.collect (/Users/me/.local/share/blesh/ble.sh)
5411 2152 2152 74.495 4.776% 48.863 3.133% 25.632 2.037 0.010 2.027 ble-decode-key/bind (/Users/me/.local/share/blesh/ble.sh)
4276 9606 27998 310.190 19.89% 44.956 2.882% 265.234 1.652 0.025 1.627 ble/canvas/trace/.impl (/Users/me/.local/share/blesh/ble.sh)
4107 2738 2738 62.063 3.979% 38.724 2.483% 23.339 0.069 0.015 0.054 ble/util/s2c (/Users/me/.local/share/blesh/ble.sh)
1582 0 0 38.286 2.455% 38.286 2.455% 0.000 0.603 0.603 0.000 ble/util/print (/Users/me/.local/share/blesh/ble.sh)
32 2 2 34.785 2.230% 34.757 2.228% 0.028 8.463 8.463 0.000 starship_precmd (/dev/fd/63)
3584 5231 19942 202.441 12.98% 31.937 2.048% 170.504 33.156 0.018 33.138 ble/decode/cmap/.generate-binder-template (/Users/me/.local/share/blesh/ble.sh)
12 4 4 31.361 2.011% 31.361 2.011% 0.000 8.978 8.978 0.000 ble/util/declare-print-definitions (/Users/me/.local/share/blesh/ble.sh)
663 371 764 231.256 14.83% 28.536 1.830% 202.720 49.500 0.012 49.488 ble/util/assign (/Users/me/.local/share/blesh/ble.sh)
3220 30 34 27.745 1.779% 27.401 1.757% 0.344 0.186 0.011 0.175 ble/util/is-unicode-output (/Users/me/.local/share/blesh/ble.sh)
3347 3351 5923 105.864 6.788% 25.188 1.615% 80.676 4.475 0.014 4.461 ble/init:bind/generate-binder (/Users/me/.local/share/blesh/lib/init-bind.sh)
4 0 0 24.514 1.572% 24.514 1.572% 0.000 12.313 12.313 0.000 ble/util/reset-keymap-of-editing-mode (/Users/me/.local/share/blesh/ble.sh)
2924 0 0 22.404 1.436% 22.404 1.436% 0.000 0.098 0.098 0.000 ble/canvas/trace/.implicit-move (/Users/me/.local/share/blesh/ble.sh)
❯ awk '{sum+=$6;}END{print sum;}' a.func.txt
1559.69
With cache.
❯ sort -rk 6 ble-log-cache/a.func.txt | head -20
# count subcall allcall total_msec TOTAL% self_msec SELF% child_msec max_msec max_self max_child FUNCNAME (SOURCE)
4046 4050 6824 102.070 32.30% 37.749 11.95% 64.321 0.599 0.013 0.586 ble/unicode/GraphemeCluster/match (/Users/me/.local/share/blesh/ble.sh)
3598 2225 2705 60.113 19.02% 35.120 11.11% 24.993 0.256 0.010 0.246 ble/unicode/GraphemeCluster/s2break-right (/Users/me/.local/share/blesh/ble.sh)
2138 4803 13999 159.874 50.59% 23.491 7.433% 136.383 1.652 0.026 1.626 ble/canvas/trace/.impl (/Users/me/.local/share/blesh/ble.sh)
16 1 1 17.550 5.553% 17.539 5.550% 0.011 8.338 8.338 0.000 starship_precmd (/dev/fd/63)
1462 0 0 11.762 3.722% 11.762 3.722% 0.000 0.067 0.067 0.000 ble/canvas/trace/.implicit-move (/Users/me/.local/share/blesh/ble.sh)
320 180 369 49.102 15.54% 11.052 3.497% 38.050 19.685 0.011 19.674 ble/util/assign (/Users/me/.local/share/blesh/ble.sh)
2 0 0 10.784 3.412% 10.784 3.412% 0.000 10.777 10.777 0.000 ble/util/reset-keymap-of-editing-mode (/Users/me/.local/share/blesh/ble.sh)
639 150 899 21.270 6.731% 10.491 3.320% 10.779 4.682 0.023 4.659 source (/Users/me/.cache/blesh/0.4/decode.bind.50215.UTF-8.bind)
3 0 0 10.402 3.292% 10.402 3.292% 0.000 10.402 10.402 0.000 ble/bin/awk (/Users/me/.local/share/blesh/ble.sh)
1025 0 0 8.938 2.828% 8.938 2.828% 0.000 0.027 0.027 0.000 ble/unicode/GraphemeCluster/c2break (/Users/me/.local/share/blesh/ble.sh)
726 484 484 11.974 3.789% 7.573 2.396% 4.401 0.079 0.053 0.026 ble/util/s2c (/Users/me/.local/share/blesh/ble.sh)
3 0 0 6.769 2.142% 6.769 2.142% 0.000 2.595 2.595 0.000 ble/bin/stty (/Users/me/.local/share/blesh/ble.sh)
725 15 17 6.683 2.115% 6.503 2.058% 0.180 0.190 0.010 0.180 ble/util/is-unicode-output (/Users/me/.local/share/blesh/ble.sh)
244 0 0 5.723 1.811% 5.723 1.811% 0.000 0.077 0.077 0.000 ble/function#suppress-stderr:ble/util/isprint+ (/Users/me/.local/share/blesh/ble.sh)
503 408 1011 20.052 6.345% 5.084 1.609% 14.968 1.769 0.029 1.740 ble/builtin/trap/.handler (/Users/me/.local/share/blesh/ble.sh)
20 0 0 4.755 1.505% 4.755 1.505% 0.000 1.547 1.547 0.000 ble/base/xtrace/adjust (/Users/me/.local/share/blesh/ble.sh)
55 48 48 7.333 2.320% 4.748 1.502% 2.585 1.627 1.627 0.000 ble/builtin/bind/read-user-settings/.collect (/Users/me/.local/share/blesh/ble.sh)
476 1546 1546 17.383 5.501% 4.554 1.441% 12.829 0.196 0.012 0.184 ble/canvas/trace/.put-atomic.draw (/Users/me/.local/share/blesh/ble.sh)
1 0 0 4.155 1.315% 4.155 1.315% 0.000 4.155 4.155 0.000 ble/bin/wc (/Users/me/.local/share/blesh/ble.sh)
❯ awk '{sum+=$6;}END{print sum;}' a.func.txt
316.021
In this phase many parts take much longer without the cache. ~~I can give sum totals once I remember how to use awk.~~
(You may notice starship_precmd in here since I am using starship. I have tried without it and it did not make a noticeable difference.)
Not sure if there is anything actionable here for me. Maybe something will stand out to you.
What is your request: to speed up the startup when there is an existing cache, or to speed up the cache creation? If it is the latter, I will not try to improve the performance of the cache creation because the cached part is what is hard to improve the performance. If the performance could have been improved, that part wouldn't have been cached but would have been processed directly.
I have added the sums for the binding part. In total my startup times are roughly ~1800ms without cache and ~420ms with cache. This is on an M1 macbook pro which I would think should be competitive with the best results you have seen.
The startup time depends on the device and also on the operating system. It is typically about 100-400 msec in a Linux environment. Based on the reports I have received so far, the performance in macOS seems to be slower than Linux, but I haven't actually tested it in macOS as I don't have it.
Is the 100-400 ms time you cite here for cached start? Either way it seems like these results are not really in line with expectations. I guess I am wondering if you notice any functions taking particularly long here that might be running especially poorly on mac for whatever reason.
Sorry for the delayed reply.
I have added the sums for the binding part. In total my startup times are roughly ~1800ms without cache and ~420ms with cache. This is on an M1 macbook pro which I would think should be competitive with the best results you have seen.
I think this is the normal range of the delay, although I think 420ms is a bit slower than my experience in Linux but the order is the same.
Is the 100-400 ms time you cite here for cached start?
Yes.
I guess I am wondering if you notice any functions taking particularly long here that might be running especially poorly on mac for whatever reason.
I don't see any significant difference in a particular function. However, the prompt construction and layout seem to take a longer time as a whole with your setup. One possibility is that your starship prompt produces relatively long prompt strings.