ble.sh
Performance with long histories could be better
ble version: 0.4.0-devel3+43be0e4
Bash version: 5.1.16(1)-release (x86_64-pc-linux-gnu)
I have a history with over 64000 entries, which I suspect slows down ble.sh quite a bit.
I tried some poor man's benchmarking with the following commands:
$ export PS4='[${SECONDS}] ${BASH_SOURCE//$HOME/\~}:${LINENO}: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
$ bleopt debug_xtrace=blesh
$ command grep -E "^\[\[\[" blesh2 | awk '{ print $3 }' | sort | uniq -c | sort -nr | less
1351381 ble/builtin/history/option:s():
39180 ble/canvas/trace/.impl():
21282 ble/history/isearch-backward-blockwise():
15735 ble/color/read-sgrspec():
9781 ble/string#split-words():
8671 ble/string#split():
8640 ble/color/read-sgrspec/.arg-next():
7560 ble/unicode/GraphemeCluster/match():
6900 ble/canvas/trace/.implicit-move():
5994 ble/syntax/vanishing-word/register():
5640 ble/canvas/trace/.ps1rc():
....
$ less blesh
.....
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/string#split(): set +f
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): for pat in "${pats[@]}"
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ less blesh2 == ls ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): for pat in "${pats[@]}"
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ less blesh2 == cd [..|-|~] ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local histfile=
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n 1 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n ignoreboth:erasedups ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local ignorespace ignoredups erasedups spec
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): for spec in ${HISTCONTROL//:/ }
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): case "$spec" in
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): ignorespace=1
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): ignoredups=1
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): for spec in ${HISTCONTROL//:/ }
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): case "$spec" in
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): erasedups=1
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n 1 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ less blesh2 == [\ \ ]* ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n 1 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local lastIndex=64329
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( lastIndex>=0 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ less blesh2 == \b\l\e\o\p\t\ \d\e\b\u\g\_\x\t\r\a\c\e\=\b\l\e\s\h\2 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ -n 1 ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): delete_indices=()
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local -a delete_indices
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local shift_histindex_next=0
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local shift_wskip=0
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): local i N=64330
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i=0 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ rm ~/.local/share/atuin/last_sync_time == \l\e\s\s\ \b\l\e\s\h\2\ ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ atuin-empty-history == \l\e\s\s\ \b\l\e\s\h\2\ ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ rm ~/.local/share/atuin/session == \l\e\s\s\ \b\l\e\s\h\2\ ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ vim ~/.local/share/atuin/key == \l\e\s\s\ \b\l\e\s\h\2\ ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): [[ echo "Co-authored-by: $line" == \l\e\s\s\ \b\l\e\s\h\2\ ]]
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i++ ))
[[[2293] ~/.local/share/blesh/ble.sh:83: ble/builtin/history/option:s(): (( i<N-1 ))
.....
I suspect that ble.sh iterates over every command in my history on every command execution, which is quite slow.
Yes, ble.sh needs to scan the entire history when the user has erasedups in HISTCONTROL. Maybe we can consider processing the duplicate entries using awk (when the selected awk implementation supports RS="\0"), but I'm not sure whether it improves the performance, because reading the awk output back also takes some time. We need careful benchmarking of several different implementations.
For a temporary workaround, you may drop erasedups from HISTCONTROL.
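For example, the trace above shows HISTCONTROL=ignoreboth:erasedups, so the workaround would be something like:
# in ~/.bashrc: keep ignoreboth (ignorespace + ignoredups), drop erasedups
HISTCONTROL=ignoreboth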
FYI, this is the loop to be optimized:
https://github.com/akinomyoga/ble.sh/blob/43be0e4d4fccfcb172a891684878eb65b02cdb2e/src/history.sh#L1382-L1390
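Schematically, paraphrased from the xtrace above rather than copied from that link, the loop compares every existing entry against the just-added command (hist and cmd are stand-in names here):
# hist stands in for the internal history array; cmd is the newly added command line
# (the trace above shows N=64330 and the comparisons against "less blesh2")
delete_indices=()
N=${#hist[@]}
for ((i = 0; i < N - 1; i++)); do
  [[ ${hist[i]} == "$cmd" ]] && delete_indices+=("$i")   # mark older duplicates for deletion
done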
I have measured the time. When there is one duplicate, the above loop consumes 53.8% of the time of a call to ble/builtin/history/option:s with 100k mock entries. The following two lines consume 44.0% of the time:
https://github.com/akinomyoga/ble.sh/blob/43be0e4d4fccfcb172a891684878eb65b02cdb2e/src/history.sh#L1392-L1393
This line consumes 2.0% of the time:
https://github.com/akinomyoga/ble.sh/blob/43be0e4d4fccfcb172a891684878eb65b02cdb2e/src/history.sh#L1394
I have the following awk version:
$ awk --version
GNU Awk 5.1.1, API: 3.1
I hope that is new and good enough and that it is compiled with the right options. If not, I can also change that very easily.
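A quick (rough) way to check whether the awk in PATH accepts a NUL record separator at all, independent of whatever detection ble.sh itself does:
$ printf 'a\0b\0' | awk 'BEGIN { RS = "\0" } END { print NR }'   # prints 2 if NUL records are handled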
For a temporary workaround, you may drop erasedups from HISTCONTROL.
Thanks for the hint. That would probably work for me too, since I am manually deduping my history anyway. (I think erasedups does not work well with multiple terminals and with dups quite far in the past.) It would only be a bit inconvenient if the recent history wasn't deduped.
Thanks for the fast responses! I really appreciate it!
Edit: After removing erasedups my shell feels quite fast again. It is not as fast as a completely clean setup, and hitting enter on an empty line still has slightly more delay than normal, but it is not as bad as it was before. Thanks again.
I implemented the loop in awk (96e9bf82, 944d48e), but it doesn't seem to significantly improve the performance. I have also considered other implementations, but there doesn't seem to be a better solution (unless we write a C program and compile it to do the processing).
At the moment, the time taken by the awk implementation for a history of size 100k is approximately half that of the original naive for loop. The implementation by awk can be roughly decomposed into the following three steps:
# schematic code: the result is read back through a process substitution so that
# new_array is set in the current shell (a plain pipe would run mapfile in a subshell)
mapfile -t new_array < <(             # Step 3) import the result into Bash
  printf '%s\n' "${old_array[@]}" |   # Step 1) export data from Bash
  awk '{ "[... process ...]"; }'      # Step 2) process by awk
)
The current bottleneck is step 1. With Bash 5.1, the naive implementation of step 1, printf '%s\0' "${old_array[@]}", already takes more than half of the time of the original for-loop. With Bash 5.2, it is actually slower than the original for-loop. Another approach for steps 1 and 2 is declare -p old_array | awk '{ "[ parse declare -p ]"; }', which has a similar speed to printf '%s\0' "${old_array[@]}". For this approach, parsing the output of declare -p in awk still takes some time: gawk is 3x slower than printf '%s\0', while mawk is faster than printf '%s\n' by 10-20%.
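If you want a rough feeling for the step-1 cost on your own machine, a mock comparison along these lines can be used (old_array is a throwaway 100k-entry array; the absolute numbers differ between Bash 5.1 and 5.2):
# build a mock history of 100k entries
old_array=()
for ((i = 0; i < 100000; i++)); do old_array[i]="command $i"; done

time printf '%s\0' "${old_array[@]}" > /dev/null   # step 1 as a NUL-delimited stream
time declare -p old_array > /dev/null              # step 1 via Bash's own serialization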
There are still other switches and approaches in the implementation. The current implementation of erasedups tries to choose an optimal approach based on the Bash version, the available awk implementations on the system, and the current history size. The best combination for a large history is mawk & Bash 5.2 (not yet officially released).
(I think erasedups does not work well with multiple terminals and with dups quite far in the past.) It would only be a bit inconvenient if the recent history wasn't deduped.
Since the performance doesn't seem to improve as long as we process the entire history for erasedups, I have added an option bleopt history_erasedups_limit in commit e4afb5a to limit the range of the history that undergoes the erasedups checks. With the default setting history_erasedups_limit=0, only the items that have been added in the present session are the target of the erasedups checks. I think this works in most cases and also resolves the subtlety with multiple sessions and the rewriting of past history.
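For reference, the two settings discussed here:
bleopt history_erasedups_limit=0   # default: only entries added in the current session are checked
bleopt history_erasedups_limit=    # empty: check the entire history, like Bash's erasedups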
Feels a bit snappier already, thanks!
Thank you!
Maybe I have confused you, but with the latest version of ble.sh, the behavior of erasedups has changed from that of Bash (unless you intentionally set bleopt history_erasedups_limit=). Now, only the entries added in the same session are searched for erasedups, so the time consumed by erasedups is currently negligible; it is less than a millisecond, at least in my environment, for 100k entries. If there is still a lag, I think that would be caused by other parts of ble.sh.
the behavior of erasedups has changed from that of Bash (unless you intentionally set bleopt history_erasedups_limit=). Now, only the entries added in the same session are searched for erasedups
That's not a problem. I needed to manually dedupe my history anyway because of history syncing. I am using the following snippet to do that:
history-dedupe () {
  wc -l -- "$HISTFILE"
  # keep only the most recent occurrence of each line (dedupe from the end)
  tac -- "$HISTFILE" | awk '!x[$0]++' | tac | sponge -- "$HISTFILE"
  wc -l -- "$HISTFILE"
  # reload the cleaned file into the current session
  history -c
  history -r
}
If there is still a lag, I think that would be caused by other parts of ble.sh.
As you probably remember, my setup is a bit complex and has a lot of moving parts. Without performance tracking of each bigger function and hook, I don't think I can easily pinpoint the issue.
Without performance tracking of each bigger function and hook, I don't think I can easily pinpoint the issue.
I have added support for profiling in commit f629698. The profiler can be activated by
$ ble/debug/profiler/start NAME
where NAME is used as the prefix of the output filenames. The profiler can be stopped by
$ ble/debug/profiler/end
The files NAME.func.txt and NAME.line.txt will be generated (or updated if they already exist). The files can be inspected with, e.g.,
$ less -RS NAME.func.txt # -S for long lines, -R for color codes
$ sort -nrk6 NAME.func.txt | less -RS # sort by column 6
Also, there is a new option bleopt debug_xtrace_ps4. This value is used as PS4 for the xtrace of the internal processing of ble.sh. You specified PS4 directly in the original post, but that no longer works with the latest version of ble.sh. Instead, bleopt debug_xtrace_ps4=... can be used to specify the PS4 for ble.sh internals.
The options bleopt debug_xtrace and bleopt debug_xtrace_ps4 are used by ble/debug/profiler, so they should not be changed by the user when ble/debug/profiler is activated.
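For example, outside of a profiling session, the xtrace setup from your original post would now look like this (same PS4 value, just passed via bleopt):
$ bleopt debug_xtrace=blesh
$ bleopt debug_xtrace_ps4='[${SECONDS}] ${BASH_SOURCE//$HOME/\~}:${LINENO}: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'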
Thank you very much!!
I now have some ideas about where to look for improvements:
304 0 0 10826.957 16.36% 10826.957 16.36% 0.000 109.084 109.084 0.000 wezterm (/home/nixos/.bash_aliases)
2283 304 304 10158.872 15.35% 10146.311 15.33% 12.561 192.012 192.012 0.000 _sbp_set_prompt (/home/nixos/.local/share/sbp/sbp.bash)
9243 0 0 7922.842 11.97% 7922.842 11.97% 0.000 32.517 32.517 0.000 type (/home/nixos/.bash_aliases)
100827 65100 253932 15382.238 23.24% 3654.036 5.520% 11728.202 22.428 0.240 22.188 ble/canvas/trace/.impl (/home/nixos/.local/share/blesh/ble.sh)
25038 0 0 2270.985 3.431% 2270.985 3.431% 0.000 4.002 4.002 0.000 ble/string#split (/home/nixos/.local/share/blesh/ble.sh)
55 10 10 1808.791 2.733% 1808.421 2.732% 0.370 809.434 809.434 0.000 __fzf_history__ (/home/nixos/.nix-profile/share/fzf/key-bindings.bash)
19374 0 0 1774.980 2.681% 1774.980 2.681% 0.000 3.645 3.645 0.000 ble/string#split-words (/home/nixos/.local/share/blesh/ble.sh)
45801 49203 52668 4910.018 7.418% 1697.583 2.565% 3212.435 10.717 0.062 10.655 ble/color/read-sgrspec (/home/nixos/.local/share/blesh/ble.sh)
9330 0 0 1507.550 2.277% 1507.550 2.277% 0.000 3.796 3.796 0.000 ble/arithmetic/sum (/home/nixos/.local/share/blesh/ble.sh)
608 304 304 1174.213 1.774% 1154.985 1.745% 19.228 10.444 10.261 0.183 _atuin_precmd (/home/nixos/.bashrc)
23810 52393 620248 54470.389 82.29% 981.706 1.483% 53488.683 667.423 0.457 666.966 ble/textarea#render (/home/nixos/.local/share/blesh/ble.sh)
22176 0 0 682.019 1.030% 682.019 1.030% 0.000 0.826 0.826 0.000 ble/color/read-sgrspec/.arg-next (/home/nixos/.local/share/blesh/ble.sh)
17489 17449 424790 46707.926 70.56% 681.222 1.029% 46026.704 391.817 0.141 391.676 ble/prompt/update (/home/nixos/.local/share/blesh/ble.sh)
Do you have any idea why the following function is at the top? It seems very basic to me and should match the first if every time.
wezterm() {
  if [[ -e /etc/NIXOS ]]; then
    command wezterm "$@"
  else
    if type -P nixGL &>/dev/null; then
      nixGL wezterm "$@"
    else
      if type -P nixGLIntel &>/dev/null; then
        nixGLIntel wezterm "$@"
      fi
    fi
  fi
}
It is used by https://github.com/wez/wezterm/blob/main/assets/shell-integration/wezterm.sh#L396-L403 which should be executed once per command run.
updated if they already exist
Amended or replaced? When using the option multiple times without clearing, I got totally different results compared to deleting the files between tries.
Do you have any idea why the following function is at the top? It seems very basic to me and should match the first if every time.
wezterm() {
  if [[ -e /etc/NIXOS ]]; then
    command wezterm "$@"
  else
    if type -P nixGL &>/dev/null; then
      nixGL wezterm "$@"
    else
      if type -P nixGLIntel &>/dev/null; then
        nixGLIntel wezterm "$@"
      fi
    fi
  fi
}
Spawning external commands generally has a larger cost than running builtin commands.
For this function, the time is on average 10826.957/304 = 35.615 msec, and the maximum time for a single call is 109.084 msec. These values seem to be a bit heavier than the normal spawning case, so wezterm itself might just be slow. Hmm, could you tell me what the spawning cost is on your system?
$ while date; do :; done | uniq -c
507 2022年 6月 30日 木曜日 04:36:32 JST
585 2022年 6月 30日 木曜日 04:36:33 JST
280 2022年 6月 30日 木曜日 04:36:34 JST
438 2022年 6月 30日 木曜日 04:36:35 JST
480 2022年 6月 30日 木曜日 04:36:36 JST
477 2022年 6月 30日 木曜日 04:36:37 JST
475 2022年 6月 30日 木曜日 04:36:38 JST
542 2022年 6月 30日 木曜日 04:36:39 JST
545 2022年 6月 30日 木曜日 04:36:40 JST
539 2022年 6月 30日 木曜日 04:36:41 JST
536 2022年 6月 30日 木曜日 04:36:42 JST
# kill it by pressing C-c
The fork cost on my system for date is about 1000/500 = 2 msec, which is significantly smaller than the reported wezterm spawning time.
It is used by https://github.com/wez/wezterm/blob/main/assets/shell-integration/wezterm.sh#L396-L403 which should be executed once per command run.
Hmm, I guess you should always use the fallback version as long as PWD contains only ASCII characters. Or we may request another wezterm-specific terminal sequence to invoke set-working-directory. Communicating through the tty is much faster than communicating by creating a process.
Also, possibly this function __wezterm_osc7 can be called only when the current working directory has changed (instead of running it every time), for example by using blehook CHPWD?
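A minimal sketch of that idea (assuming blehook's += syntax and the __wezterm_osc7 function defined by wezterm.sh):
# only register the hook in ble.sh sessions; emit OSC 7 on directory change instead of on every prompt
if [[ ${BLE_VERSION-} ]]; then
  blehook CHPWD+=__wezterm_osc7
fi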
updated if they already exist
Amended or replaced? When using the option multiple times without clearing, I got totally different results compared to deleting the files between tries.
Amended. The results are summed up: when profiling is performed multiple times with the same NAME, the final files contain the sum of the times and counts over the multiple profiling sessions.
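So, to get independent numbers per run, remove (or rename) the generated files between profiling sessions, e.g.:
$ rm -f NAME.func.txt NAME.line.txt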
Hmm, could you tell me what the spawning cost is on your system?
Not that different from yours:
▶ while date; do :; done | uniq -c
17 Thu Jun 30 11:15:33 AM CEST 2022
545 Thu Jun 30 11:15:34 AM CEST 2022
507 Thu Jun 30 11:15:35 AM CEST 2022
460 Thu Jun 30 11:15:36 AM CEST 2022
489 Thu Jun 30 11:15:37 AM CEST 2022
519 Thu Jun 30 11:15:38 AM CEST 2022
495 Thu Jun 30 11:15:39 AM CEST 2022
562 Thu Jun 30 11:15:40 AM CEST 2022
589 Thu Jun 30 11:15:41 AM CEST 2022
570 Thu Jun 30 11:15:42 AM CEST 2022
589 Thu Jun 30 11:15:43 AM CEST 2022
574 Thu Jun 30 11:15:44 AM CEST 2022
570 Thu Jun 30 11:15:45 AM CEST 2022
564 Thu Jun 30 11:15:46 AM CEST 2022
559 Thu Jun 30 11:15:47 AM CEST 2022
536 Thu Jun 30 11:15:48 AM CEST 2022
524 Thu Jun 30 11:15:49 AM CEST 2022
486 Thu Jun 30 11:15:50 AM CEST 2022
522 Thu Jun 30 11:15:51 AM CEST 2022
Also, possibly this function __wezterm_osc7 can be called only when the current working directory has changed (instead of running it every time), for example by using blehook CHPWD?
This helped, thanks!
Now the only problematic line left, other than the fzf history binding (which blocks while you search), is:
2283 304 304 10158.872 15.35% 10146.311 15.33% 12.561 192.012 192.012 0.000 _sbp_set_prompt (/home/nixos/.local/share/sbp/sbp.bash)
I am not sure how to improve that. SBP spawns a new bash and executes quite a few functions in it: https://github.com/brujoand/sbp/blob/master/sbp.bash#L67
I'll probably play around with that in the next days to see if I can convert it into something that is sourced, or only source the functions once.
Thank you! This means that wezterm itself is slow. Yeah, I think we can improve the performance of SBP by making it work in the same process, though it might require some structural changes.
The upstream SBP doesn't seem to be active enough. Maybe we can make a fork of SBP as a "Source" repository (in GitHub terminology) and perform the suggested structural changes for performance improvements.
I have noticed that SBP is stale too, but I am not sure how much I could contribute to that. As you probably know, I also have multiple additions for it that have been open for quite a while.