bash-preexec icon indicating copy to clipboard operation
bash-preexec copied to clipboard

Remove dependency on sed(1) for history processing

Open jparise opened this issue 11 months ago • 8 comments

We post-process history 1's output to extract the current command. This processing needs to strip the leading history number, an optional * character indicating whether the entry was modified (or a space), and then a space separating character.

We were previously using sed(1) for this, but we can implement an equivalent transformation using bash's native parameter expansion syntax.

This also results in ~4x reduction in per-prompt command overhead.

jparise avatar Jan 12 '25 21:01 jparise

With the latest revision, the pattern now yields correct results for a complete set of possible history values:

#!/usr/bin/env bash

declare -a lines=(
  "10000  history 1"
  " 1000  history 1"
  "    1  history 1"
  " 1000* history 1"
  " 1000   history 1"
)

for line in "${lines[@]}"; do
  echo "${line#*[[:digit:]][* ] }"
done

It matches anything up to the last digit as the number prefix.

jparise avatar Jan 13 '25 00:01 jparise

We could get more precise (and more closely reproduce the sed pattern) if we (temporarily) used extglob, but that might be overkill for this set of constrained inputs.

jparise avatar Jan 13 '25 01:01 jparise

Your script produces a wrong result for the last input:

$ bash bash-completion-167.sh
history 1
history 1
history 1
history 1
 history 1

That's actually intentional. There's a test to ensure we don't strip whitespace from the command itself: https://github.com/rcaloras/bash-preexec/blob/11aba65ac6eb90d3e65adb6ddea132798862aacd/test/bash-preexec.bats#L363-L372

I think that's only relevant if ignorespace and ignoreboth is unset in HISTCONTROL.

jparise avatar Jan 13 '25 01:01 jparise

Ah, I see your point.

akinomyoga avatar Jan 13 '25 01:01 akinomyoga

Thanks for the PR @jparise! and @akinomyoga for the review.

@jparise do we know if this reduces the overhead time of running bash-preexec? Would be curious how much this speeds us up.

@steinarvk, @dseomn - Any feedback on this change as folks who've made changes to the current implementation with sed?

rcaloras avatar Jan 13 '25 02:01 rcaloras

@jparise do we know if this reduces the overhead time of running bash-preexec? Would be curious how much this speeds us up.

In this artificial test, the new (non-sed) approach seems to be ~4× faster:

#!/usr/bin/env bash

old() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    this_command=$(
      export LC_ALL=C
      HISTTIMEFORMAT='' builtin history 1 | sed '1 s/^ *[0-9][0-9]*[* ] //'
    )
    ((++loop))
  done
}

new() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    this_command=$(LC_ALL=C HISTTIMEFORMAT='' builtin history 1)
    this_command="${this_command#*[[:digit:]][* ] }"
    ((++loop))
  done
}

echo "Old"
time -p old

echo "New"
time -p new
Old
real 16.98
user 5.47
sys 10.83
New
real 4.36
user 0.71
sys 3.25

jparise avatar Jan 13 '25 13:01 jparise

If one cares about the overhead, one can also consider using a function substitution (also known as "no-fork command substitution") introduced in Bash 5.3 (which is still beta now, though) with a proper test on BASH_VERSINFO.

#!/usr/bin/env bash

old() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    this_command=$(
      export LC_ALL=C
      HISTTIMEFORMAT='' builtin history 1 | sed '1 s/^ *[0-9][0-9]*[* ] //'
    )
    ((++loop))
  done
}

new() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    this_command=$(LC_ALL=C HISTTIMEFORMAT='' builtin history 1)
    this_command="${this_command#*[[:digit:]][* ] }"
    ((++loop))
  done
}

new2() {
  loop=1
  until [[ $loop -eq 10000 ]]; do
    if (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 3) )); then
      this_command=${ LC_ALL=C HISTTIMEFORMAT='' builtin history 1; }
    else
      this_command=$(LC_ALL=C HISTTIMEFORMAT='' builtin history 1)
    fi
    this_command="${this_command#*[[:digit:]][* ] }"
    ((++loop))
  done
}

time old
time new
time new2
$ bash-5.3-beta gh0167.benchmark.sh

real    0m18.610s
user    0m9.430s
sys     0m12.251s

real    0m11.676s
user    0m4.243s
sys     0m7.507s

real    0m0.629s
user    0m0.578s
sys     0m0.046s

akinomyoga avatar Jan 13 '25 13:01 akinomyoga

As a proof point, Ghostty 1.1.0 shipped with this change to its bundled bash-preexec, and we haven't heard of any issues through the pre- or post-release period yet.

jparise avatar Feb 02 '25 15:02 jparise

As a proof point, Ghostty 1.1.0 shipped with this change to its bundled bash-preexec, and we haven't heard of any issues through the pre- or post-release period yet.

@rcaloras still no reports of any problems with this change. Would you consider merging it?

jparise avatar Jun 30 '25 14:06 jparise

@jparise thanks for all the contributions! @akinomyoga thank you for the nudges and reviews.

rcaloras avatar Aug 03 '25 03:08 rcaloras

Thanks!

akinomyoga avatar Aug 03 '25 03:08 akinomyoga