Issue

In the benchmark, I noticed that you weren't using sd's -p option to print the result to stdout, whereas the sed commands do print everything to stdout.

Also, once sd "(\w+)" "$1$1" dump.json >/dev/null has run, every word in the file is deleted. This happens because the shell expands $1 inside double quotes to an empty string, so sd is given an empty replacement, and sd performs the replacement in place, rewriting the file.
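The quoting problem can be reproduced without sd at all. The minimal sketch below, in plain sh, shows what a command actually receives for each way of writing the replacement. The `sh -c` line stands in for the shell that hyperfine hands each command string to; treating that as a plain `sh -c` with no extra arguments is an assumption about hyperfine's default behaviour.

```shell
#!/bin/sh
# Start with no positional parameters, like a fresh `sh -c '...'`
set --

double="$1$1"       # double quotes: $1 is expanded, to an empty string
escaped="\$1\$1"    # escaped dollars: passed through literally
single='$1$1'       # single quotes: no expansion at all
# The same double-quoted string evaluated by a child shell
via_sh=$(sh -c 'printf "%s" "$1$1"')

printf 'double-quoted: [%s]\n' "$double"
printf 'escaped:       [%s]\n' "$escaped"
printf 'single-quoted: [%s]\n' "$single"
printf 'via sh -c:     [%s]\n' "$via_sh"
```

Only the escaped and single-quoted forms deliver the literal $1$1 that sd interprets as a capture-group reference; the double-quoted form silently becomes an empty replacement.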

Experiment

Here is a run on a small example:

➜  ~/tmp echo '{"hello": "world"}' > test.txt

➜  ~/tmp cat test.txt
{"hello": "world"}

➜  ~/tmp hyperfine 'sd "(\w+)" "$1$1" test.txt'
Benchmark #1: sd "(\w+)" "$1$1" test.txt
  Time (mean ± σ):       6.7 ms ±   1.1 ms    [User: 3.2 ms, System: 1.8 ms]
  Range (min … max):     5.6 ms …  12.2 ms    245 runs

➜  ~/tmp cat test.txt
{"": ""}

Note the output of the second cat: the words are gone. This is why almost every sd run (all but the first) is so fast: after the first run there is nothing left to replace, so sd does little more than read the file.
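One way to catch this class of mistake before timing anything is to run each candidate command once and compare a checksum of the input before and after. A minimal sh sketch follows; mutates_input is a hypothetical helper name, and GNU sed stands in for sd so that the example runs without sd installed (GNU sed's -i syntax is assumed).

```shell
#!/bin/sh
# Hypothetical helper: run a command once and report whether it changed FILE.
# Usage: mutates_input FILE COMMAND [ARGS...]
mutates_input() {
  file=$1
  shift
  before=$(cksum < "$file")
  "$@" > /dev/null
  after=$(cksum < "$file")
  if [ "$before" = "$after" ]; then echo unchanged; else echo CHANGED; fi
}

tmp=$(mktemp)
echo '{"hello": "world"}' > "$tmp"

# Writing the result to stdout leaves the input alone
stdout_run=$(mutates_input "$tmp" sed -E 's:(\w+):\1\1:g' "$tmp")
# In-place editing rewrites the input (GNU sed -i syntax)
inplace_run=$(mutates_input "$tmp" sed -i -E 's:(\w+):\1\1:g' "$tmp")

echo "to stdout: $stdout_run"
echo "in place:  $inplace_run"
rm -f "$tmp"
```

With sd installed, the same check could be run as mutates_input test.txt sd -p '(\w+)' '$1$1' test.txt, which should report unchanged, since -p only previews the result.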

For a fair comparison with sed, the following command should be used:

hyperfine 'sd -p "(\w+)" "\$1\$1" test.txt > /dev/null'

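Note the escaped capture groups (\$1), which stop the shell from expanding them, and the preview option -p, which prints the result to stdout instead of modifying the file.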

Experiment Results

Here are my results for a 120 MB file

➜  ~/tmp l dump.json
.rw-r--r--@ 120M sergey  2 Aug 22:20 dump.json

➜  ~/tmp hyperfine \
'sed -E "s:(\w+):\1\1:g" dump.json >/dev/null' \
"sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null" \
'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'
Benchmark #1: sed -E "s:(\w+):\1\1:g" dump.json >/dev/null
  Time (mean ± σ):      5.724 s ±  0.056 s    [User: 5.489 s, System: 0.146 s]
  Range (min … max):    5.656 s …  5.849 s    10 runs

Benchmark #2: sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null
  Time (mean ± σ):      2.614 s ±  0.034 s    [User: 2.493 s, System: 0.084 s]
  Range (min … max):    2.569 s …  2.676 s    10 runs

Benchmark #3: sd -p "(\w+)" "\$1\$1" dump.json >/dev/null
  Time (mean ± σ):     12.590 s ±  0.216 s    [User: 12.087 s, System: 0.303 s]
  Range (min … max):   12.403 s … 13.150 s    10 runs

Summary
  'sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null' ran
    2.19 ± 0.04 times faster than 'sed -E "s:(\w+):\1\1:g" dump.json >/dev/null'
    4.82 ± 0.10 times faster than 'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'

➜  ~/tmp l dump.json
.rw-r--r--@ 120M sergey  2 Aug 22:20 dump.json

Thoughts

~Even if we fixed the benchmark, I do think that we are capped by pipe throughput.~

UPD: OK, apparently the pipe is not a problem.

Platform

MBP 2015, 2.7 GHz Intel Core i5

Aug 02 '19 19:08 qezz

So, an important update!

Even my benchmark above is broken: sed on macOS (BSD sed) is not the same as GNU sed on Linux, so I switched to gsed (GNU sed).
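Because BSD sed and GNU sed differ (for example, \+ in a basic regular expression and \w are GNU extensions that BSD sed may not support), it is worth confirming that every benchmarked command produces the same output on a small sample before timing it. A minimal sketch, assuming GNU sed is available either as sed (Linux) or as gsed (for example, installed via Homebrew on macOS):

```shell
#!/bin/sh
# GNU sed accepts --version; BSD sed rejects it, so fall back to gsed
if sed --version >/dev/null 2>&1; then SED=sed; else SED=gsed; fi

sample='{"hello": "world"}'
expected='{"hellohello": "worldworld"}'

# Both benchmarked sed variants must do the same work before timing them
ere=$(printf '%s\n' "$sample" | "$SED" -E 's:(\w+):\1\1:g')
bre=$(printf '%s\n' "$sample" | "$SED" 's:\(\w\+\):\1\1:g')

for out in "$ere" "$bre"; do
  if [ "$out" = "$expected" ]; then echo ok; else echo "MISMATCH: $out"; fi
done
```

Any command that prints a mismatch is doing different work from the others, and its timing should not be compared.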

➜  ~/tmp hyperfine \
'gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null' \
"gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null" \
'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'
Benchmark #1: gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null
  Time (mean ± σ):     39.251 s ±  2.217 s    [User: 37.303 s, System: 0.765 s]
  Range (min … max):   37.511 s … 43.916 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null
  Time (mean ± σ):     37.544 s ±  0.723 s    [User: 36.282 s, System: 0.594 s]
  Range (min … max):   36.911 s … 38.991 s    10 runs

Benchmark #3: sd -p "(\w+)" "\$1\$1" dump.json >/dev/null
  Time (mean ± σ):     12.599 s ±  0.183 s    [User: 12.076 s, System: 0.307 s]
  Range (min … max):   12.430 s … 12.940 s    10 runs

Summary
  'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null' ran
    2.98 ± 0.07 times faster than 'gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null'
    3.12 ± 0.18 times faster than 'gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null'

sd is about 3 times faster than gsed, but that is still far from the advertised 11x.
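For reference, the factor in hyperfine's summary is the ratio of the mean times, and the ± term matches first-order error propagation for a ratio of two independent measurements. That hyperfine derives it exactly this way is an assumption, but the numbers reported above are consistent with it. For the gsed BRE run versus sd:

```latex
r = \frac{\bar{t}_{\mathrm{gsed}}}{\bar{t}_{\mathrm{sd}}} = \frac{37.544\ \mathrm{s}}{12.599\ \mathrm{s}} \approx 2.98,
\qquad
\sigma_r = r \sqrt{\left(\frac{\sigma_{\mathrm{gsed}}}{\bar{t}_{\mathrm{gsed}}}\right)^2 + \left(\frac{\sigma_{\mathrm{sd}}}{\bar{t}_{\mathrm{sd}}}\right)^2}
= 2.98 \sqrt{\left(\frac{0.723}{37.544}\right)^2 + \left(\frac{0.183}{12.599}\right)^2} \approx 0.07
```

Read the other way, an 11x factor against gsed at 37.544 s would require sd to finish in roughly 37.544 / 11 ≈ 3.4 s on this file.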

Aug 02 '19 23:08 qezz

Hello @qezz

> In benchmark, I've noticed that you weren't using -p option for sd to print everything to stdout. Nevertheless, sed-commands print everything to stdout.

The command for the benchmark was written before the -p option was introduced.

I'm glad you tried to replicate the results. I will investigate potential performance regressions as soon as I get some free time.

Aug 03 '19 01:08 chmln

I tried to replicate the results as well, even at commit https://github.com/chmln/sd/commit/324fd1c132a5c63212e43497496bd106b9cb57b3, where the benchmarks were added to the README.md, but without success: I can't reach the advertised 11x either. As @qezz already mentioned, the benchmark seems to be wrong:

> Also, once sd "(\w+)" "$1$1" dump.json >/dev/null is performed, every word in file is deleted. This happens because $1 is replaced by shell with (empty string) and sd performs 'in-place' (or 'inline') replacement.

My benchmark for the commit https://github.com/chmln/sd/commit/324fd1c132a5c63212e43497496bd106b9cb57b3:

Benchmark #1: sed -i -E "s:(\w+):\1\1:g" dump.json
  Time (mean ± σ):      7.791 s ±  0.076 s    [User: 7.583 s, System: 0.166 s]
  Range (min … max):    7.723 s …  7.935 s    10 runs
 
Benchmark #2: sed -i 's:\(\w\+\):\1\1:g' dump.json
  Time (mean ± σ):      7.877 s ±  0.157 s    [User: 7.672 s, System: 0.160 s]
  Range (min … max):    7.712 s …  8.121 s    10 runs
 
Benchmark #3: sd -i "(\w+)" "\$1\$1" dump.json
  Time (mean ± σ):      4.292 s ±  0.040 s    [User: 3.983 s, System: 0.271 s]
  Range (min … max):    4.240 s …  4.372 s    10 runs
 
Summary
  'sd -i "(\w+)" "\$1\$1" dump.json' ran
    1.82 ± 0.02 times faster than 'sed -i -E "s:(\w+):\1\1:g" dump.json'
    1.84 ± 0.04 times faster than 'sed -i 's:\(\w\+\):\1\1:g' dump.json'
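One caveat with in-place benchmarks like this one: the substitution doubles every word, so if the file is not restored between runs, each run works on a larger input than the last (the thread does not say whether the file was restored here). The sketch below, in plain sh with GNU sed -i syntax assumed, shows the growth and the fix of copying a pristine file back before each run:

```shell
#!/bin/sh
# Each in-place run doubles every word, so without a restore step the
# input grows from one run to the next. GNU sed -i syntax is assumed.
tmp=$(mktemp)
printf 'ab cd\n' > "$tmp"
cp "$tmp" "$tmp.orig"    # pristine copy of the input

for run in 1 2 3; do
  sed -i -E 's:(\w+):\1\1:g' "$tmp"
  without_restore=$(cat "$tmp")
  echo "without restore, run $run: $without_restore"
done

for run in 1 2 3; do
  cp "$tmp.orig" "$tmp"  # restore the input first, as a prepare step would
  sed -i -E 's:(\w+):\1\1:g' "$tmp"
  with_restore=$(cat "$tmp")
  echo "with restore,    run $run: $with_restore"
done

rm -f "$tmp" "$tmp.orig"
```

With hyperfine, the restore step can go in its --prepare option, which runs a command before each timing run without including it in the measurement. For example, with dump.orig.json as a hypothetical pristine copy of the input:

hyperfine --prepare 'cp dump.orig.json dump.json' 'sd -i "(\w+)" "\$1\$1" dump.json'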

May 07 '21 16:05 Linus789

sd

Benchmark is wrong