Benchmark is wrong

Open qezz opened this issue 6 years ago • 3 comments

Issue

In the benchmark, I noticed that sd is run without the -p option, so it does not print its output to stdout. The sed commands, by contrast, print everything to stdout.

Also, once sd "(\w+)" "$1$1" dump.json >/dev/null has run, every word in the file is deleted. This happens because the shell replaces $1 with an empty string, and sd performs the replacement in place.
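A minimal sketch of the quoting problem, assuming only a POSIX sh. hyperfine hands each command string to a shell, so sh -c stands in for it here, and printf stands in for sd to show the replacement argument that would actually be received.

```shell
# Stand-in for hyperfine: the single-quoted string is handed to a shell as-is.
# printf plays the role of sd and prints the replacement argument in brackets.

# Unescaped: the inner shell expands the unset positional parameter $1 to nothing.
unescaped=$(sh -c 'printf "[%s]\n" "$1$1"')

# Escaped: the backslashes keep the dollar signs literal inside double quotes.
escaped=$(sh -c 'printf "[%s]\n" "\$1\$1"')

echo "unescaped: $unescaped"
echo "escaped:   $escaped"
```

The first case yields an empty replacement, which is why every match is deleted; the second passes the literal capture-group reference through.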

Experiment

Here is my run on a small test file:

➜  ~/tmp echo '{"hello": "world"}' > test.txt

➜  ~/tmp cat test.txt
{"hello": "world"}

➜  ~/tmp hyperfine 'sd "(\w+)" "$1$1" test.txt'
Benchmark #1: sd "(\w+)" "$1$1" test.txt
  Time (mean ± σ):       6.7 ms ±   1.1 ms    [User: 3.2 ms, System: 1.8 ms]
  Range (min … max):     5.6 ms …  12.2 ms    245 runs

➜  ~/tmp cat test.txt
{"": ""}

Please pay attention to the output of the second cat. This is why almost every run of sd (except the first one) is so fast: the words are already gone, so sd only reads the file and has nothing left to replace.

The following command should be used to compete with sed:

hyperfine 'sd -p "(\w+)" "\$1\$1" test.txt > /dev/null'

Please note the escaped groups \$1 and the preview option -p.
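If the in-place behaviour itself is what should be measured, one possible alternative (a sketch, not what the README does) is to restore the input before every run with hyperfine's --prepare option, so that no run sees an already-rewritten file. The file names and the scratch directory below are illustrative assumptions, and the sd invocation assumes a version that edits in place when given a file.

```shell
# Work in a scratch directory; the file names are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo '{"hello": "world"}' > test.orig.txt
cp test.orig.txt test.txt

# --prepare runs before every timing run, so each run starts from the original file.
if command -v hyperfine >/dev/null 2>&1 && command -v sd >/dev/null 2>&1; then
  hyperfine --prepare 'cp test.orig.txt test.txt' \
    'sd "(\w+)" "\$1\$1" test.txt'
else
  echo "hyperfine or sd not installed; skipping the benchmark"
fi
```

The copy made by --prepare is not part of the measured time, so only the replacement itself is timed.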

Experiment Results

Here are my results for a 120 MB file:

➜  ~/tmp l dump.json
.rw-r--r--@ 120M sergey  2 Aug 22:20 dump.json

➜  ~/tmp hyperfine \
'sed -E "s:(\w+):\1\1:g" dump.json >/dev/null' \
"sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null" \
'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'
Benchmark #1: sed -E "s:(\w+):\1\1:g" dump.json >/dev/null
  Time (mean ± σ):      5.724 s ±  0.056 s    [User: 5.489 s, System: 0.146 s]
  Range (min … max):    5.656 s …  5.849 s    10 runs

Benchmark #2: sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null
  Time (mean ± σ):      2.614 s ±  0.034 s    [User: 2.493 s, System: 0.084 s]
  Range (min … max):    2.569 s …  2.676 s    10 runs

Benchmark #3: sd -p "(\w+)" "\$1\$1" dump.json >/dev/null
  Time (mean ± σ):     12.590 s ±  0.216 s    [User: 12.087 s, System: 0.303 s]
  Range (min … max):   12.403 s … 13.150 s    10 runs

Summary
  'sed 's:\(\w\+\):\1\1:g' dump.json >/dev/null' ran
    2.19 ± 0.04 times faster than 'sed -E "s:(\w+):\1\1:g" dump.json >/dev/null'
    4.82 ± 0.10 times faster than 'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'

➜  ~/tmp l dump.json
.rw-r--r--@ 120M sergey  2 Aug 22:20 dump.json

Thoughts

~Even if we fixed the benchmark, I do think that we are capped with pipe throughput.~

UPD: OK, apparently the pipe is not a problem.

Platform

MBP 2015, 2.7 GHz Intel Core i5

qezz avatar Aug 02 '19 19:08 qezz

So, an important update!

Even my benchmark above is broken: sed on macOS is not the same as on Linux. Therefore, I switched to gsed.

➜  ~/tmp hyperfine \
'gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null' \
"gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null" \
'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null'
Benchmark #1: gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null
  Time (mean ± σ):     39.251 s ±  2.217 s    [User: 37.303 s, System: 0.765 s]
  Range (min … max):   37.511 s … 43.916 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null
  Time (mean ± σ):     37.544 s ±  0.723 s    [User: 36.282 s, System: 0.594 s]
  Range (min … max):   36.911 s … 38.991 s    10 runs

Benchmark #3: sd -p "(\w+)" "\$1\$1" dump.json >/dev/null
  Time (mean ± σ):     12.599 s ±  0.183 s    [User: 12.076 s, System: 0.307 s]
  Range (min … max):   12.430 s … 12.940 s    10 runs

Summary
  'sd -p "(\w+)" "\$1\$1" dump.json >/dev/null' ran
    2.98 ± 0.07 times faster than 'gsed 's:\(\w\+\):\1\1:g' dump.json >/dev/null'
    3.12 ± 0.18 times faster than 'gsed -E "s:(\w+):\1\1:g" dump.json >/dev/null'

sd is about 3 times faster than gsed, but that is still far from the advertised 11x.
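Since the earlier run silently used the macOS sed, a guard like the following can confirm which sed flavor is being benchmarked. This is only a sketch: it assumes that GNU sed identifies itself with the string "(GNU sed)" in its --version output and that a non-GNU sed either rejects --version or prints something else.

```shell
# Pick a GNU sed binary if one is available.
sed_bin=""
for candidate in gsed sed; do
  if command -v "$candidate" >/dev/null 2>&1 &&
     "$candidate" --version 2>/dev/null | grep -qF '(GNU sed)'; then
    sed_bin=$candidate
    break
  fi
done

if [ -n "$sed_bin" ]; then
  echo "GNU sed found: $sed_bin"
else
  echo "no GNU sed found; results would not be comparable with the Linux numbers"
fi
```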

qezz avatar Aug 02 '19 23:08 qezz

Hello @qezz

> In the benchmark, I noticed that sd is run without the -p option, so it does not print its output to stdout. The sed commands, by contrast, print everything to stdout.

The command for the benchmark was written before the -p option was introduced.

I'm glad you tried to replicate the results. I will investigate potential performance regressions as soon as I get some free time.

chmln avatar Aug 03 '19 01:08 chmln

I tried to replicate the results as well, even with the commit https://github.com/chmln/sd/commit/324fd1c132a5c63212e43497496bd106b9cb57b3 where the benchmarks were added to the README.md, but with no success: I can’t reach the advertised 11x either. As @qezz already mentioned, it seems like the benchmark is wrong:

> Also, once sd "(\w+)" "$1$1" dump.json >/dev/null has run, every word in the file is deleted. This happens because the shell replaces $1 with an empty string, and sd performs the replacement in place.

My benchmark for the commit https://github.com/chmln/sd/commit/324fd1c132a5c63212e43497496bd106b9cb57b3:

Benchmark #1: sed -i -E "s:(\w+):\1\1:g" dump.json
  Time (mean ± σ):      7.791 s ±  0.076 s    [User: 7.583 s, System: 0.166 s]
  Range (min … max):    7.723 s …  7.935 s    10 runs
 
Benchmark #2: sed -i 's:\(\w\+\):\1\1:g' dump.json
  Time (mean ± σ):      7.877 s ±  0.157 s    [User: 7.672 s, System: 0.160 s]
  Range (min … max):    7.712 s …  8.121 s    10 runs
 
Benchmark #3: sd -i "(\w+)" "\$1\$1" dump.json
  Time (mean ± σ):      4.292 s ±  0.040 s    [User: 3.983 s, System: 0.271 s]
  Range (min … max):    4.240 s …  4.372 s    10 runs
 
Summary
  'sd -i "(\w+)" "\$1\$1" dump.json' ran
    1.82 ± 0.02 times faster than 'sed -i -E "s:(\w+):\1\1:g" dump.json'
    1.84 ± 0.04 times faster than 'sed -i 's:\(\w\+\):\1\1:g' dump.json'

Linus789 avatar May 07 '21 16:05 Linus789