ohash icon indicating copy to clipboard operation
ohash copied to clipboard

perf(serialize): optimized sorting algorithm

Open zsilbi opened this issue 9 months ago • 2 comments

For smaller object keys and Setvalues with less than ~10 elements (which is quite common) we can use a simple insertion sort. Otherwise we fall back to the native sort().

Compared to target: #144

 ✓ test/benchmarks.bench.ts > benchmarks > serialize - presets > count:1, size:small 2190ms
     name                       hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · ohash @ d7f7f9d  2,734,883.45  0.0003  3.0094  0.0004  0.0003  0.0007  0.0008  0.0090  ±1.55%  1367442
   · ohash @ dev      4,263,019.50  0.0002  0.2779  0.0002  0.0002  0.0004  0.0005  0.0006  ±0.21%  2131510   fastest

 ✓ test/benchmarks.bench.ts > benchmarks > serialize - presets > count:256, size:small 1210ms
     name                    hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · ohash @ d7f7f9d  13,446.55  0.0647  0.4242  0.0744  0.0702  0.2418  0.2760  0.3444  ±0.89%     6727
   · ohash @ dev      18,352.54  0.0480  0.4818  0.0545  0.0513  0.2242  0.2515  0.3314  ±0.93%     9177   fastest

 BENCH  Summary

  ohash @ dev - test/benchmarks.bench.ts > benchmarks > serialize - presets > count:1, size:small
    1.56x faster than ohash @ d7f7f9d

  ohash @ dev - test/benchmarks.bench.ts > benchmarks > serialize - presets > count:256, size:small
    1.36x faster than ohash @ d7f7f9d
clk: ~5.31 GHz
cpu: AMD Ryzen 9 7950X 16-Core Processor
runtime: bun 1.2.5 (x64-linux)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
• count:1, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d              207.86 ns/iter 199.94 ns █▇
                    (190.37 ns … 560.65 ns) 427.13 ns ██
                    (  0.00  b … 396.00  b)   4.83  b ██▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

ohash @ dev                  170.54 ns/iter 160.53 ns █▇
                    (149.46 ns … 601.63 ns) 415.21 ns ██
                    (  0.00  b … 726.00  b)  11.54  b ██▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

summary
  ohash @ dev
   1.22x faster than ohash @ d7f7f9d

• count:256, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d               66.88 µs/iter  59.40 µs  █
                       (50.93 µs … 2.02 ms) 141.50 µs  █
                    (  0.00  b … 528.00 kb)  13.95 kb ▃█▅▂▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

ohash @ dev                   49.34 µs/iter  49.51 µs        █ █
                      (48.01 µs … 51.76 µs)  50.43 µs ▅  ▅   █▅█ ▅▅ ▅     ▅
                    (619.00  b …  13.08 kb)   2.48 kb █▁▁█▁▁▁███▁██▁█▁▁▁▁▁█

summary
  ohash @ dev
   1.36x faster than ohash @ d7f7f9d

zsilbi avatar Mar 09 '25 22:03 zsilbi

Do you recall if we have the same trade-off in v8 based runtimes / Node.js specifically?

pi0 avatar Mar 25 '25 20:03 pi0

Do you recall if we have the same trade-off in v8 based runtimes / Node.js specifically?

I ran this again on node with the latest benchmark from https://github.com/unjs/ohash/pull/142 (this includes est. instruction counts too) Also tested v20, and the difference is the same.

This is small object with 5 keys in random order.

clk: ~5.25 GHz
cpu: AMD Ryzen 9 7950X 16-Core Processor
runtime: node 23.9.0 (x64-linux)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
• count:1, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d              260.16 ns/iter 260.28 ns  ▃█
                    (239.77 ns … 686.80 ns) 434.16 ns ▆██
                    (  1.78 kb …   2.09 kb)   1.84 kb ███▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   4.16 ipc ( 95.84% cache)    0.64 branch misses
          1.37k cycles   5.69k instructions   73.30 c-refs    3.05 c-misses

ohash @ dev                  193.24 ns/iter 200.30 ns   █
                    (181.85 ns … 286.39 ns) 227.37 ns  ▅██
                    (787.89  b …   1.29 kb) 992.75  b ▃███▆▄▃▃▆▇▆▅▃▂▂▁▂▁▁▁▁
                   4.41 ipc ( 96.45% cache)    0.18 branch misses
          1.02k cycles   4.48k instructions   41.53 c-refs    1.48 c-misses

summary
  ohash @ dev
   1.35x faster than ohash @ d7f7f9d

• count:256, size:small
------------------------------------------- -------------------------------
ohash @ d7f7f9d               74.17 µs/iter  69.04 µs █▆
                     (63.98 µs … 459.12 µs) 215.41 µs ██
                    ( 14.24 kb … 779.95 kb) 461.35 kb ██▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   3.92 ipc ( 95.74% cache)  526.21 branch misses
        390.40k cycles   1.53M instructions  22.40k c-refs  953.59 c-misses

ohash @ dev                   57.23 µs/iter  54.75 µs █
                     (50.63 µs … 377.78 µs) 217.18 µs █
                    ( 13.18 kb … 450.00 kb) 239.15 kb █▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                   4.17 ipc ( 96.48% cache)  367.38 branch misses
        299.92k cycles   1.25M instructions  13.98k c-refs  492.69 c-misses

summary
  ohash @ dev
   1.3x faster than ohash @ d7f7f9d

zsilbi avatar Mar 25 '25 21:03 zsilbi