WebKit icon indicating copy to clipboard operation
WebKit copied to clipboard

optimize Array.of and Array.from for cpu pipelining and branch prediction

Open evanwashere opened this issue 1 year ago • 1 comments

passes 262 webkit tests (Tools/Scripts/test262-runner --release, build first) todo: check cpu cycles and instructions count (needs bun build for @mitata/counters)

patch: https://gist.github.com/evanwashere/17778413aba8b1eec1adfe5c6ef04a69

before

clk: ~3.33 GHz
cpu: null
runtime: jsc (null)

benchmark                               avg (min … max) p75   p99    (min … top 1%)
------------------------------------------------------- -------------------------------
Array.of(512)                              1.63 µs/iter   1.67 µs   1.68 µs ▃▆▇▁▁▂▁▂▄█▂
Array.of(4096)                            10.47 µs/iter  10.49 µs  10.50 µs ▃▃▁▆▁▆▆▃█▆▆
Array.of(32768)                           88.37 µs/iter  93.33 µs 110.25 µs ▂▅█▃▂▁▁▁▄▁▁
Array.of(65536)                          155.96 µs/iter 157.58 µs 174.00 µs ▂██▇▄▃▂▁▁▁▁

------------------------------------------------------- -------------------------------
Array.from(512)                            1.02 µs/iter   1.02 µs   1.32 µs ▃█▂▁▁▁▁▁▁▁▁
Array.from(4096)                           6.75 µs/iter   6.77 µs   6.81 µs ▂▅█▇▄▇▄▂▅▄▂
Array.from(32768)                         53.83 µs/iter  53.92 µs  54.25 µs ▃▁▃█▆▃▃▁▁▃▃
Array.from(65536)                        110.81 µs/iter 112.46 µs 124.75 µs ▂▆█▇▇▃▂▁▁▁▁

------------------------------------------------------- -------------------------------
Array.from(arr 512)                      200.16 ns/iter 202.21 ns 221.45 ns ▁▄█▆▄▂▂▁▁▁▁
Array.from(arr 4096)                     764.40 ns/iter 763.08 ns 989.76 ns ▂█▃▁▁▁▁▁▁▁▁
Array.from(arr 32768)                      6.12 µs/iter   6.07 µs   6.66 µs █▆▂▁▁▁▁▁▁▁▂
Array.from(arr 65536)                     11.80 µs/iter  11.87 µs  11.87 µs ▂▁▁▁▁▁▁▁▁▄█

------------------------------------------------------- -------------------------------
Array.from(512, () => 1)                   2.56 µs/iter   2.56 µs   2.58 µs ▂▃▄▅▄█▃▃▃▂▂
Array.from(4096, () => 1)                 14.27 µs/iter  14.30 µs  14.34 µs ▃▁▁▃▆▁▁█▆▃▃
Array.from(32768, () => 1)               118.04 µs/iter 119.67 µs 132.92 µs ▁█▅▆▄▂▂▁▁▁▁
Array.from(65536, () => 1)               229.38 µs/iter 231.33 µs 246.58 µs ▁▂▆█▆▄▂▂▁▁▁

------------------------------------------------------- -------------------------------
Array.from(iterator 512)                   2.21 µs/iter   2.22 µs   2.27 µs ▃▆█▆▂▁▁▁▁▁▁
Array.from(iterator 4096)                 16.04 µs/iter  16.04 µs  16.06 µs █▅▁▅▅▁█▅▁▁▅
Array.from(iterator 32768)               140.09 µs/iter 141.79 µs 161.83 µs ▁▁▂▂█▃▂▁▁▁▁
Array.from(iterator 65536)               262.21 µs/iter 263.79 µs 310.17 µs ▁▁█▇▃▂▁▁▁▁▁

------------------------------------------------------- -------------------------------
Array.from(iterator 512, () => 1)          2.58 µs/iter   2.58 µs   2.81 µs ▂█▂▁▁▁▁▁▁▁▁
Array.from(iterator 4096, () => 1)        18.93 µs/iter  18.90 µs  18.92 µs ▃▃▆▆▁▁▃▁█▁▃
Array.from(iterator 32768, () => 1)      164.04 µs/iter 165.50 µs 178.83 µs ▁▁▁▄█▅▃▂▁▁▁
Array.from(iterator 65536, () => 1)      305.42 µs/iter 307.25 µs 326.67 µs ▁▃██▅▂▂▂▁▁▁

------------------------------------------------------- -------------------------------
Array.fromAsync(512)                      26.16 µs/iter  26.08 µs  26.34 µs ▃▆█▆▃▁▁▁▃▁▃
Array.fromAsync(4096)                    214.39 µs/iter 216.50 µs 231.75 µs ▂▂▆█▂▄▂▁▁▁▁
Array.fromAsync(32768)                     1.64 ms/iter   1.65 ms   1.68 ms ▁▅▇▅▃▆█▅▃▂▁
Array.fromAsync(65536)                     3.26 ms/iter   3.28 ms   3.30 ms ▂▅▆▄▃▂▂▄█▄▂

------------------------------------------------------- -------------------------------
Array.fromAsync(arr 512)                  81.31 µs/iter  82.42 µs 100.88 µs ▁█▅▄▂▂▁▁▁▁▁
Array.fromAsync(arr 4096)                559.67 µs/iter 564.25 µs 591.58 µs ▁▅█▆▅▃▃▂▁▁▁
Array.fromAsync(arr 32768)                 4.20 ms/iter   4.21 ms   4.30 ms ▁▄██▅▂▂▁▁▁▁
Array.fromAsync(arr 65536)                 8.35 ms/iter   8.37 ms   8.49 ms ▄▅██▆▃▁▂▂▁▂

------------------------------------------------------- -------------------------------
Array.fromAsync(512, () => 1)             56.75 µs/iter  57.21 µs  69.75 µs ▂█▂▂▂▁▁▁▁▁▁
Array.fromAsync(4096, () => 1)           417.08 µs/iter 420.08 µs 438.79 µs ▁▃██▆▄▃▂▂▁▁
Array.fromAsync(32768, () => 1)            3.24 ms/iter   3.24 ms   3.32 ms ▃▆▇█▄▂▁▁▁▁▁
Array.fromAsync(65536, () => 1)            6.45 ms/iter   6.47 ms   6.49 ms ▂▄█▄▄▄██▆▄▃

------------------------------------------------------- -------------------------------
Array.fromAsync(iterator 512)             83.20 µs/iter  84.29 µs  97.50 µs ▁▁█▆▃▂▂▁▁▁▁
Array.fromAsync(iterator 4096)           571.32 µs/iter 575.21 µs 599.50 µs ▁▁▄█▆▄▃▂▂▁▁
Array.fromAsync(iterator 32768)            4.28 ms/iter   4.29 ms   4.34 ms ▂▄▆█▇▇▃▂▂▁▁
Array.fromAsync(iterator 65536)            8.53 ms/iter   8.55 ms   8.75 ms ▃▄█▅▂▂▂▁▁▁▁

------------------------------------------------------- -------------------------------
Array.fromAsync(iterator 512, () => 1)   105.42 µs/iter 106.00 µs 121.00 µs ▁▁█▄▂▂▁▁▁▁▁
Array.fromAsync(iterator 4096, () => 1)  747.20 µs/iter 752.13 µs 778.75 µs ▁▁▁▃█▄▃▂▂▁▁
Array.fromAsync(iterator 32768, () => 1)   5.70 ms/iter   5.72 ms   5.79 ms ▁▃▅▆▇█▄▃▃▂▂
Array.fromAsync(iterator 65536, () => 1)  11.38 ms/iter  11.40 ms  11.65 ms ▃▃█▆▅▂▂▁▂▁▂

after

clk: ~3.41 GHz
cpu: null
runtime: jsc (null)

benchmark                               avg (min … max) p75   p99    (min … top 1%)
------------------------------------------------------- -------------------------------
Array.of(512)                              1.45 µs/iter   1.48 µs   1.51 µs ▂▅█▃▂▁▂▇▇▂▁
Array.of(4096)                             9.15 µs/iter   9.17 µs   9.18 µs ▅▁▃▃▆▃▅▅▅█▃
Array.of(32768)                           71.90 µs/iter  71.94 µs  72.43 µs ▃▆▃▁▆█▁▁▃▁▃
Array.of(65536)                          138.11 µs/iter 139.79 µs 154.50 µs ▁▁▂▇█▄▃▂▁▁▁

------------------------------------------------------- -------------------------------
Array.from(512)                          393.27 ns/iter 395.49 ns 479.08 ns ▃█▃▂▁▁▁▁▁▁▁
Array.from(4096)                           1.54 µs/iter   1.55 µs   1.59 µs ▁▃▆█▇▅▃▁▁▁▁
Array.from(32768)                         12.18 µs/iter  12.20 µs  12.39 µs ▂█▅▂▂▁▁▁▂▁▄
Array.from(65536)                         24.60 µs/iter  24.61 µs  24.66 µs ▃▁▁▃▆▃▁█▆▁▃

------------------------------------------------------- -------------------------------
Array.from(arr 512)                      192.91 ns/iter 194.31 ns 224.31 ns ▁▆█▄▂▁▁▁▁▁▁
Array.from(arr 4096)                     765.31 ns/iter 755.46 ns   1.02 µs ▂█▂▁▁▁▁▁▁▁▁
Array.from(arr 32768)                      5.59 µs/iter   5.82 µs   5.89 µs ▄▁▁▁▁▁▁▁▁█▂
Array.from(arr 65536)                     11.59 µs/iter  11.73 µs  11.74 µs ▂▁▃▁▁▁▁▁▂▄█

------------------------------------------------------- -------------------------------
Array.from(512, () => 1)                 411.13 ns/iter 412.54 ns 461.70 ns ▂█▆▃▁▁▁▁▁▁▁
Array.from(4096, () => 1)                  2.74 µs/iter   2.74 µs   2.85 µs ▄█▆▂▁▁▁▂▁▁▁
Array.from(32768, () => 1)                22.48 µs/iter  22.54 µs  22.64 µs ▃█▃▁▃▁▁▅▁▃▃
Array.from(65536, () => 1)                45.53 µs/iter  45.58 µs  45.78 µs ▅█▅▅█▅█▁▁▁▅

------------------------------------------------------- -------------------------------
Array.from(iterator 512)                   2.13 µs/iter   2.13 µs   2.34 µs ▄█▂▁▁▁▁▁▁▁▁
Array.from(iterator 4096)                 15.42 µs/iter  15.43 µs  15.45 µs ▅▅▅█▁▅▅▅▁▁█
Array.from(iterator 32768)               140.76 µs/iter 142.79 µs 158.58 µs ▁▁▂▁▂█▅▂▂▁▁
Array.from(iterator 65536)               257.83 µs/iter 260.17 µs 279.75 µs ▁▁▁▃█▅▃▂▂▁▁

------------------------------------------------------- -------------------------------
Array.from(iterator 512, () => 1)          2.06 µs/iter   2.06 µs   2.12 µs ▂▄█▄▃▁▁▁▁▁▁
Array.from(iterator 4096, () => 1)        14.86 µs/iter  14.88 µs  14.89 µs ▆▃▁▃█▁▁▁▁▃▆
Array.from(iterator 32768, () => 1)      133.87 µs/iter 136.42 µs 149.58 µs ▁▁▁▂▃▆█▄▂▁▁
Array.from(iterator 65536, () => 1)      246.52 µs/iter 248.83 µs 266.54 µs ▁▁▁▂▆█▄▂▁▁▁

------------------------------------------------------- -------------------------------
Array.fromAsync(512)                     579.35 ns/iter 581.24 ns 670.80 ns ▃█▄▂▁▁▁▁▁▁▁
Array.fromAsync(4096)                      1.88 µs/iter   1.88 µs   1.94 µs ▁▁▆█▇▃▁▂▁▁▁
Array.fromAsync(32768)                    13.24 µs/iter  13.25 µs  13.36 µs ▅▆▃█▁▃▁▁▁▁▃
Array.fromAsync(65536)                    26.89 µs/iter  26.84 µs  26.93 µs ▃▁▃█▃▃▆▃▁▁▃

------------------------------------------------------- -------------------------------
Array.fromAsync(arr 512)                  77.46 µs/iter  78.71 µs  93.96 µs ▁▁█▄▃▂▁▁▁▁▁
Array.fromAsync(arr 4096)                525.49 µs/iter 530.42 µs 554.63 µs ▁▁▇█▆▄▄▂▂▁▁
Array.fromAsync(arr 32768)                 3.89 ms/iter   3.90 ms   3.99 ms ▂▆█▅▃▂▂▂▂▁▁
Array.fromAsync(arr 65536)                 7.81 ms/iter   7.87 ms   7.98 ms ▃█▄▅▆▆▅▅▄▂▂

------------------------------------------------------- -------------------------------
Array.fromAsync(512, () => 1)             26.76 µs/iter  26.83 µs  27.03 µs ▃▁█▁▃▆▃▆▁▁▃
Array.fromAsync(4096, () => 1)           212.37 µs/iter 215.58 µs 229.96 µs ▁▂█▂▃▂▁▁▁▁▁
Array.fromAsync(32768, () => 1)            1.62 ms/iter   1.63 ms   1.66 ms ▂▃▆██▆▄▃▁▁▁
Array.fromAsync(65536, () => 1)            3.24 ms/iter   3.25 ms   3.41 ms ▃█▅▃▂▂▁▁▁▁▁

------------------------------------------------------- -------------------------------
Array.fromAsync(iterator 512)             79.03 µs/iter  80.29 µs  92.04 µs ▁▁▆█▄▂▂▂▁▁▁
Array.fromAsync(iterator 4096)           538.00 µs/iter 542.71 µs 571.71 µs ▁▂▄█▇▄▃▂▂▁▁
Array.fromAsync(iterator 32768)            4.01 ms/iter   4.02 ms   4.10 ms ▂▆█▇▄▂▂▂▁▂▁
Array.fromAsync(iterator 65536)            8.05 ms/iter   8.11 ms   8.35 ms ▆█▆▅▇▇▃▃▃▂▁

------------------------------------------------------- -------------------------------
Array.fromAsync(iterator 512, () => 1)   106.25 µs/iter 107.38 µs 123.33 µs ▁▁▆█▃▂▂▁▁▁▁
Array.fromAsync(iterator 4096, () => 1)  745.55 µs/iter 751.38 µs 794.75 µs ▁▁▃█▅▄▂▂▁▁▁
Array.fromAsync(iterator 32768, () => 1)   5.71 ms/iter   5.73 ms   5.83 ms ▂▄█▇▆▃▃▂▂▂▂
Array.fromAsync(iterator 65536, () => 1)  11.51 ms/iter  11.57 ms  11.75 ms ▂▅▄▅▄█▆▄▃▁▂

evanwashere avatar Jan 04 '25 01:01 evanwashere

@constellation what do you think of having this upstreamed

evanwashere avatar Jan 04 '25 09:01 evanwashere