Immer perf appears to have gotten worse over time, based on updated benchmarks
Summary
I've put together some updated benchmarks for various versions of Immer and other immutable update libs vs a hand-written reducer, especially since the current docs at https://immerjs.github.io/immer/performance show benchmarks that were last run against Node 10 and much older versions of Immer.
Overall, it does appear that Immer is significantly slower than both hand-written reducers and mutative. It looks like the majority of that time is due to freezing, but it also appears that Immer's perf has gotten worse over time.
I know that Immer has a lot of logic for correctness. I'm not sure how much performance optimization can be wrung out of the current approach, but the results here do seem concerning, and I figured this was worth sharing for discussion.
Overview
It looks like Immer's perf got significantly worse starting with Immer 8. This is especially true with freezing turned on, but holds even with freezing turned off. (This is admittedly a bit surprising, given that Immer 8's changes were basically just turning on freezing by default, with no other major logic changes.)
As an example, note this set of results for one benchmark case. The vanilla reducer takes about 11 microseconds; Immer is in the milliseconds range, and both with and without freezing it gets worse from version to version:
remove: vanilla (freeze: false) 11.22 µs/iter
remove: immer5 (freeze: false) 14.23 ms/iter
remove: immer6 (freeze: false) 13.90 ms/iter
remove: immer7 (freeze: false) 16.65 ms/iter
remove: immer8 (freeze: false) 27.72 ms/iter
remove: immer9 (freeze: false) 66.60 ms/iter
remove: immer10 (freeze: false) 68.61 ms/iter
remove: immer10Each (freeze: false) 70.92 ms/iter
remove: mutative (freeze: false) 36.86 ms/iter
remove: mutativeCompat (freeze: false) 36.44 ms/iter
remove: vanilla (freeze: true) 10.13 µs/iter
remove: immer5 (freeze: true) 11.74 ms/iter
remove: immer6 (freeze: true) 11.97 ms/iter
remove: immer7 (freeze: true) 12.14 ms/iter
remove: immer8 (freeze: true) 19.93 ms/iter
remove: immer9 (freeze: true) 35.90 ms/iter
remove: immer10 (freeze: true) 37.86 ms/iter
remove: immer10Each (freeze: true) 34.54 ms/iter
remove: mutative (freeze: true) 30.32 ms/iter
remove: mutativeCompat (freeze: true) 39.66 ms/iter
Background
There was some extensive discussion of Immer perf and freezing behavior over in the Redux Toolkit repo:
- https://github.com/reduxjs/redux-toolkit/issues/4793
In that issue, @gentlee set up both a vanilla hand-written reducer, and an RTK createSlice reducer with Immer, and compared them in four scenarios dealing with a nested large array (add, update, remove, concat + truncate).
The issue discussion noted that Immer appears to be significantly slower than hand-written reducers - not just 2-3x, but 100x or more.
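For context, the kind of update being compared looks roughly like this (my own sketch of the "remove" scenario; the exact state shapes live in the linked repos):

```ts
import { produce } from "immer";

interface Item { id: number; value: number }
interface State { items: Item[] }

// Hand-written "vanilla" immutable remove:
const removeVanilla = (state: State, id: number): State => ({
  ...state,
  items: state.items.filter((item) => item.id !== id),
});

// The same update via Immer:
const removeImmer = (state: State, id: number): State =>
  produce(state, (draft) => {
    const index = draft.items.findIndex((item) => item.id === id);
    if (index !== -1) draft.items.splice(index, 1);
  });
```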
I've taken the sample scenarios from that repo and created a new benchmark repo at https://github.com/markerikson/immer-perf-tests that improves on the benchmarking process in a few ways:
- Uses https://github.com/evanwashere/mitata to do fine-grained benchmarking with nanosecond-level per-run result granularity (see the usage sketch after this list)
- Drops all the Redux usage so we're testing just the reducer update times
- Tests against multiple versions of Immer (including a locally built copy of the WIP `faster-iteration-experiment` branch from #1120)
- Also tests against https://github.com/unadlib/mutative (a similar immutable update lib that claims faster perf than Immer) and https://github.com/exuanbo/mutative-compat (a wrapper that tries to match Immer's API exactly)
- Runs the same tests for each scenario and library with and without freezing
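For example, a single scenario/library pairing in the harness looks roughly like this (a sketch, not the repo's exact script; `bench` and `run` are mitata's real exports):

```ts
import { bench, run } from "mitata";
import { produce } from "immer";

// One scenario for one library version; the real script repeats this for
// every combination of scenario, library/version, and freeze setting.
const baseState = {
  items: Array.from({ length: 10_000 }, (_, id) => ({ id, value: 0 })),
};

bench("update: immer", () => {
  produce(baseState, (draft) => {
    draft.items[5000].value += 1;
  });
});

await run();
```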
I did run this on a few different Node versions and got roughly similar results each time. I also tried to run it in a browser, but ran into issues with Vite failing to load a nested dependency of the mitata benchmarking lib, so I didn't get that working yet.
Detailed Output
Here's the output of a benchmark run:
clk: ~3.47 GHz
cpu: AMD Ryzen 7 5800H with Radeon Graphics
runtime: node 18.18.2 (x64-win32)
benchmark avg (min … max) p75 p99 (min … top 1%)
----------------------------------------------------- -------------------------------
add: vanilla (freeze: false) 27.24 µs/iter 28.45 µs █
(23.72 µs … 36.50 µs) 31.07 µs █▁███▁██▁▁█▁▁██▁▁▁▁▁█
add: immer5 (freeze: false) 584.93 µs/iter 682.10 µs █▂
(420.10 µs … 1.89 ms) 1.36 ms ██▅▃▂▂▂▃▃▂▂▂▂▁▂▁▁▁▁▁▁
add: immer6 (freeze: false) 497.48 µs/iter 616.50 µs █
(318.80 µs … 1.30 ms) 1.01 ms ▁▃█▃▃▂▂▂▂▂▂▃▂▂▁▁▁▁▁▁▁
add: immer7 (freeze: false) 652.56 µs/iter 638.40 µs █
(477.50 µs … 2.13 ms) 1.34 ms ▁▂█▃▂▂▁▁▁▁▂▁▂▂▁▁▁▁▁▁▁
add: immer8 (freeze: false) 3.63 ms/iter 4.12 ms █
(2.75 ms … 6.20 ms) 5.46 ms ▃██▇███▃▄▂▅▄▅▂▂▁▅▂▁▃▁
add: immer9 (freeze: false) 7.44 ms/iter 8.70 ms █ ▅ ▅
(5.70 ms … 11.57 ms) 10.80 ms ▃█████▆▃▃▃▃▁▆▇▁▇▂▃▁▂▄
add: immer10 (freeze: false) 7.19 ms/iter 8.06 ms ▂▄█▆▃ ▂
(5.56 ms … 11.55 ms) 11.25 ms ██████▄█▄▆▂▄▆▅▇▂▁▁▁▁▂
add: immer10Each (freeze: false) 8.74 ms/iter 10.50 ms █▅ ▃ ▃
(5.78 ms … 14.49 ms) 13.33 ms ▄████▄▃▆▃██▄▆██▃▃▄▄▃▃
add: mutative (freeze: false) 51.38 µs/iter 50.20 µs █
(24.70 µs … 792.30 µs) 422.70 µs ▆█▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
add: mutativeCompat (freeze: false) 49.22 µs/iter 48.00 µs █
(20.90 µs … 890.20 µs) 397.50 µs ▆█▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
add: vanilla (freeze: true) 36.32 µs/iter 39.74 µs █ ██ █
(30.24 µs … 46.45 µs) 41.64 µs █▁█▁█▁▁▁▁██▁▁▁▁▁▁█▁▁█
add: immer5 (freeze: true) 831.42 µs/iter 932.10 µs ▂ █▇
(423.70 µs … 2.29 ms) 1.72 ms █▇▄▅▅▄███▅▄▂▃▃▂▂▂▂▁▁▁
add: immer6 (freeze: true) 677.55 µs/iter 783.00 µs ▆ █
(355.40 µs … 2.01 ms) 1.52 ms ▆█▃▄▃▃██▄▅▃▃▂▂▂▁▁▁▁▁▁
add: immer7 (freeze: true) 946.48 µs/iter 1.15 ms █▂ ▂
(569.10 µs … 2.08 ms) 1.70 ms ██▆▆▅▆▅▄▆███▅▄▄▃▂▂▁▁▁
add: immer8 (freeze: true) 4.53 ms/iter 5.15 ms ▇▆ ▂ ▂▇█▆▃▇▃▂
(2.99 ms … 6.53 ms) 6.24 ms ███▄▂▅█▇█████████▇▅▄▂
add: immer9 (freeze: true) 10.56 ms/iter 11.58 ms ▆▂▄▄ ▆ █
(7.32 ms … 13.09 ms) 13.05 ms ▃▇▁▅▁▃▅▃████▅█▅█▅▇▅▃▅
add: immer10 (freeze: true) 9.37 ms/iter 10.48 ms ▄█ ▄
(5.99 ms … 12.68 ms) 12.54 ms ▃▇▇▃▇▃▇█▇▅███████▅▃▁▃
add: immer10Each (freeze: true) 7.66 ms/iter 9.19 ms ▄█▃
(5.88 ms … 11.36 ms) 11.12 ms ████▆▂▁▄▂▂▃▆▁▄▆▆▁▅▂▂▃
add: mutative (freeze: true) 43.90 µs/iter 48.69 µs █ █
(35.39 µs … 54.02 µs) 49.92 µs █▁██▁▁▁▁▁█▁▁▁██▁▁▁███
add: mutativeCompat (freeze: true) 718.40 µs/iter 884.40 µs ▆█
(528.20 µs … 2.17 ms) 1.44 ms ██▅▄▂▁▂▄▃▃▅▄▃▁▁▁▁▁▁▁▁
summary
add: vanilla (freeze: false)
1.33x faster than add: vanilla (freeze: true)
1.61x faster than add: mutative (freeze: true)
1.81x faster than add: mutativeCompat (freeze: false)
1.89x faster than add: mutative (freeze: false)
18.27x faster than add: immer6 (freeze: false)
21.48x faster than add: immer5 (freeze: false)
23.96x faster than add: immer7 (freeze: false)
24.88x faster than add: immer6 (freeze: true)
26.38x faster than add: mutativeCompat (freeze: true)
30.53x faster than add: immer5 (freeze: true)
34.75x faster than add: immer7 (freeze: true)
133.19x faster than add: immer8 (freeze: false)
166.19x faster than add: immer8 (freeze: true)
263.92x faster than add: immer10 (freeze: false)
273.09x faster than add: immer9 (freeze: false)
281.16x faster than add: immer10Each (freeze: true)
320.97x faster than add: immer10Each (freeze: false)
343.94x faster than add: immer10 (freeze: true)
387.65x faster than add: immer9 (freeze: true)
----------------------------------------------------- -------------------------------
remove: vanilla (freeze: false) 11.22 µs/iter 11.47 µs ███ █ █ ██ █ █
(10.02 µs … 12.91 µs) 12.64 µs ███▁█▁█▁▁▁██▁▁▁█▁▁▁▁█
remove: immer5 (freeze: false) 14.23 ms/iter 16.36 ms ▃ █ ▃▃▃▃ ▃███ █ █▃ ▃
(9.33 ms … 18.54 ms) 18.53 ms █▆▆█▆████▁████▆█▆██▆█
remove: immer6 (freeze: false) 13.90 ms/iter 16.01 ms █ ▄▄█ ▄ ▄ ▄
(10.00 ms … 19.58 ms) 18.37 ms ███████▁▁██▅▅▅█▅██▅█▅
remove: immer7 (freeze: false) 16.65 ms/iter 18.06 ms ▂█
(14.17 ms … 23.15 ms) 21.78 ms ▃██▇▅▁▇▃▃▃▃▁▇▁▅▅▁▁▁▁▃
remove: immer8 (freeze: false) 27.72 ms/iter 29.06 ms █ ▃ █▃
(23.61 ms … 35.63 ms) 33.67 ms ▆█▆▆▁█▆▆▆▁▁██▁▁▁▁▁▆▁▆
remove: immer9 (freeze: false) 66.60 ms/iter 74.04 ms ▃ █
(54.38 ms … 75.73 ms) 74.20 ms ▆▆▁▁▁▁▁█▁▁▁▆▆▆▁▁▁▆▁▁█
remove: immer10 (freeze: false) 68.61 ms/iter 72.11 ms ▃ █
(58.88 ms … 91.71 ms) 82.31 ms ▆▁▆█▁█▆▁▁▁▁▆▆▁▁▁▁▁▁▁▆
remove: immer10Each (freeze: false) 70.92 ms/iter 75.19 ms █
(58.99 ms … 88.05 ms) 87.04 ms ███▁███▁▁█▁▁█▁▁▁▁█▁▁█
remove: mutative (freeze: false) 36.86 ms/iter 38.51 ms █ █ █ █ █
(29.47 ms … 45.63 ms) 44.27 ms ██▁▁▁▁█▁██▁▁█▁▁▁▁█▁▁█
remove: mutativeCompat (freeze: false) 36.44 ms/iter 39.51 ms █ █
(31.01 ms … 45.29 ms) 43.54 ms █████▁█▁█▁▁▁▁▁█▁█▁█▁█
remove: vanilla (freeze: true) 10.13 µs/iter 10.09 µs █ █
(9.58 µs … 11.54 µs) 10.67 µs █▁▁██▁█▁██▁█▁▁▁▁▁▁▁▁█
remove: immer5 (freeze: true) 11.74 ms/iter 12.33 ms ▄█▄▆
(9.54 ms … 21.58 ms) 16.29 ms ▅█████▃▁▇▃▁▃▃▇▃▃▁▅▁▁▅
remove: immer6 (freeze: true) 11.97 ms/iter 12.58 ms ▂▂█ ▂
(9.31 ms … 18.54 ms) 18.41 ms ███████▅▅▅▁▁▃▁▁▅▁▃▃▃▃
remove: immer7 (freeze: true) 12.14 ms/iter 12.63 ms █
(10.09 ms … 16.64 ms) 15.33 ms ▆▄▆▆██▄██▃▆▁▃▃▁▁▃▃▃▄▆
remove: immer8 (freeze: true) 19.93 ms/iter 21.95 ms █
(16.63 ms … 25.42 ms) 24.85 ms █▆▄█▁▆▄▁▄▆▁▄▄▄▁▁█▆▁▁▄
remove: immer9 (freeze: true) 35.90 ms/iter 36.86 ms █
(31.76 ms … 42.27 ms) 41.24 ms ▅▁▁█▅▁▅▅▁▅▁▅▁▅▁▁▁▅▁▁▅
remove: immer10 (freeze: true) 37.86 ms/iter 41.62 ms ▃ █
(29.90 ms … 45.22 ms) 44.66 ms ▆▁▁▁█▁▆▁▁█▆▁▁▆▁▁▆▁▁▆▆
remove: immer10Each (freeze: true) 34.54 ms/iter 35.29 ms ▃▃ ▃ █
(29.71 ms … 45.91 ms) 42.98 ms ██▆▆▁▁█▁█▁▆▁▆▁▁▁▁▁▁▁▆
remove: mutative (freeze: true) 30.32 ms/iter 31.23 ms █ ██ █
(28.06 ms … 35.88 ms) 33.35 ms ████▁█▁█▁██▁█▁▁█▁█▁▁█
remove: mutativeCompat (freeze: true) 39.66 ms/iter 40.42 ms ██
(35.04 ms … 48.34 ms) 45.11 ms ████▁▁▁██▁▁█▁▁▁▁▁▁█▁█
summary
remove: vanilla (freeze: true)
1.11x faster than remove: vanilla (freeze: false)
1158.99x faster than remove: immer5 (freeze: true)
1182.45x faster than remove: immer6 (freeze: true)
1198.91x faster than remove: immer7 (freeze: true)
1373.2x faster than remove: immer6 (freeze: false)
1405.14x faster than remove: immer5 (freeze: false)
1644.16x faster than remove: immer7 (freeze: false)
1968.06x faster than remove: immer8 (freeze: true)
2738.04x faster than remove: immer8 (freeze: false)
2994.45x faster than remove: mutative (freeze: true)
3411.15x faster than remove: immer10Each (freeze: true)
3545.68x faster than remove: immer9 (freeze: true)
3599.07x faster than remove: mutativeCompat (freeze: false)
3640.7x faster than remove: mutative (freeze: false)
3739.4x faster than remove: immer10 (freeze: true)
3917.02x faster than remove: mutativeCompat (freeze: true)
6577.26x faster than remove: immer9 (freeze: false)
6775.93x faster than remove: immer10 (freeze: false)
7004.63x faster than remove: immer10Each (freeze: false)
----------------------------------------------------- -------------------------------
update: vanilla (freeze: false) 122.54 µs/iter 145.70 µs █
(49.90 µs … 1.51 ms) 274.20 µs ▃▁▁▁█▃▂▂▂▂▁▂▄▂▁▁▁▁▁▁▁
update: immer5 (freeze: false) 822.09 µs/iter 997.60 µs ▇█
(578.60 µs … 2.11 ms) 1.50 ms ██▅▄▄▅▃▅▅▅▅▅▄▃▂▁▁▂▁▁▁
update: immer6 (freeze: false) 735.52 µs/iter 921.40 µs █
(518.20 µs … 1.85 ms) 1.30 ms ▄██▃▃▂▂▂▂▂▄▃▄▃▃▃▁▂▁▁▁
update: immer7 (freeze: false) 650.78 µs/iter 714.10 µs █
(507.20 µs … 1.56 ms) 1.30 ms ██▃▂▂▂▁▁▂▂▁▂▂▃▂▂▁▁▁▁▁
update: immer8 (freeze: false) 4.62 ms/iter 5.16 ms █▆▄
(3.00 ms … 6.06 ms) 5.89 ms ▄█▄▃▃▃▅▄▄▂▆▆▄███▇▆▄▅▂
update: immer9 (freeze: false) 8.97 ms/iter 10.56 ms █▅
(6.00 ms … 11.30 ms) 11.29 ms ▆▆▄▆██▆▃▁▆█▆▄▃█▃███▆▆
update: immer10 (freeze: false) 7.71 ms/iter 9.20 ms ▇█ ▇ ▄
(5.89 ms … 10.66 ms) 10.38 ms ███▆█▇▁▆▃▆▄▆▄▃▆▆▄█▆▄▆
update: immer10Each (freeze: false) 7.25 ms/iter 8.39 ms █▆
(5.70 ms … 10.54 ms) 10.46 ms ██▅▆█▄▃▅▂▃▃▄▃▃▄▃▂▅▂▄▃
update: mutative (freeze: false) 30.31 µs/iter 21.20 µs ▅▃█
(10.80 µs … 2.96 ms) 71.60 µs ▄███▆▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁
update: mutativeCompat (freeze: false) 31.13 µs/iter 23.60 µs █ ▆
(11.90 µs … 3.04 ms) 70.20 µs ▅█▅██▅▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁
update: vanilla (freeze: true) 144.97 µs/iter 186.10 µs █
(96.80 µs … 769.10 µs) 296.00 µs █▄▃▂▂▂▂▂▂█▄▂▁▁▁▁▁▁▁▁▁
update: immer5 (freeze: true) 748.82 µs/iter 929.10 µs ██
(535.20 µs … 1.80 ms) 1.53 ms ██▇▄▃▃▂▂▃▄▄▄▃▂▂▁▁▁▂▁▁
update: immer6 (freeze: true) 761.00 µs/iter 964.80 µs ▂█
(506.10 µs … 1.43 ms) 1.27 ms ███▅▄▄▄▃▃▃▄▅▆▆▅▆▂▂▁▁▁
update: immer7 (freeze: true) 1.01 ms/iter 1.19 ms ▅▃ ▄█▆▆
(553.80 µs … 1.85 ms) 1.75 ms ██▃▄▃▃▃▅████▆▃▂▂▂▂▁▂▁
update: immer8 (freeze: true) 4.45 ms/iter 5.35 ms ▇▅ ▂ ▃ ▂ ▃ ██▅
(2.88 ms … 6.31 ms) 5.96 ms ████▇▇▃▂▄▇█▄███████▃▃
update: immer9 (freeze: true) 7.81 ms/iter 9.36 ms █▃▆
(5.79 ms … 11.39 ms) 11.36 ms ▇███▅█▄▅▂▁▄▄▅▂▆▄▅▇▄▂▄
update: immer10 (freeze: true) 7.70 ms/iter 9.21 ms █ ▂
(5.74 ms … 10.51 ms) 10.40 ms ██▇▄▇▄█▅▂▄▆▅▄▄▂▇▄█▅▆▂
update: immer10Each (freeze: true) 8.16 ms/iter 9.58 ms ▃█ ▃ ▅
(5.88 ms … 11.25 ms) 11.14 ms ██▆▄█▄▄█▆▆██▄▁██▄█▃▃▄
update: mutative (freeze: true) 29.61 µs/iter 21.20 µs █ ▆
(11.40 µs … 2.81 ms) 62.50 µs ▄█▅█▆▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁
update: mutativeCompat (freeze: true) 854.26 µs/iter 1.00 ms █
(471.50 µs … 3.36 ms) 2.83 ms █▅▃▄█▆▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁
summary
update: mutative (freeze: true)
1.02x faster than update: mutative (freeze: false)
1.05x faster than update: mutativeCompat (freeze: false)
4.14x faster than update: vanilla (freeze: false)
4.9x faster than update: vanilla (freeze: true)
21.98x faster than update: immer7 (freeze: false)
24.84x faster than update: immer6 (freeze: false)
25.29x faster than update: immer5 (freeze: true)
25.7x faster than update: immer6 (freeze: true)
27.77x faster than update: immer5 (freeze: false)
28.85x faster than update: mutativeCompat (freeze: true)
34.27x faster than update: immer7 (freeze: true)
150.44x faster than update: immer8 (freeze: true)
156.03x faster than update: immer8 (freeze: false)
245.01x faster than update: immer10Each (freeze: false)
259.95x faster than update: immer10 (freeze: true)
260.42x faster than update: immer10 (freeze: false)
263.84x faster than update: immer9 (freeze: true)
275.52x faster than update: immer10Each (freeze: true)
302.96x faster than update: immer9 (freeze: false)
----------------------------------------------------- -------------------------------
concat: vanilla (freeze: false) 45.56 µs/iter 51.30 µs █▆ ▄
(27.80 µs … 593.90 µs) 114.70 µs ██▄▃▇██▄▃▂▂▂▁▁▁▁▁▁▁▁▁
concat: immer5 (freeze: false) 1.25 s/iter 1.31 s █
(1.13 s … 1.34 s) 1.33 s █▁█▁▁▁▁▁█▁████▁▁▁▁███
concat: immer6 (freeze: false) 45.49 µs/iter 47.68 µs █
(39.65 µs … 52.05 µs) 51.01 µs ██▁▁█▁▁▁██▁██▁█▁▁▁▁██
concat: immer7 (freeze: false) 51.85 µs/iter 53.53 µs █ █
(43.33 µs … 66.04 µs) 55.67 µs █▁▁▁▁█▁▁███▁▁█▁▁██▁▁█
concat: immer8 (freeze: false) 52.14 µs/iter 52.96 µs █ ▃ ▃
(44.06 µs … 58.98 µs) 56.71 µs ▆▁▁▁▁▆▁▁▁▁▁█▆█▆▁▁▁▁▁█
concat: immer9 (freeze: false) 59.84 µs/iter 67.30 µs █▅ ▃
(35.10 µs … 962.40 µs) 177.70 µs ██▅▅█▇▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁
concat: immer10 (freeze: false) 144.33 µs/iter 148.81 µs █
(133.44 µs … 155.07 µs) 152.00 µs ██▁▁█▁█▁▁▁▁█▁▁█▁██▁██
concat: immer10Each (freeze: false) 161.44 µs/iter 77.60 µs █
(32.50 µs … 9.75 ms) 5.91 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
concat: mutative (freeze: false) 71.01 µs/iter 63.00 µs █
(30.60 µs … 1.93 ms) 646.30 µs ▆█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
concat: mutativeCompat (freeze: false) 68.67 µs/iter 61.50 µs █
(30.80 µs … 1.95 ms) 577.10 µs ▇█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
concat: vanilla (freeze: true) 42.88 µs/iter 44.09 µs █ █
(36.20 µs … 49.88 µs) 49.65 µs █▁▁▁█▁███▁█▁██▁▁▁▁▁▁█
concat: immer5 (freeze: true) 1.47 s/iter 1.54 s █ █
(1.31 s … 1.74 s) 1.58 s █▁█▁▁▁██▁▁▁▁▁█▁█▁█▁██
concat: immer6 (freeze: true) 2.01 ms/iter 2.48 ms █▇ ▆
(1.28 ms … 3.07 ms) 2.95 ms ██▃▃▂▅▄▄▂▂▃▃▄▇██▆▄▅▃▂
concat: immer7 (freeze: true) 1.39 ms/iter 1.72 ms █▄ ▄▂
(879.50 µs … 2.88 ms) 2.36 ms ███▄▄▅▄▄▄▃▇██▄▅▂▂▃▂▂▁
concat: immer8 (freeze: true) 4.45 ms/iter 4.99 ms █ ▂
(3.37 ms … 6.85 ms) 6.82 ms ███▇███▆▄▆▂▄▄▆▄▂▄▁▂▁▂
concat: immer9 (freeze: true) 8.73 ms/iter 10.03 ms ▄██▂
(6.70 ms … 12.96 ms) 12.59 ms ████▃▆▇▁▆▆▇▄▄▁▃▃▄▃▇▄▄
concat: immer10 (freeze: true) 9.32 ms/iter 11.03 ms █▃
(6.93 ms … 12.90 ms) 12.82 ms ██▆█▄▄█▆█▃▃█▄▄▄▆▄█▃▃▆
concat: immer10Each (freeze: true) 9.35 ms/iter 10.59 ms ▂ █ ▂
(6.91 ms … 13.60 ms) 13.08 ms ▆██▇▄█▃▄█▄▇▄▇▃▃▇▄▃▃▄▃
concat: mutative (freeze: true) 59.13 µs/iter 57.30 µs █▃
(29.70 µs … 1.89 ms) 490.50 µs ██▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
concat: mutativeCompat (freeze: true) 1.02 ms/iter 1.15 ms █ ▆
(547.30 µs … 2.86 ms) 2.63 ms █▆▄▄███▃▂▃▂▁▁▁▂▁▂▂▁▁▁
summary
concat: vanilla (freeze: true)
1.06x faster than concat: immer6 (freeze: false)
1.06x faster than concat: vanilla (freeze: false)
1.21x faster than concat: immer7 (freeze: false)
1.22x faster than concat: immer8 (freeze: false)
1.38x faster than concat: mutative (freeze: true)
1.4x faster than concat: immer9 (freeze: false)
1.6x faster than concat: mutativeCompat (freeze: false)
1.66x faster than concat: mutative (freeze: false)
3.37x faster than concat: immer10 (freeze: false)
3.76x faster than concat: immer10Each (freeze: false)
23.73x faster than concat: mutativeCompat (freeze: true)
32.36x faster than concat: immer7 (freeze: true)
46.85x faster than concat: immer6 (freeze: true)
103.7x faster than concat: immer8 (freeze: true)
203.59x faster than concat: immer9 (freeze: true)
217.3x faster than concat: immer10 (freeze: true)
218.11x faster than concat: immer10Each (freeze: true)
29069.38x faster than concat: immer5 (freeze: false)
34212.63x faster than concat: immer5 (freeze: true)
Hey @markerikson,
Thanks! These findings are very interesting. I'll try to dig deeper here and see how the regressions were introduced, and thanks for setting up that repo!
Two high-level thoughts come to mind as to causes:
- The reflection method we used changed over time, mostly for correctness reasons around edge cases like non-enumerable fields, getters, or inherited fields. In my benchmarks it didn't change much, but yours might be more accurate (or V8 has changed meaningfully). However, this is for uncommon scenarios (especially in combination with Redux), so it might be worth introducing a "sloppy" mode where things would be faster (see the sketch after this list).
- Freezing is expensive, but it's primarily done to eliminate branch traversals the next time the state is drafted. Originally Immer didn't deeply prune, but now we do, to find drafts that would otherwise accidentally stay around in a case like `draft.x = [draft.y]` (originally Immer didn't traverse into "new" objects coming from the "outside", like the new array here). However, I want to explore the option to "mark committed / final" instead of "revoke" the draft proxies. Because we know all the proxies involved in a recipe, we wouldn't need to scan or rewrite the final tree, at the cost of leaving proxies around in the final state. That shouldn't affect semantics, but in the debugger you'd see proxies.
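For the first point, a rough illustration of what a "sloppy" mode might trade away versus a stricter reflection-based copy (my own sketch, not Immer's actual internals):

```ts
// Fast but sloppy: only sees enumerable string keys, so symbol keys and
// non-enumerable props are dropped, and getters are read into plain values.
function sloppyCopy<T extends object>(base: T): T {
  const copy: any = {};
  for (const key of Object.keys(base)) {
    copy[key] = (base as any)[key];
  }
  return copy;
}

// Slower but stricter: walks every own key (including symbols) and copies
// full property descriptors, so non-enumerables and getters survive.
function strictCopy<T extends object>(base: T): T {
  const copy: any = {};
  for (const key of Reflect.ownKeys(base)) {
    Object.defineProperty(copy, key, Object.getOwnPropertyDescriptor(base, key)!);
  }
  return copy;
}
```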
I hope to explore both, but apologies upfront that it might take a while as we'll have a move coming up :)
Edit: related thought - I'm wondering if Redux would overall be faster if you'd deep-freeze the action object before sending it into the immer reducer.
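For that last thought, a minimal sketch of such a helper (hypothetical; whether it actually helps is exactly the open question above):

```ts
// Recursively freeze an object graph; already-frozen subtrees are skipped,
// which also guards against cycles.
function deepFreeze<T>(obj: T): T {
  if (obj && typeof obj === "object" && !Object.isFrozen(obj)) {
    Object.freeze(obj);
    for (const value of Object.values(obj)) {
      deepFreeze(value);
    }
  }
  return obj;
}

// store.dispatch(deepFreeze({ type: "items/added", payload: { id: 1 } }));
```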
Hey! I tried to reproduce with the benchmark, but I probably need to do some more yalc / pnpm magic to get everything running (I did `yalc publish` in the immer repo, and `yalc update immer` in the benchmark repo):
mweststrate@mweststrate-mbp immer-perf-tests % pnpm start
> [email protected] start /Users/mweststrate/Desktop/immer-perf-tests
> cross-env NODE_ENV=production node --expose-gc immutability-benchmarks.mjs
internal/process/esm_loader.js:74
internalBinding('errors').triggerUncaughtException(
^
Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/Users/mweststrate/Desktop/immer-perf-tests/node_modules/immer10Each/dist/immer.mjs' imported from /Users/mweststrate/Desktop/immer-perf-tests/immutability-benchmarks.mjs
@mweststrate Couple thoughts:
- I don't think I've ever used `yalc update`, actually. I normally do `yalc add my-lib && yarn install` again each time.
- My example has a `yalc`'d build from the `faster-iteration` branch already committed and imported as `immer10Each`. Assuming you're trying to do a build off master, you'll probably want to delete the `.yalc` folder, remove that entry from `package.json`, and remove the `immer10Each` entries from the benchmarks file, in order to then use `yalc` to do your own local builds.
Hey! I'd be curious to look into the perf aspects of this myself, but I don't know enough about Immer's internals to really have an idea where to begin or what to try. Any suggestions?
Hey, any news on this? This is critical for people using RTK.
@mweststrate Been trying to look into this further, and I think there's actually a flaw in my benchmarks.
The benchmark repo currently uses a Node ESM script that directly imports from each of the copies of Immer, and my run script is NODE_ENV=production node --expose-gc immutability-benchmarks.mjs.
However, that means that it's importing the plain immer.mjs files, which still have the `process.env.NODE_ENV !== "production"` checks embedded in them... including in hot spots like `finalizeProperty`. After doing some perf captures of the Node script using Chrome DevTools, it looks like those `process.env.NODE_ENV` checks are actually very expensive and are skewing the results. I temporarily tried converting my script to do CJS imports to get the precompiled CJS prod artifacts instead, and while the various versions of Immer are still significantly slower than vanilla JS, Immer 10 is now less bad and performs better than 8 or 9.
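To illustrate the skew (a sketch, not Immer's actual source):

```ts
// Reading process.env inside a hot function costs a lookup on the exotic
// process.env object on every single call.
function finalizeStepPerCallCheck(value: unknown) {
  if (process.env.NODE_ENV !== "production") {
    // dev-only invariant checks would go here
  }
  // ...actual work
}

// Hoisting the read to module scope (or letting a bundler statically
// replace the check, as the precompiled CJS prod artifacts effectively do)
// pays that cost once instead of per call.
const isProd = process.env.NODE_ENV === "production";
function finalizeStepHoistedCheck(value: unknown) {
  if (!isProd) {
    // dev-only invariant checks would go here
  }
  // ...actual work
}
```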
I'm going to try revamping my benchmarks setup to precompile and bundle the test suite with Vite so that I get proper numbers from the prod artifacts.
So, overall Immer still seems to have gotten slower over time, but it's not as bad as I first thought.
@mweststrate saw you just merged a bunch of outstanding PRs. Just wanted to check back - do you think you'll have time to look at this in the near future? I'm also happy to try to collaborate on it. I've still got my own local benchmarks work lying around, and I'd taken a stab at trying to port some of the iteration changes, but the nuances of Immer's behavior are definitely not something I'm comfortable with at this point.
For those watching this issue:
I spent the last couple days working to update my benchmarking setup, and then using Claude to help me analyze the performance of the various Immer versions over time. From there, it identified several hotspots and suggested changes. After applying those, I believe I've managed to improve Immer's perf by a meaningful amount. The benchmark numbers are fuzzier and noisier than I'd like, so I'm not going to claim a specific % improvement, but with the changes applied the benchmark harness shows that the "v10Perf" branch takes up a much smaller percentage of samples in the results than the v10 branch.
I also ended up porting Immer's tests from Jest to Vitest.
PRs:
- #1162
- #1163
- #1164
I haven't yet tried running a build with those changes in an RTK application myself - that's next on my list.
Would love it if folks could try out a build from that branch and see if it's an improvement!
@markerikson sorry I've been out of the loop for a while, so I'm slowly working my way back in and catching up with everything. Thanks for picking this up, I'll try to give this a review later this week.
Sure, thanks for the update! Also happy to find time to talk directly about the benchmarking setup, the perf tweaks so far, and what you had in mind for a potential "mark committed" rearchitecture.
"mark committed" rearchitecture.
Oh lol, I don't remember anymore 😂. I need to dig a bit deeper to recall that. I've just invited you to the project btw, that should make organising PRs etc a ton easier.
My last couple days worth of Immer work sessions have been focused on trying to rewrite the "finalization" logic.
Currently, Immer relies on a full nested tree traversal. For every property value, it tries to figure out "is this a draft? if so, did it change?", and if so, it swaps the updated plain value into the final result tree. It also does deep freezing and patch generation.
I looked at other libs like Mutative, Structura, and Limu. Mutative in particular tracks accessed fields and defines a cleanup callback for each one, so it doesn't have to recurse through the entire tree just to find updated values. (It also doesn't freeze by default, but has that option.)
If you look at perf profiles of Immer, it's the finalization step that seems to take the longest - tons of nested `finalize()` -> `finalizeProperty()` calls.
So, I've been attempting to port Mutative's finalization callback approach over to Immer's codebase to eliminate most of that overhead.
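As a toy sketch of the difference between the two strategies (a symbol-marked object stands in for a real proxy draft; none of this is Immer's or Mutative's actual code):

```ts
const DRAFT = Symbol("draft");
const isDraft = (v: unknown): boolean => !!v && typeof v === "object" && DRAFT in v;
const finalizedValueOf = (draft: any) => draft[DRAFT]; // unwrap to the plain copy

// Immer-style: recurse over the entire result tree looking for drafts,
// visiting every property even in subtrees that were never touched.
function finalizeTreeWalk(value: any): any {
  if (value === null || typeof value !== "object") return value;
  for (const key of Object.keys(value)) {
    const child = value[key];
    value[key] = isDraft(child) ? finalizedValueOf(child) : finalizeTreeWalk(child);
  }
  return value;
}

// Mutative-style: register a patch-up callback whenever a child draft is
// created, so finalization only touches drafts that actually exist.
const finalizers: Array<() => void> = [];
function onChildDraftCreated(parentCopy: any, key: PropertyKey, childDraft: any) {
  finalizers.push(() => {
    parentCopy[key] = finalizedValueOf(childDraft);
  });
}
function finalizeByCallbacks() {
  while (finalizers.length) finalizers.pop()!(); // children run before parents
}
```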
I spent a bunch of time going in circles trying to understand what Immer does and how the other libs work. Fortunately I've now spent enough time staring at Immer's internals to understand what it's actually doing, how it works, and what the moving pieces are.
When I left off yesterday, I had something like 210 / 225 baseline tests passing, but that was with the multi-variable "run tests with all permutations of options" behavior turned off (i.e., it's only running the `freeze: true, somethingElse: true` combo).
I think I'm on the right track overall, both in terms of impl and potential perf. Tried running my benchmarks vs the still-WIP code and numbers seemed good, but in a couple cases almost suspiciously so (as in "is this actually doing the right work?").
So between that and the other smaller cleanup perf optimizations I did, I'm pretty hopeful this is going to result in Immer being much faster than it has been the last several versions.
No ETA on having this ready, but I'm actively focused on this right now whenever I have time free to work on it. I'll put up a draft once I think it's viable.
I think I have some good news:
- got all the baseline tests passing
- almost all the failing tests in the rest of the files are due to the patch system not being implemented yet
- the benchmark setup generally seems to show this branch's build as being mostly better than v10 (varying per scenario)
- and when I tried installing it in the RTKQ perf test app I've used for testing RTKQ improvements, with a "load 1000 RTKQ components" setup, it seems like it knocked off about a third of the total scripting time (3300ms -> 2150ms)?
- and looking at all of the corresponding function times like `shallowCopy` and `reducer`, those look like they dropped 40-50%
So this looks, as best as I can reasonably tell so far, like a pretty significant perf win.
The code is an utter and complete mess right now, so still absolutely not even ready for a draft PR. I'm at a conf for the next couple days, but I'll keep poking at this when I have time and hopefully get it cleaned up.
Here's the WIP branch if anyone wants to take a look / try building yourself. (I've added a bunch of debug logging - turn it off by setting `const ENABLE_LOGGING = false` in `common.ts` first.)
- https://github.com/markerikson/immer/tree/feature/perf-alternate-architectures-3
I've continued going down the rabbit hole on this, and I now feel very confident this is going to be a significant overall improvement in perf!
First, I've gotten the "finalization callback" approach working and passing essentially all tests that don't use patches. I still need to reimplement the patch system, but my goal has been to prove out whether this architectural change is actually an improvement first, then figure out how to make patches work again.
I also did further analysis and figured out that the reason all of the Proxy-based immutable update libs are so slow for array ops like `sort()`, `reverse()`, and `splice()` is that those result in the Proxy get/set traps being executed for every affected index (overhead) and a Proxy draft created for every item (more overhead). In my benchmarks, that results in most of the libs being 1200x+ slower than vanilla immutable updates for some of those scenarios.
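That overhead is easy to demonstrate directly - wrap a plain array in a counting Proxy and reverse it:

```ts
// Count how many get/set trap invocations one bulk operation causes.
let traps = 0;
const target = Array.from({ length: 1000 }, (_, i) => i);
const proxied = new Proxy(target, {
  get(t, prop, receiver) { traps++; return Reflect.get(t, prop, receiver); },
  set(t, prop, value, receiver) { traps++; return Reflect.set(t, prop, value, receiver); },
});
proxied.reverse();
console.log(traps); // several thousand trap hits for a single reverse()
```

A draft-based lib pays far more than a counter increment per trap, plus per-item draft creation on top.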
My proposed solution for this is to actually override the mutating array methods with custom handlers that mutate the WIP copy directly, mark the fields as accessed internally, and skip the Proxy array index accesses while we're in a "bulk operation" (rough sketch below). I appear to have gotten this working, and the results are very impressive compared to the current Immer 10 results!
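Here's a minimal self-contained sketch of the idea (the structure and names are hypothetical - the branch hooks into Immer's actual draft state rather than this toy wrapper):

```ts
type ArrayDraftState = { base: unknown[]; copy: unknown[] | null };

function createArrayDraft(base: unknown[]) {
  const state: ArrayDraftState = { base, copy: null };
  const ensureCopy = () => (state.copy ??= state.base.slice());
  const proxy: unknown[] = new Proxy(base, {
    get(target, prop) {
      if (prop === "reverse") {
        // Bulk operation: mutate the shallow copy directly, bypassing the
        // per-index get/set traps and per-item draft creation entirely.
        return () => {
          ensureCopy().reverse();
          return proxy;
        };
      }
      // Normal reads fall through to the copy (if one exists) or the base.
      return Reflect.get(state.copy ?? target, prop);
    },
  });
  return { proxy, state };
}

const { proxy, state } = createArrayDraft([1, 2, 3]);
(proxy as number[]).reverse();
console.log(state.copy, state.base); // [ 3, 2, 1 ] [ 1, 2, 3 ] - base untouched
```

And the benchmark results with the real overrides in place: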
remove: vanilla (freeze: false) 9.19 µs/iter 9.35 µs █
(8.72 µs … 9.80 µs) 9.59 µs ▅ █ ▅
( 1.31 kb … 1.31 kb) 1.31 kb ▇▁▇█▁▁▁▁▁▁▇█▇▁▇█▁▁▁▁▇
remove: immer10 (freeze: false) 19.61 ms/iter 20.25 ms █ █
(18.17 ms … 21.04 ms) 20.88 ms █ █ █ █ ██ ███ ██
( 11.55 mb … 11.86 mb) 11.63 mb ████████████▁███▁▁███
remove: immer10Perf (freeze: false) 11.97 µs/iter 12.51 µs █
(11.11 µs … 13.22 µs) 13.07 µs █
( 1.31 kb … 1.45 kb) 1.33 kb █▁██▁▁▁▁█▁█▁▁▁█▁▁▁█▁█
remove: vanilla (freeze: true) 8.92 µs/iter 9.27 µs █
(8.21 µs … 10.17 µs) 9.95 µs █ █
( 1.31 kb … 1.31 kb) 1.31 kb █▁▁███▁████▁███▁▁▁▁▁█
remove: immer10 (freeze: true) 11.68 ms/iter 12.13 ms ▅█
(10.50 ms … 13.78 ms) 13.69 ms ▅▅██ ▇▇ ▇ ▅ ▂
( 5.64 mb … 7.18 mb) 5.73 mb ████▇▄▄██▇█▁█▁▄▁▄▁█▇▄
remove: immer10Perf (freeze: true) 310.42 µs/iter 307.50 µs █
(268.60 µs … 979.10 µs) 615.80 µs █▄
( 1.95 kb … 727.59 kb) 169.71 kb ███▃▂▂▂▂▁▁▁▂▂▁▁▁▁▁▁▁▁
summary
remove: vanilla (freeze: true)
1.03x faster than remove: vanilla (freeze: false)
1.34x faster than remove: immer10Perf (freeze: false)
34.8x faster than remove: immer10Perf (freeze: true)
1309.09x faster than remove: immer10 (freeze: true)
2198.78x faster than remove: immer10 (freeze: false)
remove-high: vanilla (freeze: false) 78.34 µs/iter 76.30 µs █
(63.20 µs … 725.90 µs) 152.10 µs ▂██▅
( 12.57 kb … 242.09 kb) 84.00 kb ████▄▃▃▂▂▃▃▂▂▁▁▁▁▁▁▁▁
remove-high: immer10 (freeze: false) 22.12 ms/iter 23.03 ms █
(18.95 ms … 28.38 ms) 26.53 ms ▂▂▇▂ ▂ ▇█
( 11.56 mb … 11.91 mb) 11.69 mb ▆▆▆████▆█▆██▆▁▁▁▁▁▆▁▆
remove-high: immer10Perf (freeze: false) 5.94 ms/iter 6.90 ms ▃▄█▆█
(4.36 ms … 9.18 ms) 8.88 ms █████ ▅ ▂▇
( 5.13 mb … 6.89 mb) 5.83 mb ▆████████▅▅██▅█▆█▅▃▃▃
remove-high: vanilla (freeze: true) 77.55 µs/iter 74.40 µs ▃█
(67.80 µs … 577.90 µs) 143.20 µs ██
( 78.48 kb … 126.49 kb) 83.67 kb ██▄▃▃▂▂▂▁▂▂▁▁▁▁▁▁▁▁▁▁
remove-high: immer10 (freeze: true) 12.10 ms/iter 12.88 ms █
(9.89 ms … 18.58 ms) 17.49 ms ▆█
( 5.56 mb … 6.01 mb) 5.75 mb ▇███▇▇▇█▅▇▁▁▅▅▃▃▃▁▁▁▅
remove-high: immer10Perf (freeze: true) 6.04 ms/iter 6.59 ms █
(4.70 ms … 9.24 ms) 8.83 ms ▅▅█▄▇▇▇ ▇▄
( 4.88 mb … 7.74 mb) 5.91 mb ███████████▇▇▄▄▄▁▄▃▃▃
summary
remove-high: vanilla (freeze: true)
1.01x faster than remove-high: vanilla (freeze: false)
76.64x faster than remove-high: immer10Perf (freeze: false)
77.85x faster than remove-high: immer10Perf (freeze: true)
156x faster than remove-high: immer10 (freeze: true)
285.18x faster than remove-high: immer10 (freeze: false)
Putting it all together, the current branch is anywhere from 20% to 90% faster depending on benchmark scenario, and it's an average of 50%+ faster than Immer 10!
┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario │ immer10 │ immer10Perf │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ remove │ 11.6ms │ 308.4µs │ +97.3% │
│ reverse-array │ 11.9ms │ 380.2µs │ +96.8% │
│ sortById-reverse │ 13.5ms │ 697.5µs │ +94.8% │
│ update-reuse │ 1.1s │ 306.4ms │ +70.9% │
│ mixed-sequence │ 992.5ms │ 329.1ms │ +66.8% │
│ update-multiple │ 897.8µs │ 324.8µs │ +63.8% │
│ update │ 750.3µs │ 295.9µs │ +60.6% │
│ concat │ 1.1ms │ 422.4µs │ +59.8% │
│ add │ 841.3µs │ 398.6µs │ +52.6% │
│ remove-high │ 12.0ms │ 6.2ms │ +48.1% │
│ update-high │ 8.6ms │ 5.0ms │ +41.8% │
│ remove-reuse │ 503.6ms │ 310.5ms │ +38.4% │
│ update-high-reuse │ 596.4ms │ 397.9ms │ +33.3% │
│ remove-high-reuse │ 474.6ms │ 360.4ms │ +24.1% │
│ filter │ 6.6ms │ 6.7ms │ -2.0% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘
✓ immer10Perf shows an average 56.5% performance improvement over immer10
If I re-run the RTK Query perf stress test example I showed earlier, overall scripting time dropped from 3300ms to 2650ms, and Immer-related timings appear to have dropped noticeably.
FWIW I haven't yet examined how much adding these extra array methods increases bundle size, nor have I tried to optimize their implementations. I'm still in "make it work and see how fast it is" mode.
In the process, I've also continued fleshing out the benchmarks script with more scenarios, as well as printing out some summary tables like the one I pasted above. I've also added several more Immer unit tests for various edge cases I've thought of in the process.
This is eventually going to have to turn into multiple new PRs to keep it all reviewable, and I don't know what this will mean in terms of release planning. I'll have to talk with @mweststrate about that once this is ready for review.
I've updated the WIP branch here:
- https://github.com/markerikson/immer/tree/feature/perf-alternate-architectures-3
It's still a mess, but folks ought to be able to take a look and build it if they want to play around.
I will probably take a stab at patch generation tonight so I can push this towards completion, then start extracting the various pieces and putting up PRs in the next few days.
Also I want to give a shoutout to the authors of the other libs like Mutative, Structura, and Limu - studying their sources and comparing implementations has been very informative!
This is awesome progress @markerikson!
> I also did further analysis and figured out that the reason all of the Proxy-based immutable update libs are so slow for array ops like `sort()`, `reverse()`, and `splice()` is that those result in the Proxy get/set traps being executed for every affected index (overhead) and a Proxy draft created for every item (more overhead). In my benchmarks, that results in most of the libs being 1200x+ slower than vanilla immutable updates for some of those scenarios.
This makes complete sense; in MobX we provided custom implementations for all core methods from the start for precisely this reason. I think this is the right approach, although it will likely make the lib a bit chunkier.
@mweststrate yeah, locally I saw that the prod artifact went from 11K to 17K (including both the finalization changes and the array methods), but that was with a bunch of random debugging code left in. I had also initially implemented overrides for `fill()`/`copyWithin()`, and the initial array method implementations were fairly repetitive. Currently I've dropped those two methods on the grounds that they're rare, and did some byte-shaving for the others, and it's back down to about 15.2K. There might be more I can do, but that's what it is on the branch atm.
Obviously I hate doing anything that increases bundle size, but in this case it's a trade between more code to handle more cases, vs faster perf. One option would be to handle the array overrides as a plugin, same as Map/Set support, to let folks make the tradeoff between faster perf if needed vs default bundle size.
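For reference, the existing Map/Set plugin opt-in pattern this would mirror (`enableMapSet` is Immer's real export; the array-plugin name below is made up):

```ts
import { enableMapSet, produce } from "immer";

// Existing pattern: Map/Set support is registered once, up front.
enableMapSet();

// A hypothetical array-overrides plugin could follow the same shape:
// import { enableArrayMethods } from "immer"; // made-up name, not a real export
// enableArrayMethods();

const next = produce(new Map([["a", 1]]), (draft) => {
  draft.set("b", 2);
});
```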
Only had a little more time to poke at this over the weekend. Currently about half the patch tests pass, and I was getting bogged down trying to figure out how to get the rest working.
I'm heading off on conference travel for the next two weeks, and I usually am able to spend a good portion of the travel time on OSS work. So, hoping to continue hammering on this during the trip.
Once I can get the patch logic reimplemented, this will probably turn into several PRs:
- Benchmark updates
- Additional tests
- An update to the existing perf PR with just the smaller low-hanging fruit optimizations
- The full finalization rewrite including patch handling changes
- The array overrides
Pulling those out will also let me rerun the benchmarks and try to quantify how much each set of changes helps overall.
Also happy to discuss how we'd want to approach releasing these. In one sense these could all be a minor 10.x release, because it's all internal, but I could understand considering the changes large enough that it might justify a major anyway.
I'm very happy to report that I spent a ton of time working on reimplementing Immer's patch system to work with the callback notification logic...
and that as of today the entire existing Immer test suite passes 100% ✅ !
The code is still a mess, but the functionality is there, and now I can clean this up and extract the pieces.
I'm also happy to say that this still appears to show pretty significant perf wins, around 45% faster than v10 overall:
┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario │ immer10 │ immer10Perf │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ reverse-array │ 210.3µs │ 22.5µs │ +89.3% │
│ remove │ 235.9µs │ 25.9µs │ +89.0% │
│ sortById-reverse │ 223.8µs │ 28.2µs │ +87.4% │
│ update-reuse │ 20.9ms │ 5.8ms │ +72.5% │
│ mixed-sequence │ 21.6ms │ 6.0ms │ +72.0% │
│ concat │ 186.5µs │ 73.1µs │ +60.8% │
│ rtkq-sequence │ 29.5ms │ 13.3ms │ +54.9% │
│ update-high │ 180.6µs │ 106.0µs │ +41.3% │
│ remove-reuse │ 10.5ms │ 6.2ms │ +41.2% │
│ remove-high │ 209.0µs │ 134.8µs │ +35.5% │
│ update-high-reuse │ 11.6ms │ 7.9ms │ +31.4% │
│ add │ 21.1µs │ 14.9µs │ +29.6% │
│ remove-high-reuse │ 9.4ms │ 7.1ms │ +24.5% │
│ update │ 21.8µs │ 18.8µs │ +13.9% │
│ update-multiple │ 46.5µs │ 40.5µs │ +13.0% │
│ filter │ 119.4µs │ 136.5µs │ -14.4% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘
✓ immer10Perf shows an average 46.4% performance improvement over immer10
As best as I can tell, for some of the benchmarks the bottleneck really is just the `shallowCopy` method that does `return { ... }`.
Updated the branch with the latest WIP:
- https://github.com/markerikson/immer/tree/feature/perf-alternate-architectures-3
Bouncing around conferences atm, so I'll keep working on this in the next few days as I'm traveling.
The non-mutating array methods (filter, find, map, etc) all incur overhead due to the same creation of proxies for every value during iteration. I'm going to try doing a similar set of "bulk operation" logic where we just pass the raw values to the iteration callbacks (and assume the user won't mutate those during iteration, because if they do it's their own fault), and then proxify the returned array or value to be able to handle any attempts to mutate those afterwards. Assuming this works out, it ought to eliminate most of the overhead in those cases.
Took a stab at trying to override the non-mutating array methods, and unfortunately there's a lot more complexity than I expected in order to match the existing behavior and what I assume a user would expect:
- For `filter`, you'd expect that mutating `filtered[i].value = 123` would also update the same object reference in the original array. That doesn't work if we only proxify the values in the returned filter result array. I tried doing a mini-proxy around the `latest(state)` array to manually reimplement `filter` (and other methods) and trigger proxy creation for any index where the predicate returns `true` (rough sketch below), but that got pretty ugly. Did seem to work, though.
- For `map`/`flatMap`, the mapper could return the original item, in which case you'd also expect it to reflect mutations back to the original array. Theoretically the same would be true for `map(item => item.nested)`, which gets even weirder.
- For `find()`, ideally we'd just wrap the result in a proxy, but same thing - `const item = arr.find(); item.value = 123` ought to work to update the array as a parent, and so should accessing `arr[thatIndex].value = 123`.
So it may still be doable, but I decided it wasn't worth pursuing further for now. I'm annoyed that `filter` seems to be a bit slower on this branch, but spending more time on it feels like a classic "perfect is the enemy of the good" situation. We can always revisit handling these methods later.
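For reference, the mini-proxy `filter` idea from the first bullet looks roughly like this (made-up helper names; the branch's real logic is more involved):

```ts
// The predicate runs against the raw backing values to avoid per-index
// traps; only matching indices pay the cost of draft creation, so
// `filtered[i].value = 123` can still write through to the parent draft.
function draftFilter<T>(
  rawItems: readonly T[],
  getDraftAt: (index: number) => T, // made-up: lazily create the draft for index i
  predicate: (item: T, index: number) => boolean
): T[] {
  const result: T[] = [];
  for (let i = 0; i < rawItems.length; i++) {
    if (predicate(rawItems[i], i)) {
      result.push(getDraftAt(i));
    }
  }
  return result;
}
```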
At this point I'm going to start extracting pieces out of this WIP branch and filing PRs. Will hopefully have those up over the next few hours.
Awright, I have some very good news!
Earlier I filed PRs to add more tests and benchmarks, and merged those:
- #1181
- #1182
and updated the "small but meaningful perf tweaks" PR:
- #1164
I then spent time digging into why `filter()` was 15-20% slower in this branch. I burned a bunch of time trying to get some line-level profiling info via pprof, and eventually did get some useful numbers. I was also able to restructure and consolidate most of the logic for updating a draft's parent properly, which even fixed a couple of bugs in the process, and all tests pass.
Out of curiosity, I tried pulling the earlier "non-mutating array methods" WIP out and revisited it... and it turns out those changes actually fixed the problems I was having earlier! I then found that my overrides for `find`/`findLast` were somewhat inefficient and made some tweaks there too.
So, here's the current numbers, with both the mutating and non-mutating array overrides turned on:
┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario │ immer10 │ immer10Perf │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ filter │ 38.4µs │ 3.2µs │ +91.6% │
│ remove │ 116.1µs │ 12.0µs │ +89.7% │
│ sortById-reverse │ 121.3µs │ 15.0µs │ +87.7% │
│ reverse-array │ 110.8µs │ 13.7µs │ +87.7% │
│ remove-high │ 107.5µs │ 14.5µs │ +86.5% │
│ update-high │ 88.7µs │ 12.9µs │ +85.4% │
│ mixed-sequence │ 1.8ms │ 556.5µs │ +69.2% │
│ update-high-reuse │ 1.8ms │ 623.5µs │ +65.9% │
│ remove-reuse │ 1.8ms │ 621.1µs │ +65.8% │
│ remove-high-reuse │ 1.8ms │ 628.5µs │ +65.2% │
│ update-reuse │ 1.6ms │ 619.2µs │ +61.8% │
│ concat │ 107.6µs │ 47.7µs │ +55.7% │
│ rtkq-sequence │ 8.8ms │ 4.7ms │ +46.5% │
│ add │ 14.1µs │ 9.1µs │ +35.7% │
│ update-multiple │ 40.3µs │ 26.1µs │ +35.2% │
│ update │ 15.7µs │ 12.6µs │ +19.3% │
│ mapNested │ 116.7µs │ 129.1µs │ -10.6% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘
✓ immer10Perf shows an average 61.1% performance improvement over immer10
I think we have a winner! :) Positive improvement across the board for everything except `array.map()`, and that's not a very big regression. (I opted to leave it as-is due to the complexities of mapping over data and then still expecting to update it later.)
I gotta call it a night here. Traveling again the next few days, so I should have time to finish cleaning up the two major sets of architectural changes and get those up as PRs.
Did some more interesting investigations today.
I have an RTKQ benchmark app that mounts 1000 query-using components at once (causing 1000 immediate Redux actions as components subscribe, and then 1000 more actions over the next couple seconds as promises resolve and the cache is updated), and I eyeball the amount of time spent in Immer in that case. What I generally see is that there's about 3300ms of scripting, and 750ms of that is just Immer's `shallowCopy` - specifically, just the `return { ...base }` line.
I tried a standalone similar script that mimics that update pattern and tried cranking up the number of items from 100 through 3000. The time per item grew roughly linearly with the item count - i.e., total time grew roughly quadratically:
Done in 16.8 ms (items: 100, avg: 0.168 ms / item)
Done in 227.3 ms (items: 400, avg: 0.568 ms / item)
Done in 1344.8 ms (items: 1000, avg: 1.345 ms / item)
Done in 4912.5 ms (items: 2000, avg: 2.456 ms / item)
Done in 14032.6 ms (items: 3000, avg: 4.678 ms / item)
For the 3000 item example, 8.5s of that was literally just `return { ...base }`, which is... uh... not great.
I noted that Mutative handles copying plain objects with:
// Excerpt from Mutative's copy helper. `original` is the object being
// copied; `propIsEnum` is a cached Object.prototype.propertyIsEnumerable
// (defined here so the snippet stands alone).
const propIsEnum = Object.prototype.propertyIsEnumerable;

const copy: Record<string | symbol, any> = {};
// Copy all enumerable string keys...
Object.keys(original).forEach((key) => {
  copy[key] = original[key];
});
// ...then any enumerable symbol keys.
Object.getOwnPropertySymbols(original).forEach((key) => {
  if (propIsEnum.call(original, key)) {
    copy[key] = original[key];
  }
});
return copy;
Out of curiosity, I tried doing that instead...
and the 3000 item case dropped down to:
Done in 8078.7 ms (items: 3000, avg: 2.693 ms / item)
then I tried removing the symbols check and re-ran, and:
Done in 7022.8 ms (items: 3000, avg: 2.341 ms / item)
I'm genuinely shocked. I never would have guessed that `Object.keys()` plus a manual `forEach` loop would somehow beat what ought to be a built-in and optimized code path. (And yes, I tried several variations: `for..in`, `Object.keys()` plus a `for` loop, etc.)
I also saw a similar improvement in my actual RTKQ example app, where total scripting time dropped to 2100ms, and shallowCopy specifically down to about 150ms.
With just the `Object.keys().forEach()` but no `Object.getOwnPropertySymbols()`, there are two specific sets of Immer test cases that fail, but the rest pass.
It does get a bit weirder. I already have a benchmark scenario that mimics the RTKQ multiple-updates pattern, but I added another example that takes a 1000-key object and adds another key to it, repeated a few times. One variation keeps passing in the initial state (so it's always 1000 + 1), another reuses the result state (so it's 1000 + 10). This approach is also faster overall for the reuse case, but somehow about twice as slow (-110% in the table) for the "keep updating the initial state" case. No explanation for that yet.
So, overall I think this has some significant potential as a faster shallowCopy approach, but it needs some more benchmarking and investigation to figure out exactly what the behavior is.
I'll set this one aside and focus on getting the two big architectural PRs up so we can review and move ahead with those - I can circle back and investigate the shallow copying options later.
Awright, I'm very happy to report that I've filed the three primary perf optimization PRs!
- #1164
- #1183
- #1184
Combined, those give us anywhere from 30-90% speedups depending on update scenario, with an average of 57% faster than the current Immer 10 release!
✓ immer10Perf shows an average 57.4% performance improvement over immer10
┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario │ immer10 │ immer10Perf │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ reverse-array │ 117.5µs │ 14.4µs │ +87.7% │
│ sortById-reverse │ 128.0µs │ 17.7µs │ +86.2% │
│ remove-high │ 119.6µs │ 16.9µs │ +85.8% │
│ remove │ 128.4µs │ 20.9µs │ +83.7% │
│ update-high │ 89.7µs │ 14.7µs │ +83.7% │
│ filter │ 49.1µs │ 11.7µs │ +76.1% │
│ remove-reuse │ 4.1ms │ 1.2ms │ +69.5% │
│ remove-high-reuse │ 3.9ms │ 1.2ms │ +69.4% │
│ update-reuse │ 3.5ms │ 1.3ms │ +63.6% │
│ mixed-sequence │ 3.5ms │ 1.3ms │ +61.8% │
│ update-high-reuse │ 3.3ms │ 1.4ms │ +59.3% │
│ concat │ 117.1µs │ 51.2µs │ +56.3% │
│ rtkq-sequence │ 12.5ms │ 7.0ms │ +44.1% │
│ updateLargeObject │ 235.7µs │ 132.1µs │ +44.0% │
│ update-multiple │ 47.1µs │ 27.9µs │ +40.8% │
│ updateLargeObject-r │ 9.6ms │ 6.0ms │ +37.6% │
│ add │ 24.4µs │ 17.0µs │ +30.4% │
│ update │ 22.7µs │ 18.6µs │ +18.1% │
│ mapNested │ 120.0µs │ 128.8µs │ -7.3% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘
We will have to figure out what to do about the two architectural PRs and bundle sizes. Both the notification callback system and the array method overrides add about 1.5-2K apiece to the final bundle size in a built app, and I'm always very sensitive to increasing default bundle sizes. The array overrides will likely become a new Immer plugin so that you can opt-in to the extra couple K size to get the array perf improvements. With the callback system, it's up for debate whether the 5-7% improvement over just the small perf tweaks is worth adding the extra bundle size. @mweststrate and I will have to discuss that before anything gets merged.
We'll also have to discuss release strategies and versioning. These are all internal changes, so they could easily just become a minor version like Immer 10.2. However, the "strict / loose iteration" changes would technically be breaking and are also about 10% of the improvement totals, so there's a very good argument that flipping that default would require a major version. (We can of course ship the new option in a minor and tell users to opt in by calling the setter in their app code.)
I still have a couple more things I want to investigate. Per above, I saw potentially significant perf gains from changing the shallowCopy implementation, but also confusingly different results in different benchmark setups. I need to investigate further and try to nail down what the actual changes are, if they only happen with certain object sizes, try in different JS engines, etc. That could be another improvement. Should be able to look into that shortly.
Overall, this has been a very fun project to work on, and I'm very excited about getting these perf wins out for everyone to benefit from!
Okay, the initial perf tweaks PR is out as https://github.com/immerjs/immer/releases/tag/v10.2.0! Note that this still defaults to `strictIteration: true` for compatibility, but you can now import and call `setUseStrictIteration(false)` to get that improvement.
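Per that release note, opting in looks like this:

```ts
import { setUseStrictIteration } from "immer";

// Call once, e.g. at app startup, to opt in to the faster loose-iteration
// behavior described above.
setUseStrictIteration(false);
```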
Per PR review comments, the larger architectural changes will likely go out in an 11.0 major. I'm looking at some of the review comments now and will try to get those updated shortly.
I've put up a draft PR with a potential change to `shallowCopy` to use `Object.keys().forEach()` instead of `{ ... }`. I'm not 100% convinced it's an improvement in enough cases, but I definitely see some cases where it's a drastic improvement for large objects (more than 1020 keys, and definitely around 3000 keys). I also brain-dumped all my learnings - how I started investigating shallow copy implementations, what I've tried, the fact that V8 has a "fast properties" mode that caps out at 1020 properties, and more:
- #1188
Updated the array method override PR to convert it into an optional plugin to save on default bundle size:
- #1184
As of that PR, the perf numbers look like this:
✓ immer10Perf shows an average 40.4% performance improvement over immer10
┌─────────────────────┬──────────────┬──────────────┬─────────────┐
│ Scenario │ immer10 │ immer10Perf │ Improvement │
├─────────────────────┼──────────────┼──────────────┼─────────────┤
│ reverse-array │ 250.7µs │ 65.5µs │ +73.9% │
│ remove-high │ 197.5µs │ 54.3µs │ +72.5% │
│ sortById-reverse │ 189.3µs │ 52.8µs │ +72.1% │
│ update-high │ 182.6µs │ 57.6µs │ +68.5% │
│ remove │ 188.8µs │ 73.8µs │ +60.9% │
│ update-high-reuse │ 9.6ms │ 4.3ms │ +54.7% │
│ mixed-sequence │ 8.2ms │ 3.8ms │ +54.5% │
│ update-reuse │ 12.4ms │ 5.7ms │ +53.7% │
│ remove-reuse │ 9.0ms │ 4.4ms │ +50.9% │
│ remove-high-reuse │ 9.3ms │ 4.7ms │ +49.9% │
│ filter │ 107.6µs │ 57.1µs │ +46.9% │
│ update-largeObject1 │ 14.1ms │ 8.8ms │ +37.8% │
│ concat │ 199.9µs │ 124.4µs │ +37.8% │
│ rtkq-sequence │ 28.4ms │ 19.7ms │ +30.5% │
│ update-largeObject2 │ 29.5ms │ 20.8ms │ +29.4% │
│ update-largeObject1 │ 691.9µs │ 542.1µs │ +21.6% │
│ update-largeObject2 │ 2.2ms │ 1.8ms │ +18.9% │
│ update-multiple │ 83.9µs │ 71.9µs │ +14.3% │
│ update │ 70.7µs │ 64.3µs │ +9.0% │
│ add │ 80.5µs │ 76.3µs │ +5.1% │
│ mapNested │ 195.8µs │ 225.8µs │ -15.3% │
└─────────────────────┴──────────────┴──────────────┴─────────────┘
This is down from the "+55%" I'd seen in earlier prototype branches - I think it's a mixture of adding more benchmark scenarios that are in the +20% range, and maybe a bit of additional overhead.
That said, doing a full CPU profile on an equivalent benchmarking script shows that the real overhead is the combination of deep freezing and shallow copying, so this may be about the theoretical maximum improvement we can squeeze out of the current design constraints.
Excited to report that I just released Redux Toolkit v2.10.0, which updates to the latest Immer 10.2 to pick up its immutable update perf improvements! I also did a bunch of additional internal RTK optimization and byte-shaving, so this release is free perf wins!
- https://github.com/reduxjs/redux-toolkit/releases/tag/v2.10.0