Improve performance by ~60%

Open eddyw opened this issue 2 years ago • 2 comments

Description

This PR adds ~80 bytes to the final bundle by refactoring sources (not targets) to use arrays instead of a LinkedList. But it makes up for it by improving performance by 60-69% 😅

TL;DR: "I want benchmarks"

Initial benchmarks:

Signals Local - Simple x 19,296,752 ops/sec ±0.83% x 15 Bytes/op (928725 runs sampled)
Signals Local - Unchanged Sources x 84,583 ops/sec ±0.38% x 12 Bytes/op (23931 runs sampled)
Signals Local - Conditional Sources x 161,475 ops/sec ±0.59% x 1.27 KB/op (44140 runs sampled)

Signals  Core - Simple x 18,081,537 ops/sec ±0.46% x 15 Bytes/op (917966 runs sampled)
Signals  Core - Unchanged Sources x 52,064 ops/sec ±0.29% x 18 Bytes/op (14879 runs sampled)
Signals  Core - Conditional Sources x 95,471 ops/sec ±0.41% x 7.75 KB/op (26329 runs sampled)

The measured operation in all benchmarks is this function:

function measure() {
	count.value++;
	double.value;
}

Each benchmark sample runs in a separate process and uses the measure function above:

// Simple
const count = signal(0);
const double = computed(() => count.value * 2);
// Unchanged sources
const s = [];
for (let i = 0; i < 1000; i++) {
	s[i] = signal(i);
}
const count = s[500];
const double = computed(() => {
	for (let i = 0; i < 1000; i++) {
		s[i].value;
	}
	return count.value * 2;
});
// Conditional sources
const s = [];
for (let i = 0; i < 1000; i++) {
	s[i] = signal(i);
}
const count = s[500];
const double = computed(() => {
	const mod = count.value % 10 === 0;
	const from = mod ? 0 : 601;
	const size = mod ? 500 : 1000;

	for (let i = from; i < size; i++) {
		s[i].value;
	}
	return count.value * 2;
});

The code is documented, explaining every decision, but here is some more info.

Why?

It's faster to iterate over arrays than over a LinkedList, and quite often the sources of a "computed" or "effect" remain the same between runs:

const c = computed(() => list.value.filter(stuff));

Based on this idea, I implemented sources similarly to how hooks work in React/Preact: every node uses a slot (index) in the sources array. The execution roughly looks like this:

sources = [node1, node2, node3]

prepareSources -> for each node._version = -1, set index = 0

addDependency node1 -> [node1, node2, node3]
                          ↑ node1._version = 0, index += 1
addDependency node2 -> [node1, node2, node3]
                                 ↑ node2._version = 0, index += 1
addDependency node3 -> [node1, node2, node3]
                                        ↑ node3._version = 0, index += 1

cleanupSources -> for each node, rollback node

This is the best-case scenario: when sources do not change after re-computation, all these operations are fast.
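The best-case flow above can be sketched roughly like this. This is a simplified, hypothetical illustration: `prepareSources` mirrors the name in the walkthrough, while `revalidateInPlace` is a made-up helper standing in for the matching branch of `addDependency`; neither is the actual PR code.

```javascript
// Hypothetical sketch of the best case. Each re-computation walks the
// sources array with a cursor; if every source is re-read in the same
// order, no allocation happens at all.
function prepareSources(target) {
	for (const node of target._sources) node._version = -1; // mark "not seen yet"
	target._index = 0; // reset the slot cursor
}

function revalidateInPlace(target, node) {
	// Best case inside addDependency: the node already sits at the cursor's slot.
	if (target._sources[target._index] === node) {
		node._version = 0;
		target._index++;
		return true;
	}
	return false; // a changed source would be handled separately
}
```

Running `prepareSources` followed by `revalidateInPlace` for each source in order touches only existing slots, which is why the unchanged-sources case stays cheap.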

An advantage of using arrays is that the JS engine over-allocates capacity, rather than reallocating every time a new item is added. In this particular example, while the length is 3, the array in fact has spare capacity for new items. That means there is already room for new sources if the computed/effect has conditionals.

This opens up the possibility of reusing the already-allocated capacity to push old dependencies - the ones that need to be cleaned up - to the end of the array:

sources = [node1, node2, node3]

prepareSources -> for each node._version = -1, set index = 0

addDependency node1 -> [node1, node2, node3]
                          ↑ node1._version = 0, index += 1
addDependency node4 -> [node1, node4, node3, node2]
                                 ↑             ↑ node2 scheduled for clean up
                                 │
                                new node, index += 1
addDependency node3 -> [node1, node4, node3, node2]
                                        ↑ node3._version = 0, index += 1

cleanupSources -> for each node, rollback node
               -> for each node from index to length, unsubscribe
               -> set length of array to 3
               -> [node1, node4, node3]

Even though the length is set to 3, the internal capacity of the array doesn't shrink. In V8, small arrays never shrink, and large arrays shrink only when more than 50% of the capacity is unused.

This greatly improves performance for the most common case, when sources do not change. It also improves performance for computeds/effects with conditionals, as long as the number of new sources plus cleanup nodes doesn't exceed the array's already-allocated capacity, or the number of sources decreases by no more than 50% (for small arrays it makes no difference; capacity won't shrink).
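A rough, self-contained sketch of how `addDependency` and `cleanupSources` could implement the swap-to-end scheme from the walkthrough (hypothetical and simplified, not the actual PR code):

```javascript
// When the source read at the cursor's slot differs from the stored one,
// the stale node is pushed to the end of the array (reusing spare
// capacity) so cleanupSources can unsubscribe it afterwards.
function addDependency(target, node) {
	const sources = target._sources;
	const i = target._index;
	if (sources[i] === node) {
		node._version = 0; // unchanged source: revalidate in place
	} else {
		const stale = sources[i];
		sources[i] = node; // the new source takes the slot
		node._version = 0;
		if (stale !== undefined) sources.push(stale); // scheduled for cleanup
	}
	target._index = i + 1;
}

function cleanupSources(target) {
	// Nodes from the cursor to the end were not re-read this run:
	// unsubscribe them, then truncate. Capacity stays allocated.
	for (let i = target._index; i < target._sources.length; i++) {
		// target._sources[i] would be unsubscribed here (omitted in this sketch)
	}
	target._sources.length = target._index;
}
```

With `sources = [node1, node2, node3]` and the reads `node1, node4, node3`, this reproduces the `[node1, node4, node3, node2]` intermediate state from the diagram and truncates back to length 3 in cleanup.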

Next

This is strictly speaking about V8, but from my understanding similar optimizations exist in JavaScriptCore and SpiderMonkey. The benchmarks were run in Node.js (V8), so I still need to benchmark how this behaves in other JS engines. 😅

Opening this PR for discussion meanwhile 😄

Edit: benchmarking with v8 and spidermonkey binaries:

v8
====================================
! Signals  Core - Simple x 22,009,361 ops/sec ±0.69% (5214020 runs sampled)
Signals  Core - Unchanged Sources x 60,536 ops/sec ±0.31% (17772 runs sampled)
Signals  Core - Conditional Sources x 91,436 ops/sec ±0.30% (26858 runs sampled)
Signals  Core - Computed + Effect x 58,906 ops/sec ±0.31% (17266 runs sampled)

Signals Local - Simple x 19,180,400 ops/sec ±0.37% (4539201 runs sampled)
! Signals Local - Unchanged Sources x 83,790 ops/sec ±0.28% (24567 runs sampled)
! Signals Local - Conditional Sources x 156,461 ops/sec ±0.28% (45920 runs sampled)
! Signals Local - Computed + Effect x 59,025 ops/sec ±0.31% (17344 runs sampled)

SpiderMonkey
====================================
! Signals  Core - Simple x 10,018,443 ops/sec ±0.12% (2692485 runs sampled)
Signals  Core - Unchanged Sources x 25,441 ops/sec ±0.33% (7479 runs sampled)
Signals  Core - Conditional Sources x 50,539 ops/sec ±0.30% (14892 runs sampled)
! Signals  Core - Computed + Effect x 20,169 ops/sec ±0.34% (5921 runs sampled)

Signals Local - Simple x 9,264,110 ops/sec ±0.11% (2479468 runs sampled)
! Signals Local - Unchanged Sources x 35,002 ops/sec ±0.31% (10272 runs sampled)
! Signals Local - Conditional Sources x 74,420 ops/sec ±0.26% (21852 runs sampled)
Signals Local - Computed + Effect x 15,839 ops/sec ±0.42% (4651 runs sampled)

"Computed + Effect" is the same as "Unchanged Sources" but has an effect that always reads the computed value.

Also, these benchmarks use performance.now (converted to ns) and not process.hrtime.bigint, because neither the v8 nor the SpiderMonkey shell has process.hrtime, so there is less accuracy. Overall, the biggest improvement is seen when using conditionals (different sources on every re-compute).
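For illustration, the timing approach described above boils down to something like this (a minimal hypothetical sketch, not the actual harness; `sample` is a made-up name). performance.now() returns milliseconds, so multiplying by 1e6 converts to nanoseconds:

```javascript
// Time `fn` over `iterations` runs and return the average ns per op.
// performance.now() has millisecond units (sub-ms resolution), hence * 1e6.
function sample(fn, iterations) {
	const start = performance.now();
	for (let i = 0; i < iterations; i++) fn();
	const elapsedNs = (performance.now() - start) * 1e6;
	return elapsedNs / iterations;
}
```

Since performance.now's resolution is coarser than process.hrtime.bigint's, per-op numbers from this approach carry more jitter, which is the accuracy caveat mentioned above.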

eddyw avatar Oct 25 '22 16:10 eddyw

⚠️ No Changeset found

Latest commit: b790c220551f43f1628d3eed03a5f06eecb08151

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

changeset-bot[bot] avatar Oct 25 '22 16:10 changeset-bot[bot]

Deploy Preview for preact-signals-demo ready!

Name Link
Latest commit b790c220551f43f1628d3eed03a5f06eecb08151
Latest deploy log https://app.netlify.com/sites/preact-signals-demo/deploys/63599048fb414c0009c931fb
Deploy Preview https://deploy-preview-252--preact-signals-demo.netlify.app

netlify[bot] avatar Oct 25 '22 16:10 netlify[bot]

Closing in favor of #265

After fixing a couple of issues I hadn't considered, the performance isn't impressive enough anymore to justify the +80 bytes, lol. PR #265 performs better and is more stable across different use cases than the changes in this PR. For example, the simple-case benchmark performs slightly worse here, and most benefits appear only with big arrays, while the performance gains of PR #265 show up in simple cases as well as in computeds/effects with a thousand sources.

eddyw avatar Nov 06 '22 10:11 eddyw