
Uncached Computed?

Open NullVoxPopuli opened this issue 1 year ago • 12 comments

In many circumstances, maintaining a cache costs more than simply recomputing the value fresh on each read.

For example, if all you need is a derived value at the edge of rendering:

let doubled = new State.Computed(() => someSignal.get() * 2);
doubled.get()

^ will be more expensive than

let doubled = () => someSignal.get() * 2

doubled()

In classes, this would be the equivalent of a getter:

class Demo {
  get doubled() {
    return someSignal.get() * 2;
  }
}

Ember actually started with cached-by-default, no-opt-out computed properties, and when we moved away from that we saw massive performance gains.


Todo:

  • [ ] what is the use case for participating in the reactive graph and not caching?
  • [ ] is the tradeoff vs a getter / plain function worth it?

NullVoxPopuli avatar Apr 07 '24 16:04 NullVoxPopuli

IMO the uncached version of this:

let doubled = new State.Computed(() => someSignal.get() * 2);

Is just this:

let double = () => someSignal.get() * 2;

I don't think there's anything that the proposal needs to spec in this regard.

fabiospampinato avatar Apr 07 '24 16:04 fabiospampinato

An uncached computed is still a signal so it can participate in the signal graph.

when we moved away from that we saw massive performance gains.

This is confusing, since presumably the calculation is almost always slower than an identity check against the previous value. Is the idea that the reduction in memory use / GC cost from not storing the previous value improved performance?

sorvell avatar Apr 07 '24 16:04 sorvell

I'm not the best person to ask about cacheless computed :sweat_smile:

In Ember, the move away from cached computeds happened because most usages of a cached computed turned out to be so simple that the overhead of participating in the reactive graph (memory usage, checking the previous value, etc.) exceeded that of just calling a function.

This is likely tangential to the request for cacheless computed, and I probs should have omitted that context -- I'm still learning what these are used for.

I've updated the post to specify some TODOs for us to figure out

NullVoxPopuli avatar Apr 07 '24 16:04 NullVoxPopuli

Having uncached computeds is very useful specifically because for some derived values, rerunning it every time you access it is simply cheaper than propagating the change through the reactive graph.

Think c = () => a() + b() in Solid syntax — it is much cheaper to get the values of a and b and add them than to do the reactive graph traversal logic, marking c dirty when a or b change, etc. Essentially for anything where just redoing the calculation is cheaper than the reactive graph algorithm.

It is really good to have all reactive values implement the same interface, whether a Signal, Computed, or derived signal/uncached computed value, both in terms of DX (as in Solid’s “to access the value, you call it as a function”) and in terms of framework internals (as in Solid’s “if you pass the renderer a function, it treats it as a reactive value”).

What this implies, to me, would be that if the proposal offers an interface for signals there should be a cheap wrapper for uncached computeds/derived signals that implements that same interface. (In Solid this is just a function because all signals are functions.)
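As a rough illustration of how cheap such a wrapper could be (everything here is a hypothetical sketch, not proposal API): an object that exposes the same `get()` read surface as a cached Computed, but simply re-invokes the function on every read.

```javascript
// Hypothetical sketch: an uncached "derived signal" that matches the
// read interface of a cached Computed. Every get() recomputes.
function derived(fn) {
  return { get: fn };
}

// A stand-in for a State signal, for illustration only.
function state(value) {
  return {
    get: () => value,
    set: (v) => { value = v; },
  };
}

const a = state(1);
const b = state(2);
const sum = derived(() => a.get() + b.get());

console.log(sum.get()); // 3
a.set(10);
console.log(sum.get()); // 12 -- recomputed on read, no cache to invalidate
```

Because the wrapper has the same read surface, consuming code (e.g. a renderer) would not need to distinguish it from a real Computed.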

gbj avatar Apr 07 '24 19:04 gbj

What this implies, to me, would be that if the proposal offers an interface for signals there should be a cheap wrapper for uncached computeds/derived signals that implements that same interface.

Is this necessary for a proposal that is meant to be mainly used by framework authors though? As I understand it anyway.

fabiospampinato avatar Apr 07 '24 21:04 fabiospampinato

@fabiospampinato mentions that zero-arg functions are effectively uncached computeds. Because dependency tracking is contextual, it doesn't matter whether there's extra stack frames in between the consumer (current reactive context) and the signal being read.
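The point about stack frames can be illustrated with a toy tracking context (a sketch of the general technique, not the proposal's actual machinery): reads are reported to whichever consumer is currently active, so a plain function sitting between the consumer and the signal is invisible to dependency tracking.

```javascript
// Toy dependency tracking, for illustration only.
let currentConsumer = null;

function state(value) {
  const s = {
    get() {
      if (currentConsumer) currentConsumer.deps.add(s);
      return value;
    },
    set(v) { value = v; },
  };
  return s;
}

// Record which signals a function reads while it runs.
function track(fn) {
  const consumer = { deps: new Set() };
  const prev = currentConsumer;
  currentConsumer = consumer;
  try {
    return { value: fn(), deps: consumer.deps };
  } finally {
    currentConsumer = prev;
  }
}

const a = state(2);

// An "uncached computed" is just a plain function; the extra stack frame
// between the tracked consumer and a.get() changes nothing.
const doubled = () => a.get() * 2;

const { value, deps } = track(() => doubled() + 1);
console.log(value);       // 5
console.log(deps.has(a)); // true -- a was recorded despite the indirection
```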

alxhub avatar Apr 08 '24 22:04 alxhub

In my opinion, there are two reasons to support uncached computeds:

  1. If you just need to compute the value once and you're always going to compute again when the computed invalidates, the bookkeeping and memory overhead of caching is pure overhead for no benefit. You can see this highlighted in the situation where your computed returns a big JSX tree but only uses the computed itself as a re-render signal. This makes it possible to freely intermix standard signals with external reactivity. In this case, holding an extra reference to a large JSX data structure simply because we decided to couple validation to caching adds memory pressure and bookkeeping overhead for no reason.
  2. Sometimes you use a computed in the Signals design in a throwaway situation (e.g. to determine if the computed value has no dependencies, which would allow for certain optimizations). In this situation, the extra overhead of caching is pure overhead in both CPU and memory terms. In Starbeam, we have a lower-level way of directly answering this question, but in standard Signals, Computed is the lowest level way to answer questions about what happens inside a block of code.

Both of them amount to the same thing: there are very real scenarios that don't need caching, and where caching clearly creates extra overhead. In this sort of design, I find the argument "can't you just cache it anyway" to be fairly weak as a motivation for adding additional overhead to the design of the lowest-level available primitive for interacting with "tracking frames."

wycats avatar Apr 09 '24 21:04 wycats

I should add that several people have pressed me to identify any semantic problem with building in caching at the lowest level, and I haven't been able to. As far as I can tell, you can always throw away a computed when you're done with it.

In some cases, you might throw away the computed immediately (when using Computed as an introspection device), in which case bookkeeping is the main source of unnecessary overhead. In other cases, you might retain the computed until it invalidates (e.g. when creating a one-time Computed to avoid interop issues caused by stale closure problems). In this longer-lived situation, there is both extra bookkeeping overhead and extra memory pressure caused by unnecessarily holding onto an (arbitrarily large) value that will never be used again.

I would be persuaded if someone demonstrated that these sources of overhead are negligible in practice. That said, in a lowest-level design like Signal, I think it makes sense for us to keep to a minimal design until we're sure that the extra overhead is negligible.

The argument in favor of the higher level design comes down to "This low-level primitive would have a slightly simpler surface area if we coupled these concerns, and we think that smaller surface area is worth the potential cost in added overhead." If you think about it, that's a somewhat strange way to approach the design space :smile:

wycats avatar Apr 09 '24 21:04 wycats

@wycats Do you have a particular low-level design in mind here? I'm definitely sympathetic to these concerns, and I think we might well be able to find some lower-level more orthogonal primitives if we decided it was worth it.

Pieces I've previously been turning around, in this general area:

(1.) A minimal overhead way to observe the tracking effects of a piece of code, without creating a throwaway Computed. This might be as simple as:

Signal.withTracker: <R>(track: <T>(signal: State<T> | Computed<T>) => T, fn: () => R) => R

A version where the track callback returns void could also work; it's slightly weaker, as it only gets to observe signal reads rather than intercept and possibly replace them. The latter is useful for some sorts of transactions and some sorts of async handling.
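To make the interceptor variant concrete, here is a toy model of the proposed shape (hypothetical API throughout; real signals would route reads through the tracker internally): the `track` callback sees each signal read inside `fn` and chooses what value to return for it, which is what enables transaction-style overlays.

```javascript
// Toy model of the proposed withTracker shape, for illustration only.
let currentTracker = null;

function state(value) {
  const s = {
    get() {
      return currentTracker ? currentTracker(s) : value;
    },
    // Expose the raw value so a tracker can fall back to it.
    peek: () => value,
    set(v) { value = v; },
  };
  return s;
}

function withTracker(track, fn) {
  const prev = currentTracker;
  currentTracker = track;
  try {
    return fn();
  } finally {
    currentTracker = prev;
  }
}

const price = state(10);

// A transaction-style overlay: reads of `price` see a staged value
// without mutating the signal itself.
const staged = new Map([[price, 12]]);
const total = withTracker(
  (signal) => (staged.has(signal) ? staged.get(signal) : signal.peek()),
  () => price.get() * 3,
);
console.log(total);        // 36 -- computed against the staged value
console.log(price.peek()); // 10 -- underlying state untouched
```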

(2.) "Expert nodes" (to borrow incremental's terminology) which subsume both Computed and Watcher. These would manually manage their dependencies like Watchers, and get to override any or all of:

  • What happens when a dependency becomes marked as might-have-changed?
  • What happens when a dependency definitively changes?
  • How to get the latest value, and is that value considered changed or unchanged from last time?

If you have introspection to iterate over your dependencies, this is enough to implement Computed (including its equals), enough to implement Watcher, and enough to implement various flavors of uncached or partially cached Computeds that have been proposed. It also lets you implement efficient unordered folds (like "count how many of these dependencies are true") by using the might-have-changed notifications to know which subset of dependencies to recheck without having to iterate over all of them.
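One way to picture the unordered-fold case (all shapes here are hypothetical, nothing is proposal API): the node keeps a per-dependency contribution, is told which dependencies might have changed, and on read rechecks only that subset instead of re-folding every dependency.

```javascript
// Sketch of an "expert node" that counts how many dependencies are true.
// Dependencies are plain { get } sources; the node manages them manually.
class CountTrue {
  constructor(deps) {
    this.deps = deps;
    this.last = new Map(); // dep -> last observed boolean
    this.count = 0;
    for (const d of deps) {
      const v = Boolean(d.get());
      this.last.set(d, v);
      if (v) this.count++;
    }
    this.dirty = new Set(); // deps marked might-have-changed
  }

  // Hook: a dependency might have changed; remember it, don't recompute yet.
  mightHaveChanged(dep) {
    this.dirty.add(dep);
  }

  // Hook: produce the latest value, rechecking only the dirty subset.
  get() {
    for (const d of this.dirty) {
      const was = this.last.get(d);
      const now = Boolean(d.get());
      if (was !== now) {
        this.count += now ? 1 : -1;
        this.last.set(d, now);
      }
    }
    this.dirty.clear();
    return this.count;
  }
}

const cell = (v) => ({ get: () => v, set(x) { v = x; } });
const flags = [cell(true), cell(false), cell(true)];
const node = new CountTrue(flags);
console.log(node.get()); // 2

flags[1].set(true);
node.mightHaveChanged(flags[1]); // only this dep gets rechecked
console.log(node.get()); // 3
```

The key property is that a change notification touches O(1) bookkeeping and a read touches only the dirty subset, rather than iterating all dependencies.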

Are we interested in exploring either of these directions? Or does the much simpler "acts like a function, has the API of a Signal" primitive cover the cases we're interested in, without any of these "more primitive" primitives being needed?

shaylew avatar Apr 21 '24 01:04 shaylew

IMO the uncached version of this:

let doubled = new State.Computed(() => someSignal.get() * 2);

Is just this:

let double = () => someSignal.get() * 2;

I don't think there's anything that the proposal needs to spec in this regard.

I thought this for a moment, but it’s important not to overlook that the plain-function example does not participate in the graph. The benefit of a stateless (no-cache) computed signal is the ability to control memory usage while still participating in the graph (keeping the control-flow aspect of signal graphs).

This is why Flash computeds are stateless by default (see @flash-js/core on npm). Processing large state within the graph will not consume memory if a computed is stateless. This is useful when you want to use some computed/derived value in two separate code paths in your data pipeline without consuming more memory. There isn’t a clear way to propagate a derived value in more than one direction in your state graph when using an anonymous function without doubling compute time.

Of course, if a computed signal is read frequently enough that compute time dominates, caching is available as an opt-in strategy; the Flash library proposes this as a type of computed signal called “reducers”. Not only is this useful for trading time complexity against space complexity in your app, it also enables control-flow mechanisms like batching for back pressure that you couldn’t build with a single computed signal as proposed by this spec. A bit out of scope for my main point, but relevant as a case in point for stateless computed signals.
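Opt-in caching of this kind can be sketched generically (this is not Flash's actual API, just an illustration of the trade-off): cache the last result of an otherwise-uncached derived function, and recompute only when explicitly invalidated.

```javascript
// Generic sketch of opt-in caching layered over an uncached derived
// function: trade memory (one stored value) for compute time.
function cached(fn) {
  let valid = false;
  let value;
  return {
    get() {
      if (!valid) {
        value = fn();
        valid = true;
      }
      return value;
    },
    invalidate() { valid = false; },
  };
}

let calls = 0;
const expensive = () => { calls++; return 42; };

const memo = cached(expensive);
memo.get();
memo.get();
console.log(calls); // 1 -- second read served from cache

memo.invalidate();
memo.get();
console.log(calls); // 2
```

In a real signal graph, `invalidate()` would be driven by dependency change notifications rather than called by hand.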

samholmes avatar Apr 26 '24 03:04 samholmes

As an API design for uncached computeds: what if we keep it simple and allow { cached: false } in the options bag for the computed constructor? Would this miss out on some kind of low-level-ness?

littledan avatar Apr 29 '24 00:04 littledan

The use case I keep hitting for uncached computeds is being able to watch a function that may depend on a mix of signals and mutable non-signal values, such as a React function component or a Lit render() method. It is less about not caching the return value than it is about always re-evaluating the computation.

In these cases when you access the computed, you need to always run the computation to get the latest result to render, whether or not any signals it has read previously have changed. To do this, I use a state signal holding a number and increment the number before each read.

The pattern looks something like:

let forcingUpdate = false;
const forceUpdateSignal = new Signal.State(0);
const computed = new Signal.Computed(() => {
  forceUpdateSignal.get();
  return fn();
});

const watcher = new Signal.subtle.Watcher(async () => {
  watcher.watch();
  if (forcingUpdate) {
    return;
  }
  await 0;
  doSomething();
});
watcher.watch(computed);

const doSomething = () => {
  forcingUpdate = true;
  forceUpdateSignal.set(forceUpdateSignal.get() + 1);
  forcingUpdate = false;
  computed.get();
};

const updateForNonSignalReasons = () => {
  doSomething();
};

So we can emulate a non-cached computed, but it requires this extra state signal, and we have to be careful to differentiate watcher notifications caused by real changes in signal usage from notifications caused by this forceUpdateSignal. The watcher is also notified twice on signal updates, and once unnecessarily for a forced read outside of a signal update.

Uncached computed would definitely help:

const computed = new Signal.Computed(fn, {cached: false});

const watcher = new Signal.subtle.Watcher(async () => {
  watcher.watch();
  await 0;
  doSomething();
});
watcher.watch(computed);

const doSomething = () => {
  computed.get();
};

const updateForNonSignalReasons = () => {
  doSomething();
};

This is a pretty big ergonomic improvement.

I had thought that it might be nice to eliminate the Computed entirely and allow a Watcher to watch a plain function - following the idea above that an uncached Computed is just a function - but we still need to be able to evaluate the function and have it update its sources when we do. That's much more ergonomic with uncached Computeds than with over-complicating the Watcher API.

justinfagnani avatar Jul 31 '24 17:07 justinfagnani