All internal references within Signals should be weak
This issue suggests that all the internal references in the Signal library should be weak references in order to avoid the potential for leaking objects, either temporarily or permanently.
Goal 1: Avoid leaking if we don't call unwatch
Given the code,
const a = Signal.State(0);
const b = Signal.Computed(() => a.get());
b.get();
const w = Signal.Watcher(() => {});
w.watch(b);
we end up with the following dependency graph,
graph TD;
root --> a & b & w
a --> b
b --> a
b --> l1(["() => a.get()"])
l1 --> a
w --> l2(["() => {}"])
w --> b
b --> w
if we no longer hold rooted references to w and b, we will leak these objects as long as there is a rooted reference to a and we have yet to call w.unwatch(b).
const a = Signal.State(0);
{
const b = Signal.Computed(() => a.get());
b.get();
const w = Signal.Watcher(() => {});
w.watch(b);
}
since there is a strong reference internally held from a to b once a watcher is attached,
graph TD;
root --> a
a --> b
b --> a
b --> l1(["() => a.get()"])
l1 --> a
w --> l2(["() => {}"])
w --> b
b --> w
If however, the back references from source to consumer that are created once a watcher is attached are made to be weak, we no longer have a strong, rooted reference to b and w,
graph TD;
root --> a
a -.-> b
b --> a
b --> l1(["() => a.get()"])
l1 --> a
w --> l2(["() => {}"])
w --> b
b -.-> w
allowing these to be GCed,
graph TD;
root --> a
Recommendation: The references from consumer to source (Watcher to Signal and Computed to Signal) should be made weak.
Goal 2: Avoid leaking if we don't call unwatch (part 2)
Given the code,
const w = Signal.Watcher(() => {
const p = w.getPending();
doAsync(() => doTheWork(p));
});
const b = Signal.Computed(() => getSignalA().get());
const d = Signal.Computed(() => getSignalC().get());
w.watch(b, d);
we have the following dependency graph (omitting the lambdas for simplicity),
graph TD;
w --> b --> a
w --> d --> c
root --> d & b & w
b -.-> w
d -.-> w
a -.-> b
c -.-> d
If we no longer hold rooted references to b or d, it is not possible for it to be GCed since w holds strong references to both,
const w = Signal.Watcher(() => {
const p = w.getPending();
doAsync(() => doTheWork(p));
});
const b = Signal.Computed(() => getSignalA().get());
{
const d = Signal.Computed(() => getSignalC().get());
w.watch(b, d);
}
d and c are still rooted despite not being reachable from the user's code,
graph TD;
w --> b --> a
w --> d --> c
root --> b & w
b -.-> w
d -.-> w
a -.-> b
c -.-> d
If the Watcher holds weak references to the Signals that it watches, then Computeds not reachable through user code can be GCed.
graph TD;
w -.-> b --> a
w -.-> d --> c
root --> b & w
b -.-> w
d -.-> w
a -.-> b
c -.-> d
Here both d and c can be collected,
graph TD;
w -.-> b --> a
root --> b & w
b -.-> w
a -.-> b
Recommendation: The references from Watcher to Signal should be made weak.
Goal 3: Avoid Computed-capture reference mismatch
Given the code below,
const i = Signal.State(1);
const arr = [
Signal.State("a"),
Signal.State("b"),
];
const c = Signal.Computed(() => arr[i.get()].get());
assert(c.get() === "b");
we get the following dependency graph,
graph TD;
root --> c & i & arr
c --> b
b -.-> c
c --> l(["() => arr[i.get()].get()"])
l --> arr
l --> i
arr --> b
arr --> a
If we then change the index and remove the last element of arr,
i.set(0);
arr.pop();
we continue to depend on b even though it is not reachable from user code
graph TD;
root --> c & i & arr
c <-.-> b
b -.-> c
c --> l(["() => arr[i.get()].get()"])
l --> arr
l --> i
arr --> a
b ~~~ a
and this is only fixed when c.get() is invoked and the internal dependencies within Signal are updated.
If all of the internal Signal references are made weak, we no longer have a strong, rooted reference to b and it can be collected.
graph TD;
root --> c & i & arr
b <-.-> c
c --> l(["() => arr[i.get()].get()"])
l --> arr
l --> i
arr --> a
Recommendation: The references from Computed to Signal should be made weak.
This is the last remaining internal Signal reference that was not weak.
Related issues:
- https://github.com/tc39/proposal-signals/issues/252
- https://github.com/tc39/proposal-signals/issues/156
- https://github.com/tc39/proposal-signals/issues/254
Signals are inherently hard to get right in this regard because there is no simple ownership-model that fully represents what data is actually needed.
If you have a chain x -> f -> g -> y, that is, y = g(f(x)), then it is possible that at some point only x and y remain directly referenced.
The problem is that both need to be referenced for the in-between states to be needed: when x becomes inaccessible, nothing will be able to change its value, which cascades through f and g down to y, which at this point becomes an inert container for a fixed value.
Inversely, if y becomes inaccessible, nothing can read its value anymore, so both g and f can only be updated but never directly read anymore.
Ideally, you want a system where both situations can be optimised for, creating a situation that the current API doesn't really allow for.
The best solution I've been able to come up with over the last few weeks (I happen to be working on a verey similar system as this right now, which I eventually hope to integrate with native signals), is to distinguish between computed and source signals:
- Downstream references can all be weak
- Upstream references to computed signals should be strong
- Upstream references to source signals can be weak, as they are (assumed) only accessed through strong refs
A computed signal could then, at any point, walk its dependency graph and cut off any inert branch to be collected, replacing it with a static value. With an inert branch being one where all its branches eventually lead to a collected weak ref, that is, all of its sources are no longer accessible.
The one thing I haven't quite figured out is where to put the branch pruning logic because, ideally, computed signals don't ever check their upstream unless they have been notified of an update. Introducing a separate signal for upstream signals going inert could be a possible fix, but would require a whole extra chunk of finalization logic that itself would need to keep track of all downstream states (weakly) so they can be notified in a finalization registry which, itself, would ideally need to be notified when the downstreams disappear so as to not collect long lists of dead weakrefs, and the whole thing just balloons in complexity.
The question is, are signals so short-lived that this extra complexity is worth implementing, or is it only wasting even more memory to keep track of long-lived data that might not get to be collected anyway?
Edit: I forgot to add one obvious problem that is more or less inherent to the way this proposal works right now: Making upstream references weak isn't actually possible right now, because computed signals don't directly reference their inputs, but instead hold a reference to a closure which then references all of its inputs. This would make the user responsible for managing these references.
So in summary, properly GC'd signals are effectively impossible without introducing massive changes to VMs.
@DarkWiiPlayer the worst thing with weak refs is - imho - that you never know how long they stay alive.
a piece of as anecdotal evidence: in one of my own signals libraries i did use strong refs upwards and weak refs downwards. now, i had a case where i dynamically created and dropped a lot of nodes in a short time. the garbage collector took its time to clean out weak refs to downstream nodes, so for a short while, subsequent changes to upstream signals lead to huge spikes of unnecessary computation. quite funny to debug - if you are too slow, the garbage collector "solves" your problem.
in short: be very careful with weak refs. they are not cleared out when the referenced object is no longer strongly referenced. the are cleared an arbitrary amount of time after the object is no longer strongly referenced. possibly never.
While this is true, it also isn't really fixed by just not cleaning things up at all until the entire chain can be collected, which would be the alternative here.
That being said, a way of manually disconnecting branches even before the GC can get to them so framework authors can at least address these problems if they do get spotted will be absolutely necessary.
you are right of course - one way or another you have to ensure the right nodes stay connected and the right node are disconnected.
to me, this is a dilemma i've spent quite some though on:
- the garbage collector cannot provide reliably information about node disconnects, so ideally i don't want to rely on it at all.
- the only alternative seems to be some explicit ownership scheme. doable for gui components which can own nodes and disconnect them when unmounted - but difficult for a general purpose library: the user certainly does not want to manage ownership manually and will get it wrong if forced to...
The problem is that both need to be referenced for the in-between states to be needed: when x becomes inaccessible, nothing will be able to change its value, which cascades through f and g down to y, which at this point becomes an inert container for a fixed value.
Inversely, if y becomes inaccessible, nothing can read its value anymore, so both g and f can only be updated but never directly read anymore.
One solution to this is splitting State up into a getter/setter pair (e.g. https://github.com/tc39/proposal-signals/issues/94)
const { get, set } = Signal.State(1);
then you would be able to detect a State is no longer settable if/when the set value is collected?
If the Signal proposal eventually gets accepted then its implementation most likely would be something faster than pure JavaScript (e.g. v8's Torque). So we can talk about weak references without needing to have them implemented in terms of WeakRef.
Of course implementing signals as a VM feature would allow for better implementations than a javascript implementation, but I think it is naive to expect VMs to completely rework their garbage collection to allow for weird composite references like what would be needed in this case.
At the end of the day, something that can't be done using weakref at all will probably have a hard time making it into browsers.
I just finished my own implementation of Signals in a way that allows for garbage collection from both sides. Linking it here as a proof of concept, but keep in mind that it's meant to be a separate project and doesn't closely follow this spec:
https://git.but.gay/darkwiiplayer/nyooom/src/commit/9579a76255b8ae029457e5e203bf25adb2a9b4fc/observable.js#L237
A diagram roughly representing the approach:
So in conclusion: It is definitely possible to implement signals without any memory management. Whether or not that's enough of a benefit to change the entire spec to allow for it, that's not for me to figure out.