
Do sources/sinks need Set semantics?

Open shaylew opened this issue 2 years ago • 8 comments

The current spec text says we need to store sources/sinks as an ordered set, but do we know for sure that's necessary? I'm not so sure.

Sinks: these need to be ordered to ensure a predictable order of calling notify, but I think it's not actually observable whether or not they're deduplicated:

  • we only call notify once on a node until the next time someone calls get on it and makes it clean again
  • during the notification pass, calling get is forbidden

So it seems like even if we were to store the sink twice, we'd be guaranteed to skip over it the second time.
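The argument above can be sketched in code. This is a hypothetical model, not the spec's actual algorithm; `notify`, `dirty`, and `notifyCount` are illustrative names:

```javascript
// A sink is only notified while clean; notify marks it dirty, and it
// stays dirty until someone calls get on it. So a duplicated entry in
// the sink list is a no-op on its second encounter.
function notify(sink) {
  if (sink.dirty) return; // second occurrence: already dirty, skip
  sink.dirty = true;
  sink.notifyCount++;
}

const s = { dirty: false, notifyCount: 0 };
const sinks = [s, s]; // stored twice, not deduplicated
for (const sink of sinks) notify(sink);
// notifyCount is 1 either way, so duplication is unobservable here
```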

Sources: these need to be ordered to ensure a predictable order of calling equals and the bodies of Computeds. In normal scenarios you can't observe whether or not they're deduplicated, because they'll become clean after the first time they're reached (at which point neither their body nor equals needs to run). But there's an exception here, depending on some details we haven't nailed down yet:

  • Computed C depends on (in order) computeds [A, B, A]. A is clean; B is dirty.
  • recomputing B causes a set to a state that A (directly or transitively) depends on, but B's equals prevents B from becoming dirty so it doesn't interrupt checking.
  • if A appears twice in C's sources, we'll skip it the first time (it's clean), dirty it when recomputing B, and then recompute A (and then C, if A's equals doesn't cut off the computation). If A appears only once, we'll skip over it and then not recompute A (or C).
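A minimal model of that scenario, to make the difference concrete. All names here (`checkSources`, the `effects` map standing in for "recomputing B dirties A") are hypothetical, not spec text:

```javascript
// Walk a computed's source list in order, "recomputing" dirty sources.
// effects maps a source to the side effect its recompute performs.
function checkSources(sources, effects, log) {
  for (const src of sources) {
    if (!src.dirty) continue; // clean source: skip it
    log.push(src.name);       // stand-in for recomputing the source
    src.dirty = false;
    if (effects.has(src)) effects.get(src)();
  }
}

const A = { name: "A", dirty: false };
const B = { name: "B", dirty: true };
const log = [];
// Recomputing B performs a set that (transitively) dirties A.
const effects = new Map([[B, () => { A.dirty = true; }]]);

checkSources([A, B, A], effects, log);
// With the duplicate, A is clean on the first pass, B's recompute
// dirties it, and the second occurrence recomputes it: log is ["B", "A"].
// With deduplicated sources [A, B], A would never be revisited,
// so only ["B"] would run.
```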

Note that this sort of set presents some consistency problems: it seems that we must rerun A and (depending on A's equals) rerun C, even if C only read [A, B], so it's not clear the deduplication is really implicated here. If we don't rerun them, we end up in a situation where we finish checking C's sources and decide that it's clean, even after a change which would cause its from-scratch value to be something different!

So it sounds to me like we really want to pin down which States are allowed to be set when, and it's possible that a satisfying solution there will cause both sources and sinks to have the same observable behavior whether or not they're deduplicated.

shaylew avatar Oct 30 '23 05:10 shaylew

Thanks for this writeup! Overall, deduplicating is used there to help implementations pass stress tests which do a lot of duplicate gets. However, the specification just states what the observable algorithm is, so if it turns out that it is fine to duplicate, then implementations may do so.

It’s great to think through the implications here so we can have more detailed advice to implementers. Maybe we can ultimately put that in a note in the spec.

recomputing B causes a set to a state that A (directly or transitively) depends on, but B's equals prevents B from becoming dirty so it doesn't interrupt checking.

I think the current direction is to ban all sets inside of all computeds, so fortunately this shouldn’t be as much of an issue.

littledan avatar Oct 30 '23 11:10 littledan

We're ultimately not banning sets inside of computeds, so this scenario can still occur. How bad is this situation for implementations? It would be a shame if we couldn't deduplicate at all...

littledan avatar Dec 12 '23 18:12 littledan

I think we have to figure this out. It doesn't quite block prototyping, but it does block nailing down all the fine details precisely to ensure compatibility.

littledan avatar Dec 26 '23 17:12 littledan

I think we want to first nail down exactly what we want a set inside a computed to do, and then we'll be able to figure out which consequences follow from that.

(I'm not super optimistic we'll find a concise/coherent description of what a set inside a computed should mean, but set vs. list dependencies isn't the biggest problem there; it's invalidation during checking and invalidation during evaluation that are thornier.)

shaylew avatar Dec 26 '23 21:12 shaylew

@shaylew What do you want it to do?

littledan avatar Dec 26 '23 22:12 littledan

OK, rereading the above, I see how this is about us allowing the user to break the rule "you can't write to something that you already read", and that inevitably leads to incoherency.

If we don't rerun them, we end up in a situation where we finish checking C's sources and decide that it's clean, even after a change which would cause its from-scratch value to be something different!

But there's no right answer here, right? Even if we do recompute A, C might come out wrong with respect to if a clean computation were done later!

One possible answer to preserve the integrity of the graph, if not the overall coherence: if there is a set inside of a computed to something which was previously read inside the same computed, then that is secretly queued up and executed once the computed finishes.
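A sketch of that queuing idea, under the assumption that the computed tracks which states it has read; `runComputed`, `read`, and `write` are made-up names for illustration:

```javascript
// A set to a state the computed has already read is deferred until the
// computed's body returns; sets to unread states apply immediately.
function runComputed(body) {
  const readSet = new Set();
  const queued = [];
  const read = (state) => { readSet.add(state); return state.value; };
  const write = (state, v) => {
    if (readSet.has(state)) {
      queued.push(() => { state.value = v; }); // already read: defer
    } else {
      state.value = v; // not read in this computed: apply now
    }
  };
  const result = body(read, write);
  for (const apply of queued) apply(); // flush deferred writes
  return result;
}

const counter = { value: 1 };
const seen = runComputed((read, write) => {
  const v = read(counter); // read...
  write(counter, v + 1);   // ...then set the same state: deferred
  return read(counter);    // still observes the pre-write value
});
// seen === 1; counter.value === 2 once the computed has finished
```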

littledan avatar Dec 27 '23 14:12 littledan

As far as I know, knowing that a signal got read N times inside a tracking scope, for N > 1, has zero utility, so keeping that information around seems wasteful. Like you can technically even run out of memory because of this, and running out of memory for useless information doesn't seem ideal.

In general though one may not necessarily want to use actual sets for this either, as most tracking scopes will be tracking 1 or 2 signals, and some will even be tracking nothing; in those cases just a linear scan would probably perform better. In practice, for me on the JS side of things, a linear scan was faster up to something like 512 items, iirc.
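That heuristic could look something like this. The class name, the `add` method, and the crossover threshold are all made up for illustration, assuming dedup-on-insert:

```javascript
const LINEAR_SCAN_LIMIT = 512; // hypothetical crossover point

class SourceList {
  constructor() {
    this.items = [];
    this.set = null; // lazily created once the list grows large
  }
  add(source) {
    if (this.set !== null) {
      // Large list: O(1) membership check via the Set.
      if (!this.set.has(source)) {
        this.set.add(source);
        this.items.push(source);
      }
      return;
    }
    if (this.items.includes(source)) return; // linear-scan dedup
    this.items.push(source);
    if (this.items.length > LINEAR_SCAN_LIMIT) {
      this.set = new Set(this.items); // switch representations
    }
  }
}

const list = new SourceList();
const a = {}, b = {};
list.add(a); list.add(b); list.add(a); // duplicate read of a
// list.items is [a, b]: ordered, deduplicated, O(N) memory
```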

fabiospampinato avatar Mar 31 '24 18:03 fabiospampinato

@fabiospampinato Yeah, IMO the best possible outcome here would be:

  • we come up with a semantics for all operations that makes it impossible to observe whether a dependency is listed more than once
  • we mandate that reading N distinct sources should only take O(N) memory, even if you read them many times
  • implementations get to pick their own heuristic for when to switch from linear scan with duplicates to a deduplicated structure with worse constant factors, or even some weirder deduplication option (like mobx uses).

So the question is how we ensure (1), because as a spec we have to pin down all observable behaviors -- even the observable behaviors that convey or depend upon "useless" information.

@littledan I don't really know what to do about .sets in computeds, TBH. In a ranking of behavioral simplicity I'd probably order the options as...

  • Just ban .set in computeds.
  • Allow .set, but only in "outermost computeds", those that aren't consumed by other computeds. (This seemed cleaner in a world where we had Effects rather than Watchers. Getting to choose up front whether your derivation can set states or can have a value which other things read, but not both, is a lot more intelligible than having your set-permission determined by your position in the graph.)
  • Allow .set, but throw if it ever happens during the "check dependencies" phase. (Worse than the previous -- "is this computed allowed to set state" isn't even the right question anymore, it depends on when it's running.)
  • Allow .set, and it doesn't throw, but it behaves slightly differently to keep you from making a mess if you try to do it in the wrong situations. (This seems just too surprising to stomach... I hope.)
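For concreteness, the third option in the list (throw during the check phase) might look like the following sketch; the `phase` flag and `setState` are hypothetical names, not part of the proposal:

```javascript
// Whether a set is allowed depends on when it runs, not on which
// computed is running -- which is exactly the objection above.
let phase = "idle"; // "idle" | "checking" | "evaluating"

function setState(state, value) {
  if (phase === "checking") {
    throw new Error("cannot set a State while checking dependencies");
  }
  state.value = value;
}

const st = { value: 0 };
setState(st, 1); // fine outside the check phase

phase = "checking";
let threw = false;
try { setState(st, 2); } catch { threw = true; }
// threw === true; st.value is still 1
```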

I'm not completely clear on what level of ~~chaos~~ freedom frameworks need here... and I wonder if there's something like the computed<->effect distinction we could add back in to make a hard line between the "can set" and "can have a value" derivations.

shaylew avatar Apr 13 '24 06:04 shaylew