digitaljs icon indicating copy to clipboard operation
digitaljs copied to clipboard

Add navigation history to subcircuit click event

Open malmeloo opened this issue 2 years ago • 7 comments

Adds a "navigation history" parameter to the subcircuit button click event introduced in #100. This is essentially an array of models that were visited to end up on the subcircuit that contained the button the user clicked on.

malmeloo avatar Oct 08 '23 12:10 malmeloo

Any clue why that test might be failing? It seems unrelated to me, but I'm not sure.

malmeloo avatar Oct 08 '23 13:10 malmeloo

Any clue why that test might be failing? It seems unrelated to me, but I'm not sure.

Unfortunately, I don't know. There seems to be some nondeterminism in the Worker engine which leads to some tests randomly (and very rarely) failing. Somehow, the test seems to think that the simulation didn't stop running. I looked at the code a few times now and can't really find the root cause (actually, couldn't really reproduce the issue on my machine). If you can find and fix the issue, it would be appreciated.

tilk avatar Oct 09 '23 09:10 tilk

Also, I'm not sure I like this change. Maybe it would be cleaner, and ultimately more useful, if models remembered their parent model. This way, you could reconstruct the "navigation history" when needed.

tilk avatar Oct 09 '23 09:10 tilk

Maybe it would be cleaner, and ultimately more useful, if models remembered their parent model.

You're right, I like that better as well; I'll look into it later this week. Who knows, maybe that'll also fix the strange testing issue.

malmeloo avatar Oct 09 '23 09:10 malmeloo

maybe that'll also fix the strange testing issue.

Wouldn't count on it. The issue unfortunately exists on the master branch, and it is nondeterministic - re-running the test sometimes shows no errors.

tilk avatar Oct 11 '23 14:10 tilk

I've changed it so that models now have a "parent" attribute which can be queried. Since top-level models are a bit different, there's also an "isTopLevel" attribute. It's currently essentially the same as checking whether model.get("parent") === null, but I personally prefer this since it's a little more verbose. Let me know what you think!

malmeloo avatar Oct 14 '23 13:10 malmeloo

Any clue why that test might be failing? It seems unrelated to me, but I'm not sure.

Unfortunately, I don't know. There seems to be some nondeterminism in the Worker engine which leads to some tests randomly (and very rarely) failing. Somehow, the test seems to think that the simulation didn't stop running. I looked at the code a few times now and can't really find the root cause (actually, couldn't really reproduce the issue on my machine). If you can find and fix the issue, it would be appreciated.

I still haven't found a definitive root cause for this issue, however I do have a hypothesis. I cannot reproduce it locally either, but after increasing the default timeout in waitUntilStable in the tests (https://github.com/DismissedGuy/digitaljs/commit/135af5f0c6feeb93f9213a61deb753b4b88354c2), they appear to execute much more consistently on GitHub Actions, at least compared to what I've seen in this repository.

If I understand the code correctly, internally the following is happening:

  1. updateGates() is called on the engine and posts a message to the worker
    1. The worker receives said message and does its thing
    2. The worker sends a type: 'update' message back to the worker
    3. The worker acks the incoming message
  2. The updateGates() promise resolves upon receiving the ack
  3. The test calls the hasPendingEvents getter on the engine

I believe that the problem here is caused by how hasPendingEvents works: it doesn't actively request the current state from the worker, but rather relies on its internal cache for this attribute, which appears to only be updated when it receives an update message from the worker. You would think that this is fine because the worker sends an update before the promise is resolved. However, while I cannot find an official source for this, it appears that postMessage is heavily subject to the OS's thread scheduling and that the order in which messages are received cannot always be guaranteed: https://stackoverflow.com/questions/7536901/does-window-postmessage-guarantee-the-order-of-events. Hypothetically, it would be possible that an update message is delayed by so much that the ack has already been received, meaning that if the timeout is reached in the test, it will continue and verify against the (now incorrect) value of hasPendingEvents. Especially since updateGates() and hasPendingEvents are called in a very tight loop in a lot of tests.

Again, this is only a hypothesis, but it would explain why this only happens for the worker engine, why increasing the timeout enough resolves the issue (although it's really not a proper fix), why I personally cannot reproduce this on my slower laptop and why it appears to be so nondeterministic. I'm not sure if I can find the time to actually test and implement a fix for this, but hopefully it helps.

malmeloo avatar Oct 29 '23 14:10 malmeloo