synthesis icon indicating copy to clipboard operation
synthesis copied to clipboard

Fix: Robot Loading Unit Test Fail `[AARD-2003]`

Open AlexD717 opened this issue 5 months ago • 18 comments

Task

Fix a Unit Tests that was failing for previously unknown reasons.

AARD-2003

Symptom

When running unit tests for the first time (robots not cached) this error occurs Screenshot 2025-07-17 at 10 13 02 AM

Note: This error only occurs in the Firefox specific environment, and rerunning the tests causes it to pass.

Solution

The error was caused by a race condition where multiple test files where trying to fetch the same robot file. To fix this support for multiple requests was added to MirabufLoader. Also, if a robot/field fails to cache it still adds the robot/field file to the local memory, allowing the user to spawn and use the robot (however refreshing the browser would cause this robot to disappear).

Verification

  • Unit tests pass

Before merging, ensure the following criteria are met:

  • [ ] All acceptance criteria outlined in the ticket are met.
  • [ ] Necessary test cases have been added and updated.
  • [ ] A feature toggle or safe disable path has been added (if applicable).
  • [ ] User-facing polish:
    • Ask: "Is this ready-looking?"
  • [ ] Cross-linking between Jira and GitHub:
    • PR links to the relevant Jira issue.
    • Jira ticket has a comment referencing this PR.

AlexD717 avatar Jul 17 '25 17:07 AlexD717

Note: Testing Synthesis in Firefox shows that multiple console warning appear, but the program works overall. Screenshot 2025-07-17 at 11 00 52 AM (These errors appear both in the dev and production version) We might want to consider looking further into if we want to support Firefox at all and other errors that can appear there.

AlexD717 avatar Jul 17 '25 18:07 AlexD717

Given that the tests are passing again on ~~dev~~ other branches, the issue isn't fully reproducible. How do you know that this is the cause of the issue?

rutmanz avatar Jul 17 '25 18:07 rutmanz

I was receiving the same errors when running unit tests for the Mirabuf Testing ticket. If you run the tests a second time, weirdly the errors do not show up anymore. I second the question on root cause and reproducibility.

ryanzhangofficial avatar Jul 17 '25 18:07 ryanzhangofficial

After implementing all of these changes, I have rerun the tests multiple times and they have passed every time. I don't really know any better way to verify that this works.

AlexD717 avatar Jul 17 '25 21:07 AlexD717

Given that the tests are passing again on ~dev~ other branches, the issue isn't fully reproducible. How do you know that this is the cause of the issue?

@rutmanz They only pass after re running the workflow. This is a bit of a weird issue. If the action passes after the first run on a fresh commit then this fix works.

BrandonPacewic avatar Jul 17 '25 21:07 BrandonPacewic

@BrandonPacewic No state is shared between workflow attempts (or no more than is shared between first attempts of consecutive commits) as far as I'm aware

rutmanz avatar Jul 18 '25 15:07 rutmanz

No state is shared between workflow attempts (or no more than is shared between first attempts of consecutive commits) as far as I'm aware

I might be wrong, but I think that the robot/field elements might be getting cached and persist across test runs which results in the second rerun succeeding.

AlexD717 avatar Jul 18 '25 17:07 AlexD717

Is this PR stale after #1225

Dhruv-0-Arora avatar Jul 18 '25 20:07 Dhruv-0-Arora

Is this PR stale after #1225

I don't think so? It solves a race condition when multiple tests try to access the same asset, which would still be present regardless of where it's loading from. It's possible that the issue will be less common now that we're not downloading the assets from such a slow server (and thus importing and caching will take less time) but I imagine the issue would still be there.

PepperLola avatar Jul 18 '25 20:07 PepperLola

I guess because of the reproducibility issues, I'm concerned that this isn't a/the problem. Are you able to induce the issue that this solved with your own script that loads them in parallel?

rutmanz avatar Jul 21 '25 21:07 rutmanz

I guess because of the reproducibility issues, I'm concerned that this isn't a/the problem. Are you able to induce the issue that this solved with your own script that loads them in parallel?

I did verify that spawning robots in parallel after my changes works, and I have rerun the tests multiple times with the tests passing each time.

AlexD717 avatar Jul 21 '25 21:07 AlexD717

I did verify that spawning robots in parallel after my changes works, and I have rerun the tests multiple times with the tests passing each time.

@AlexD717 Are you able to verify that spawning robots in parallel without your changes doesn't work? (i.e. it isn't something else that was causing the issue)

rutmanz avatar Jul 21 '25 21:07 rutmanz

I did verify that spawning robots in parallel after my changes works, and I have rerun the tests multiple times with the tests passing each time.

@AlexD717 Are you able to verify that spawning robots in parallel without your changes doesn't work? (i.e. it isn't something else that was causing the issue)

I don't think I can double check right now as the server to download robot files is currently down. However, unit tests were failing, I implemented these changes, unit tests started passing. This was before I merged in dev to use git lfs instead of downloading the files.

AlexD717 avatar Jul 21 '25 21:07 AlexD717

But unit tests started passing everywhere at around the same time and this doesn't seem like the obvious cause of/solution to that issue to me. You should still be able to induce the race condition (if that's what actually caused that test to fail) by mocking fetch and making requests take an extra 3s (or whatever is required), then loading two files simultaneously in a throwaway test.ts file

rutmanz avatar Jul 21 '25 21:07 rutmanz

But unit tests started passing everywhere at around the same time and this doesn't seem like the obvious cause of/solution to that issue to me. You should still be able to induce the race condition (if that's what actually caused that test to fail) by mocking fetch and making requests take an extra 3s (or whatever is required), then loading two files simultaneously in a throwaway test.ts file

The unit tests started passing because we started re-running them. Also after you updated it to use git lfs all unit tests started passing on the first attempt. I made sure that the unit tests passed on the first run and on all subsequent re-runs in this branch. I also only merged in dev with the git lfs change after I got the unit tests to properly run on this branch.

I can look into making a throwaway test.ts file like you suggested, but I am unsure how useful that would be because of the things I mentioned above and since that would require me to run the tests in the online GitHub environment since that was the only place they were happening.

AlexD717 avatar Jul 21 '25 22:07 AlexD717

[!Note] The bug this PR was supposed to fix was resolved by #1225 . Since we are not sure of the purpose/need of this PR, we will revisit it at the end of the summer or if the issue resurfaces.

Dhruv-0-Arora avatar Jul 22 '25 00:07 Dhruv-0-Arora

Note

The bug this PR was supposed to fix was resolved by #1225 . Since we are not sure of the purpose/need of this PR, we will revisit it at the end of the summer or if the issue resurfaces.

Is this PR still needed?

ryanzhangofficial avatar Aug 21 '25 16:08 ryanzhangofficial

This is something that I think we should spend some time finishing in the future. Even though its not currently causing any issues these changes, functionality wise, make sense from a technical design perspective. Since its not user facing, it will not be part of the v7.2 release.

BrandonPacewic avatar Aug 29 '25 06:08 BrandonPacewic