tinypilot Fix e2e interferences, and establish robust parallelism

Merging over the grouped e2e tests uncovered an interference problem in our e2e test setup.

Update: we disabled concurrent tests locally to alleviate the problem for local development for the time being. The topic in itself is still valid, however.

Problem

Our Playwright setup is this:

On CI, we only use a single Playwright worker. Therefore, Playwright will run each spec file one after the other, in alphabetic order.
Locally, however, we ~run~ had been running the files in parallel, which is the default behaviour, in case workers wasn’t set (i.e., undefined).

Due to the new grouping and our usage of the beforeEach mechanism, the timing behaviour of the tests happened to change, compared to before:

When the security-dialog tests are running, they alter the server state and toggle on the auth requirement. The about-dialog tests are executed in parallel, and can now randomly fail, since Playwright happens to attempt to open the about dialog during a time when the auth requirement is active (caused by one of the concurrently ran security tests). Therefore, Playwright fails to open the about dialog in the first place, as it’s stuck on the login page.

This problem isn’t new, but so far the timing was in our favour, so we were lucky enough to not run into this. Now, with the different timing behaviour, the e2e tests can fail locally.

Again: due to us only using a single worker on CI, this problem ~only manifests~ was only manifesting itself locally, not on CI.

Solution

As we continue to add more and more e2e tests (which I think is terrific), I suggest we fix our e2e test setup, and make it robust against these kinds of issues. Specifically, I think we should establish a deliberate mechanism for parallelism:

If we perform sequential scenarios that alter server state (such as the security tests), we should make sure that no other test can run in parallel. That is because server state acts like a singleton, and is shared across all tests.
- These kinds of test also must clean up the server state afterwards, otherwise we might see erratic failures in subsequent tests.
For all other, “read-only” tests (i.e., ones that don’t alter server state), we should embrace parallelism as much as possible, to take advantage of lower execution times. I think we should be able to do that on CI and locally likewise, so I’d suspect we don’t need to differentiate here like we do now.

Nov 30 '23 14:11 jotaen4tinypilot

Yeah, this is a tricky one. It looks like Playwright has more granular parallelism controls than we've been using, but it doesn't look like we can say "run everything in parallel, but make sure nothing else is running while this parallelism-unfriendly test is running."

The way I approached this on PicoShare is that each Playwright session has its own, independent in-memory SQLite database. Before each test, Playwright POSTs to a dev-only route to say it wants a per-session database. The server assigns a cookie in response, and then whenever the server receives subsequent requests with that cookie, it matches the request to the corresponding in-memory SQLite database.

That system works okay, but it gets a little hacky when we have to share backend state between two browser sessions.

Nov 30 '23 15:11 mtlynch

Another potential approach that I stumbled across is to divide the tests into different test projects, where each project can have its own configuration.

I’m not sure, however, what the best way is to run these projects independently. E.g., we could use the --project CLI flag, but then we’d have to issue multiple playwright invocations.

I haven’t looked much into that, so maybe this could be a viable approach. There might also be a bunch of other options to research.

Nov 30 '23 15:11 jotaen4tinypilot

tinypilot tinypilot copied to clipboard

Fix e2e interferences, and establish robust parallelism

Problem

Solution

tinypilot
tinypilot copied to clipboard