aria-at Non-production testing for test plan changes

From time to time, we need to test certain aspects of test plans that are still under development within the ARIA-AT app. However, we can currently only cause the production app to import a new test plan version by merging it into the master branch of this repo. This raises several concerns such as:

Testing in production, rather than e.g. in staging, sandbox or another environment for that purpose.
Needing to merge a test plan into master before we may feel it's ready.
Being unable to roll back test plan phase changes.
... etc.

We want to discuss methods for improving this situation, e.g.: A dedicated branch for testing test plans in non-production app environments, the ability for a non-production app to import a test plan from a specific PR and/or zip file, and rolling back test plan phase changes on production that were carried out accidentally or otherwise in a way that was determined to be incorrect/unhelpful/etc.

CC @IsaDC, @mcking65

Mar 18 '25 22:03 jscholes

We want to discuss methods for improving this situation, e.g.: A dedicated branch for testing test plans in non-production app environments, the ability for a non-production app to import a test plan from a specific PR and/or zip file, and rolling back test plan phase changes on production that were carried out accidentally or otherwise in a way that was determined to be incorrect/unhelpful/etc.

+1 on all of these suggestions.

Mar 20 '25 19:03 howard-e

The ARIA-AT Community Group just discussed Issue 1212 - Sandbox testing of test plan changes.

The full IRC log of that discussion

<jugglinmike> Topic: Issue 1212 - Sandbox testing of test plan changes
<jugglinmike> github: https://github.com/w3c/aria-at/issues/1212
<jugglinmike> james: This came about just because we've had a few instances where we updated a test plan and wanted to see how it manifested in the app. Or where results have unexpectedly not been carried over
<jugglinmike> james: The individual things which prompted it are worth discussing on their own, but essentially, we think it would be helpful to preview test plans in a version of the app that is authentic, without requiring merging a pull request
<jugglinmike> james: We currently don't have many roll-back opportunities after merging
<jugglinmike> james: for example, we have the "staging" environment and the "sandbox" environment, but they don't really reflect "production" to the degree that we can use them to reason about things. They lack the wealth of data in the "production" environment
<jugglinmike> Matt_King: If we had the ability to get the most-recent test plan data into the staging environment, but to operate off of a branch...
<jugglinmike> Matt_King: I guess the main thing here is: once a test plan is merged into the "master" branch, the only way to correct problems is to merge a new version into the "master" branch
<jugglinmike> Matt_King: The problem is that we don't have an environment that has all the prior results for the test plan, so we can predict what merging will really do.
<jugglinmike> james: yes
<jugglinmike> Matt_King: So this is really a data thing
<jugglinmike> ChrisCuellar: What was the original intention behind the "staging" environment versus the "sandbox" environment?
<jugglinmike> Matt_King: The "sandbox" environment was for developers to push whenever they like. Like a "nightly" build
<jugglinmike> Matt_King: The "staging" environment was meant to give external stakeholders an opportunity to view something new while still using "sandbox" for ongoing work
<jugglinmike> Matt_King: We should be able to experiment with the data in the sandbox to any degree
<jugglinmike> Matt_King: I kind of wonder if maybe the data in staging should be closest to production
<jugglinmike> Matt_King: But james is saying that we don't have an environment that matches production and which is safe to mess around with
<jugglinmike> james: We also recognize that "staging" may have features which are not in production
<jugglinmike> james: I think the ideal would be the ability to have a copy of "production" on-demand. And to have that copy be able to read from GitHub
<jugglinmike> Matt_King: I want to be careful to not build a massive new thing in order to solve an occasional problem
<jugglinmike> Matt_King: I think there may be a way to use "staging" to solve this problem
<jugglinmike> Matt_King: It could be (especially given the way we've been working lately) that we could do something so that the staging environment is essentially equivalent to production, except for the possibility that we push something new to it. But we have the ability to go to the staging environment and say "at this point in time, run this script to set the data in the staging environment to be equivalent to production"
<jugglinmike> Matt_King: ...and if we also had the ability to pull in test plans from a branch other than "master", and for you to choose which branch that is
<jugglinmike> Matt_King: In staging, we could have a feature that says, "from which branch do you want to pull?" So if you have a pull request branch, you could pull from that branch and then work with it in staging, and it would be just like working with it in production
<jugglinmike> howard-e: It won't be exactly the same, though. There will be times where staging will have updates that go beyond production
<jugglinmike> Matt_King: yes, but I think 90% (or even 99%) of the time, those changes are not related to the kind of functionality which affects how the test plans are going to be processed in the test queue (and things like that)
<jugglinmike> Matt_King: We don't often touch the code that affects things like the code which controls how test results are copied
<jugglinmike> Matt_King: It feels like the velocity of change in staging is quite manageable
<jugglinmike> howard-e: Good point
<jugglinmike> howard-e: To re-share what I stated in the issue, I agree with all of this
<jugglinmike> Matt_King: I don't want to let the perfect be the enemy of the good. Even if we could get a "90%" solution...
<jugglinmike> james: I'd love to understand how often we need to share something on "staging" with an external entity. It doesn't seem like it happens very often at all
<jugglinmike> james: If this could be put in place, I would encourage us to always put changes through this flow in order to be diligent about testing our changes
<jugglinmike> Matt_King: We currently have the "preview" capability, and our process involves using the "preview" capability before we merge
<jugglinmike> james: Right, and I don't want to suggest that we should start looking at every single test in the app. I think the preview is good for reviewing the underlying test plan itself
<jugglinmike> james: Most of the time, I think this kind of review should only take a few minutes at most
<jugglinmike> Matt_King: We could have staging serve two purposes. We can have it serve the purpose you're describing, james. We can also have it serve the purpose of staging new changes to the app, but only when the process is such that the development team needs it for that purpose
<jugglinmike> Matt_King: In other words, we could go directly from sandbox to staging to production on a super-fast path almost all the time
<jugglinmike> Matt_King: Right now, we have the feature related to automated updating. When IsaDC and I were giving feedback, we did it in sandbox. When it's time to deploy that feature, do we always go through "staging", or do we go direct from "sandbox" to "production"?
<jugglinmike> howard-e: We always go through "staging"
<jugglinmike> howard-e: It generally takes a week to move from staging to production. It's a manual process that we run internally
<jugglinmike> Matt_King: I wonder if that adds much risk to the kind of previews that james is describing. If you make a change to the test plan, and it looks good in staging, so you merge it, and it goes to production. If something isn't quite right, it could take a week to resolve
<jugglinmike> james: I think it adds quite a bit of cognitive overhead to need to have a sense of "what state is staging in?"
<jugglinmike> james: Also ambiguity. It somewhat undermines what I was trying to achieve when I raised this issue
<jugglinmike> Matt_King: I don't know how to do this without a whole new environment
<jugglinmike> james: Couldn't we make that environment much more ephemeral? Couldn't it happen in GitHub Actions? We only need it for a short time, and then we can throw it away
<jugglinmike> Matt_King: I don't know
<jugglinmike> james: Or, how easy is it to get the app up and running locally? If everything is Dockerized, and all someone has to do is run "docker-compose up", then these concerns go away
<jugglinmike> howard-e: It is not Docker-ized. It could be. While the operating instructions are minimal, it may be preferable to Dockerize
<jugglinmike> ChrisCuellar: I wonder if the Bocoup team can take this internally for discussion. It sounds like there are a lot of options, and I wonder if it would be helpful for us to consider it as a team and come back to you all with some recommendations
<jugglinmike> Matt_King: Yeah, why not?
<jugglinmike> Matt_King: I was actually a little surprised that building locally is something which PAC might prefer
<jugglinmike> Matt_King: Maybe making it possible for anybody to do that more readily might be good for the project overall in ways that I don't foresee
<jugglinmike> james: Dockerizing is something we do with other clients quite regularly
<jugglinmike> james: We'd have to figure out a way to share the latest SQL dump
<jugglinmike> Matt_King: I'm not familiar with the dockerizing process, but I kind of wonder if others (e.g. Vispero) might benefit from that capability
<jugglinmike> Matt_King: Let's assign this issue to someone at Bocoup and remove the "agenda" label. When you are ready to discuss again, please add the "agenda" label back on

Apr 23 '25 17:04 css-meeting-bot
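
For illustration only, the "from which branch do you want to pull?" idea raised in the meeting could be sketched roughly as below. This is not code from the ARIA-AT app: the function name `resolve_import_branch`, the environment names as string values, and the rule that production may only import from "master" are all assumptions made for the example.

```python
from typing import Optional

# Assumption: production keeps importing from "master", as it does today.
DEFAULT_BRANCH = "master"

# Assumption: environment names taken from the discussion above.
ENVIRONMENTS = {"production", "staging", "sandbox"}


def resolve_import_branch(environment: str, requested_branch: Optional[str] = None) -> str:
    """Decide which branch a (hypothetical) import step should read test plans from."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")

    requested = (requested_branch or "").strip()

    if environment == "production":
        # Production never imports from a pull request branch in this sketch.
        if requested and requested != DEFAULT_BRANCH:
            raise ValueError("production only imports test plans from master")
        return DEFAULT_BRANCH

    # Non-production environments may pick a branch; fall back to master otherwise.
    return requested or DEFAULT_BRANCH


if __name__ == "__main__":
    # "my-pr-branch" is a made-up branch name used only for this example.
    print(resolve_import_branch("staging", "my-pr-branch"))
```

Keeping the production rule inside the same function means an accidental branch selection on production fails loudly instead of importing unreviewed test plans.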
