aria-at Consistency of app feature to merge test run results

Note: This is an app-related issue, but I'm filing it in the main aria-at repo in order to give it the agenda-related labelling.

The ARIA-AT app has a feature to merge results from previous test runs when a new version of a test plan is added to the queue. This allows previously collected data that is still applicable to be carried forward when a test plan is modified, rather than that data being lost.

There are some open questions about the reliability of this feature that should be discussed in a Community Group meeting. For instance:

- Is the feature designed to retain assignees in the test queue, or only test results and corresponding conflicts?
- Between @mcking65, @IsaDC and myself, we've noticed instances of:
  - the feature not working at all (i.e. results being completely lost despite there being no changes between test plan versions);
  - results being retained from an old, outdated test run rather than the most recent; and
  - the feature fully working as expected.

Mar 18 '25 22:03 jscholes

> Is the feature designed to retain assignees in the test queue

In short, yes.

The results of each test in a run are directly tied to an assignee. When a run for a new test plan version is created by copying results from a run of an older test plan version, each individual test that hasn't undergone any significant changes is considered completed in the newly copied run as of the time the copy was done (the equivalent of pressing the Submit Results button on a run's test).

The only tracked significant changes right now are:

- a command being added
- a command or setting being changed
- an assertion being added
- any assertionId or testId from the CSVs being updated

If any of the above criteria are met, the test is considered not completed, and the assignee is expected to update and submit whatever content is now missing.
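
A minimal Python sketch of how these criteria might be checked, not the app's actual code. The function name `is_still_complete`, the simplified test shape (`commands` as `(settings, commandId)` pairs, `assertionIds` as a flat list) and the sample IDs are all assumptions made for illustration. Removals are not in the list above, so the sketch assumes they do not count as significant.

```python
def is_still_complete(old_test: dict, new_test: dict) -> bool:
    """Return True if copied results for this test can stay marked as complete."""
    # A changed testId counts as a significant change.
    if old_test["testId"] != new_test["testId"]:
        return False

    old_commands = {tuple(c) for c in old_test["commands"]}
    new_commands = {tuple(c) for c in new_test["commands"]}
    # A command that was added, or whose command or setting changed, shows up
    # as a (settings, commandId) pair that the old version did not have.
    if new_commands - old_commands:
        return False

    # An added assertion or an updated assertionId shows up as an ID that the
    # old version did not have.
    if set(new_test["assertionIds"]) - set(old_test["assertionIds"]):
        return False

    return True


# Hypothetical sample data.
old = {
    "testId": "test1",
    "commands": [("defaultMode", "down"), ("defaultMode", "tab")],
    "assertionIds": ["assertionA", "assertionB"],
}
unchanged = dict(old)
added_assertion = {**old, "assertionIds": old["assertionIds"] + ["assertionC"]}
removed_command = {**old, "commands": [("defaultMode", "down")]}

print(is_still_complete(old, unchanged))        # True
print(is_still_complete(old, added_assertion))  # False
print(is_still_complete(old, removed_command))  # True
```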

But even if all tests are completed, the report isn't automatically marked as final: the expectation is that test admins should still ensure the copied-over data is still relevant and doesn't need to be updated again before marking it as final. This is according to the notes currently around that block of code. Is that no longer the expectation?

> or only test results and corresponding conflicts?

Since the test results are tied to assignees, if the results are being copied from non-finalized reports currently in the Test Queue and there are conflicts between those assignees, the copied results will come with those conflicts (unless the command or assertion the conflict occurred on was removed between versions).
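
To illustrate the exception in parentheses, here is a small Python sketch under assumed data shapes. The function `carry_over_conflicts` and the conflict records (a dict with `commandId` and `assertionId`) are hypothetical and do not reflect how the app stores or derives conflicts.

```python
def carry_over_conflicts(conflicts, new_command_ids, new_assertion_ids):
    """Keep only conflicts whose command and assertion still exist in the new version."""
    return [
        conflict
        for conflict in conflicts
        if conflict["commandId"] in new_command_ids
        and conflict["assertionId"] in new_assertion_ids
    ]


# Hypothetical conflicts recorded between two assignees on the old version.
old_conflicts = [
    {"commandId": "down", "assertionId": "assertionA"},
    {"commandId": "tab", "assertionId": "assertionB"},
]

# In this made-up new version, assertionB has been removed.
remaining = carry_over_conflicts(
    old_conflicts,
    new_command_ids={"down", "tab"},
    new_assertion_ids={"assertionA"},
)
print(remaining)  # [{'commandId': 'down', 'assertionId': 'assertionA'}]
```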

> ... we've noticed instances of:
>
> - The feature not working at all (i.e. results being completely lost despite there not being any changes between test plan versions);
> - results being retained from an old, outdated test run rather than the most recent; and
> - the feature fully working as expected.

Looking forward to finding out in which instances the result copy behaved unexpectedly, so we can patch those as well.

For the most part, though, apart from the significant changes mentioned above, any change to instructions, references or settings identifiers is ignored, as these shouldn't prevent results from being copied or cause data to be lost. Admittedly, some items may be missed due to a nuance in the data structure, or because of a situation that wasn't accounted for before and is being encountered for the first time (like recently with the reference folder update).

expand for a more technical breakdown

The data structure of a test that is hashed and compared to identify changes is as follows (some content has been renamed or omitted for clarity):

json sample

{
  "title": "string",
  "testId": "string",
  "rowNumber": "number",
  "renderedUrl": "string", 
  "at": {
    "key": "string",
    "name": "string",
    "settings": {
      "_settingName_": {
        "screenText": "string",
        "instructions": ["string"]
      }
    }
  },
  "commands": [
    {
      "settings": "string",
      "commandIds": ["string"]
    }
  ],
  "assertions": [
    {
      "priority": "string",
      "assertionId": "string",
      "assertionPhrase": "string",
      "assertionStatement": "string",
      "assertionExceptions": [
        {
          "priority": "string",
          "settings": "string",
          "commandId": "string"
        }
      ]
    }
  ],
  "renderableContent": { 
    "info": {
      "title": "string",
      "testId": "string",
      "references": {
        "type": "string",
        "refId": "string",
        "value": "string",
        "linkText": "string"
      },
      "presentationNumber": "number"
    },
    "target": {
      "at": {
        "key": "string",
        "name": "string",
        "settings_string": "string",
        "settings": {
          "_settingName_": {
            "screenText": "string",
            "instructions": ["string"]
          }
        },
        "assertionTokens": {
          "_assertionTokenId_": "string"
        }
      },
      "setupScript": {
        "name": "string",
        "script": "string",
        "source": "string",
        "scriptDescription": "string"
      },
      "referencePage": "string"
    },
    "commands": [
      {
        "settings": "string",
        "commandId": "string",
        "keystroke": "string",
        "keypresses": [
          {
            "keystrokeId": "string",
            "keystroke": "string"
          }
        ],
        "presentationNumber": "number",
        "assertionExceptions": [
          {
            "priority": "number",
            "assertionId": "string"
          }
        ]
      }
    ],
    "assertions": [
      {
        "refIds": "string",
        "priority": "number",
        "assertionId": "string",
        "assertionPhrase": "string",
        "assertionStatement": "string"
      }
    ],
    "instructions": {
      "settings": {
        "_settingName_": ["string"]
      },
      "defaultTaskInstructions": "string"
    }
  }
}

Of the above, the following are ignored during any results copy comparison:

- at.settings / renderableContent.target.at.settings - the settings property's values could be updated at any point with a new setting or updated settings instructions; not needed for the results copy process.
- renderedUrl - includes the commit SHA; not needed for the results copy process.
- renderableContent.target.referencePage - the most recent addition, explained in https://github.com/w3c/aria-at/issues/1205#issuecomment-2731123653: changes to the reference folder weren't being ignored, which only came up recently, but they certainly shouldn't affect the test results copy.
- renderableContent.info.references - also shouldn't impact the test results copy. It just means the updated results will have the updated references.
- Any instructions-related field - changes if instructions are changed, but is not relevant to the test results copy. The updated results will have the updated instructions.

commands / renderableContent.commands and assertions / renderableContent.assertions are also ignored in this comparison, since those change the most. They are instead compared in a follow-up process on a per-test basis to determine what has changed there because, as mentioned above, these are the tracked significant changes.

This means a change to anything not listed above will prevent results from being copied, so:

- title-related change
- testId-related change
- rowNumber-related change
- presentationNumber-related change
- renderableContent.target.assertionTokens change - though writing this out makes me think this should also be ignored
- renderableContent.target.setupScript change
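
The comparison described in this breakdown could be sketched roughly as follows in Python. This is an illustration under assumptions, not the app's implementation: the names `IGNORED_PATHS`, `strip_ignored` and `comparison_hash`, the use of SHA-256 over sorted-key JSON, and the sample values are all made up for the example.

```python
import copy
import hashlib
import json

# Paths removed before hashing, following the ignore list above.
# The exact set and spelling used by the app may differ (assumption).
IGNORED_PATHS = [
    ("renderedUrl",),
    ("at", "settings"),
    ("commands",),
    ("assertions",),
    ("renderableContent", "target", "at", "settings"),
    ("renderableContent", "target", "referencePage"),
    ("renderableContent", "info", "references"),
    ("renderableContent", "instructions"),
    ("renderableContent", "commands"),
    ("renderableContent", "assertions"),
]


def strip_ignored(test: dict) -> dict:
    """Return a deep copy of the test with every ignored path removed."""
    stripped = copy.deepcopy(test)
    for path in IGNORED_PATHS:
        node = stripped
        for key in path[:-1]:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # The parent of this path is absent; nothing to remove.
        else:
            node.pop(path[-1], None)
    return stripped


def comparison_hash(test: dict) -> str:
    """Hash only the fields that are allowed to block a results copy."""
    canonical = json.dumps(strip_ignored(test), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical old version of a test.
old = {
    "title": "Navigate to a slider",
    "testId": "test1",
    "rowNumber": 1,
    "renderedUrl": "https://example.test/sha-old/test1.html",
    "renderableContent": {
        "target": {
            "referencePage": "reference/old/slider.html",
            "setupScript": {"name": "setFocus"},
        }
    },
}

# Only ignored fields differ: the hashes match, so results would be copied.
new_reference = copy.deepcopy(old)
new_reference["renderedUrl"] = "https://example.test/sha-new/test1.html"
new_reference["renderableContent"]["target"]["referencePage"] = "reference/new/slider.html"

# The title differs: the hashes no longer match, so results would not be copied.
new_title = {**old, "title": "Navigate forwards to a slider"}

print(comparison_hash(old) == comparison_hash(new_reference))  # True
print(comparison_hash(old) == comparison_hash(new_title))      # False
```

In this sketch, moving `setupScript` or the assertion tokens onto the ignore list would only require adding their paths to `IGNORED_PATHS`.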

Mar 19 '25 21:03 howard-e

The ARIA-AT Community Group just discussed Issue 1211 - Reliability of app support for advancing new test plan versions.

The full IRC log of that discussion

<jugglinmike> Topic: Issue 1211 - Reliability of app support for advancing new test plan versions
<jugglinmike> github: https://github.com/w3c/aria-at/issues/1211
<jugglinmike> Matt_King: we expect results to be copied into the draft for the report of the new test run. We've seen some inconsistent behavior on this, though
<jugglinmike> Matt_King: James filed an issue, and howard-e shared a very detailed response. Have you had a chance to review howard-e's response, James?
<jugglinmike> James: I did read this when it was first posted; I will have to refresh my memory
<jugglinmike> Matt_King: I don't think that we have a current behavior failure
<jugglinmike> Matt_King: We did have an example, but we destroyed it when we fixed the problem
<jugglinmike> Matt_King: We're going to have an opportunity coming up. IsaDC is working on a change to the slider. We'll see if that one works correctly. It might have something to do with which specific things get changed in the test plan. We can just leave this issue open until we see a problematic behavior again
<jugglinmike> James: We're missing the ability to update the reference without changing anything in the test plan itself.
<jugglinmike> James: Some change would warrant changing the reference date. But sometimes we have to make a small change to make settings work. What we don't have in the app is to essentially take notice of that
<jugglinmike> James: from howard-e's response, it seems as though the app is only aware of a command being changed, an assertion being changed, or a change to an assertion ID
<jugglinmike> James: ...but we also want the app to take notice if we change the reference or the setup script
<jugglinmike> James: So right now, we've pushed a new test plan, and it doesn't get re-imported
<jugglinmike> Matt_King: That's a different problem, then. This is about copying results
<jugglinmike> Matt_King: If, for example, the assertions change, then you don't copy the results from the prior into the new
<jugglinmike> Matt_King: If the setup script changed, is that another one that should void prior results? What about the reference?
<jugglinmike> James: It's tricky to say because that's on a "test" level
<jugglinmike> Matt_King: Right
<jugglinmike> Matt_King: One of the side-effects of maintaining who the tester is, is that we currently don't have a function for the tester to be changed from one person to another
<jugglinmike> Matt_King: It would be really nice if, when something was assigned to me and I did half the work, if I could re-assign it to Joe_Humbert. Then Joe_Humbert would assume responsibility for everything I've done, and he could finish the rest of the work
<jugglinmike> IsaDC: With the bot, it would be really useful to have that because sometimes we have the bot collect responses, then we assign to a tester, and then that tester can't help, but we aren't able to re-assign the run to another tester
<jugglinmike> Matt_King: That sounds like another feature request
<jugglinmike> Matt_King: A button for "change assignee"
<jugglinmike> Matt_King: We could even make the person's name into that button. Right now, it's a link to their GitHub profile
<jugglinmike> Matt_King: You can propose something
<jugglinmike> Matt_King: Right now, I would prioritize this as "P2"
<jugglinmike> Carmen: Got it!
<jugglinmike> Matt_King: If we copy in prior results that aren't valid, it's up to someone to re-run those results or make sure the previously-copied results are valid
<jugglinmike> Matt_King: Do we want to err on the side of over-copying (copying things that may have been voided), or under-copying?
<jugglinmike> James: I would like to test things like these before they go into the main app
<jugglinmike> James: I think that, regardless of the route we take, it needs to be possible for us to--when we make a change to the test plan, run it through a separate environment which is a copy of production, in order to review the actual change
<jugglinmike> James: Then we can immediately halt and not deploy to production because something unexpected happened
<jugglinmike> Matt_King: Essentially testing the test plan itself
<jugglinmike> IsaDC: Yes!
<jugglinmike> Matt_King: Okay, that is a separate issue. It's on the agenda, though we won't get there today
<jugglinmike> Matt_King: I think it might not be a massive piece of work to make it happen. We'll save the discussion for when we get to that issue
<jugglinmike> Matt_King: But in the meantime, if you can reflect on how safe we want to play it, I think that would be helpful
<jugglinmike> James: I would also love the ability to "roll back" anything that happened. Whether due to a bug or an expected-but-hard-to-predict behavior, I would love to be able to revert
<jugglinmike> IsaDC: I'm pushing some changes, and I would like to know if the results we have now--will we have a way to get them back?
<jugglinmike> James: We're making a change to a test plan, and it's possible that the same issue in the app will occur. Do we have a strategy to address it if we lose the results?
<jugglinmike> Carmen: I can ask howard-e tomorrow
<jugglinmike> Matt_King: Let's do it today and pay attention to what happened. If something goes wrong, we can send howard-e a detailed e-mail with what happened
<jugglinmike> Carmen: directly after this call, I will see if we can do a database dump. I'll reach out to you soon, IsaDC

Apr 09 '25 17:04 css-meeting-bot
