Asserting screen reader mode changes in ARIA-AT tests
Background
Screen readers offer different modes for reading and interacting with the content and controls on a web page. These modes suit the various situations a user may encounter, and the actions they need to carry out in each. For example:
- When on a text-heavy page, such as a Wikipedia article, users need to read text in sequence using the arrow keys, jump between headed sections using H/Shift+H, etc. As such, the screen reader's own cursor will be active, and many keystrokes will be handled entirely by the screen reader.
- When filling out a form, users need to type text in the various inputs, scroll through the choices in radio groups and select dropdowns, and toggle checkboxes. Accordingly, the screen reader's own cursor will not be active, and many keystrokes that would otherwise be used for controlling that cursor will be passed through to the browser/web page instead. This includes the arrow keys and similar, e.g. to control the position of the system caret within a text field.
- When operating more complex controls, including so-called "composite widgets", keyboard commands are often required to be handled by the web page directly. For example, the arrow keys are used to move within grids, tab lists and toolbars. Similar to the case of filling out a form, this requires a screen reader's own cursor to be inactive, with the appropriate reduction in keystroke handling.
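The last point can be made concrete with the "roving tabindex" pattern commonly used for toolbars and similar composite widgets: the web page itself handles the arrow keys to move focus between items, which is exactly the kind of handling that cannot happen while a screen reader's own cursor is intercepting those keys. Below is a minimal sketch of just the index logic such a widget performs; the function name and shape are illustrative, not taken from any ARIA-AT code.

```typescript
// Sketch of the arrow-key handling a "roving tabindex" toolbar performs
// itself. While a screen reader's reading cursor is active, these key
// presses never reach the page, which is why such widgets need the
// screen reader's cursor to be inactive (interaction mode).
type ArrowKey = "ArrowLeft" | "ArrowRight" | "Home" | "End";

// Given the focused item's index and the number of toolbar items,
// return the index that should receive focus next. Wrapping at the
// ends is one common choice for toolbars; widgets vary on this.
function nextToolbarIndex(current: number, count: number, key: ArrowKey): number {
  switch (key) {
    case "ArrowRight":
      return (current + 1) % count;          // wrap from last item to first
    case "ArrowLeft":
      return (current - 1 + count) % count;  // wrap from first item to last
    case "Home":
      return 0;                              // jump to first item
    case "End":
      return count - 1;                      // jump to last item
  }
}

console.log(nextToolbarIndex(4, 5, "ArrowRight")); // wraps to 0
console.log(nextToolbarIndex(0, 5, "ArrowLeft"));  // wraps to 4
```

In a real widget, the returned index would be used to move `tabindex="0"` and DOM focus to the new item; the point here is only that the keystroke must reach the page at all.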
As these affordances are shared across screen readers, but with slightly different names and connotations, the ARIA-AT project has chosen to adopt an abstracted set of terms to describe them. Please see the Screen Reader Terminology Translation page on the wiki for more details.
Automatic Mode Switching
As web pages continue to increase in complexity, users frequently need to change the active mode. To save them the overhead of doing this manually, screen readers implement patterns of automatic mode switching in response to common scenarios (e.g. focus moving into a text field).
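As a deliberately simplified model of this behaviour, the common pattern can be sketched as a function of the role of the element receiving focus, using ARIA-AT's abstract mode names. The role lists and the switching rule below are illustrative assumptions for the sake of the sketch; real screen readers use more elaborate (and configurable) heuristics.

```typescript
// Simplified model of automatic mode switching: focus landing on an
// editable control or composite widget switches to interaction mode,
// and focus landing anywhere else switches back to reading mode.
// The role lists are illustrative, not taken from any screen reader.
type Mode = "reading" | "interaction";

// Roles assumed here to trigger interaction mode on focus.
const INTERACTIVE_ROLES = new Set([
  "textbox", "searchbox", "combobox", "listbox", // form-style inputs
  "grid", "tablist", "toolbar", "menubar",       // composite widgets
]);

// Return the mode the (modelled) screen reader would be in after
// focus moves to an element with the given role.
function modeAfterFocus(focusedRole: string): Mode {
  return INTERACTIVE_ROLES.has(focusedRole) ? "interaction" : "reading";
}

console.log(modeAfterFocus("textbox")); // "interaction"
console.log(modeAfterFocus("heading")); // "reading"
```

The assertions discussed in this document exist to check that real screen readers convey transitions like these to the user at the right moments.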
It is critically important that these mode switches be appropriate and consistent, to create a predictable experience and avoid users having to second-guess their screen reader's behaviour. However, the majority of ARIA-AT tests do not currently include assertions targeting this aspect of functionality; the sole exception is the Menubar Editor plan, which includes assertions with the wording:
Change of mode from reading to interaction is conveyed
This is a situation we would like to change, but there are a number of questions/concerns to be addressed before doing so.
Outstanding Questions/Concerns
Note: these are what we've come up with at PAC; there may be more.
Sound-Based Feedback
With JAWS and NVDA configured to use default settings, mode changes are only conveyed via sound. That is, the speech output does not reflect whether or not the mode has switched.
This may make it more difficult for testers to unambiguously determine that a mode switch has occurred, and/or which one, particularly if a screen reader switches modes multiple times in rapid succession. In such a case, a sound may interrupt or overlap with the playback of another, and testers would need to take additional steps to verify the new state. It also prevents deaf-blind braille users from running tests, which they are currently able to do.
Meanwhile, the lack of speech output is a problem when reviewing conflicts within results. When in doubt about what a particular tester experienced, the test admin should be able to use the recorded output as a canonical source of truth, and map it to all of the assertions accordingly. If the output omits a significant channel of feedback used by the tester, this will not be possible to the same degree.
Indefinite Assertion Wording
Related to the lack of speech output, the assertion text may not be immediately understandable. We may simply want to stick with the wording used in the menubar tests, but possibly link to the wiki page somewhere for testers to reference.
Mode Applicability Inconsistencies
On the previously linked Screen Reader Terminology Translation wiki page, the "Desktop Screen Reader Terms" section states that:
... MacOS VoiceOver technically does not have a mode equivalent to JAWS virtual cursor mode. However, behavior of VoiceOver is sufficiently similar when quick nav keys are toggled on for the ARIA-AT project to treat the quick nav on state of VoiceOver as equivalent to JAWS virtual cursor mode.
The table in the same section creates equivalencies between Reading Mode and Quick Nav being enabled, and between Interaction Mode and Quick Nav being disabled. But none of the tests reflect this, including the aforementioned Menubar plan. Testers are simply instructed to always turn Quick Nav off, and VoiceOver is essentially treated as "modeless".
Note: I believe VoiceOver also uses sounds by default to indicate when Quick Nav is turned on or off; this needs to be verified.
Automation Approach
There are currently no plans for automated test running to capture non-speech feedback from a screen reader, so a solution to its absence will eventually be needed. It may benefit everyone to find that solution now and apply it to human testing as well.
Potential Solutions
(not necessarily mutually exclusive)
- Instruct testers to change the relevant screen reader settings to use speech feedback instead of sounds. It is unclear whether this is doable in JAWS with the default mode active. It is also unknown whether this setting can be changed independently in VoiceOver, or whether the tester would have to turn off all sound-based feedback.
- Treat VoiceOver as a modal screen reader, i.e. include it within Reading and Interaction Mode tests as is the case for NVDA and JAWS, and ensure that all tests are carried out with Quick Nav on and off.
- Provide a means for a tester to indicate that they heard the Reading and/or Interaction Mode sounds, e.g. with checkboxes and buttons to hear what they sound like in the app. Note that this is exclusionary for certain audiences, as already described, and would not solve the problem for automation.
- Have the automation spec include capture of played sound files, and/or special handling of screen reader internal state.
Decisions from August 4 CG meeting:
- Provide instructions to testers about how to configure their Windows screen reader to announce mode changes, instead of playing a sound.
- Update plans within the 16 targeted for Candidate phase to add mode switching assertions, and include these assertions in new test plans going forward.
- Within the 16 test plans targeted for Candidate phase, prioritise readiness for ones that do not need these new assertions and instructions, to provide sufficient time to uncover and address any unforeseen issues.
- Think through the addition of a new undesirable behaviour within the ARIA-AT App, something like: "Unexpected mode switch occurred". Do we need two of these, one for each mode?
This issue is now resolved with the v2 test format.