Accessibility Shortcuts Portal Proposal (`org.freedesktop.portal.AT.Shortcuts`)
The Problem
Accessibility is broken in a big way on Wayland. This is because intercepting and re-transmitting global keybindings is no longer permitted, and fair enough! What a security nightmare! In order to bring security to "normal" applications, the Global Shortcuts portal was proposed, adopted, and even implemented by KDE (so I hear).
Assistive technology, however, is traditionally considered a component with "exceptions" to the rules: "yeah sure, just snoop the keyboard", "it's inherently insecure so who cares". This is because in the most obvious case (a screen reader), it actually has access to what is on your screen anyway (any text, of any document, webpage, terminal output, etc.) and therefore often seems insecure by definition.
In an effort to integrate accessibility into Linux in a way that does not inherently require insecurity, just permission, we've turned to portals.
The Solution
The most viable path for permissions-based accessibility in Linux is to model it after other systems which have already done the hard work of finding the (mostly) right abstractions. In this case, I'm going to recommend we emulate the behaviour of Android, since it's: a) already Linux-based, b) has an accessibility permissions system, and c) sandboxes most applications from the operating system—which seems to be the direction of Linux as well; the only difference for Linux is that we'll have both native applications and sandboxed applications to deal with.
After reaching out for advice over in the GNOME accessibility Matrix room, and some chats over on a wayland-protocols issue, there seems to be quite the consensus on implementing the global shortcuts portal for assistive technologies to be able to do their job. However, the existing global shortcuts portal does not quite have all the features and permission granularity that an assistive technology requires at this point.
The Requirements
There are a few requirements for an accessibility portal:
- The user must be asked once whether the application should be allowed as an assistive technology. The "never", "just this time", and "always" options seem appropriate here (remember what Android displays when asking for your location).
- After giving permission, the assistive technology should be able to have a similar API to the global shortcuts API, but with some more stringent requirements:
- It must be allowed to bind and unbind actions at will.
- It must be able to bind to any input event, even ones which would not "normally" be expected (i.e., `Insert + a`, `Capslock + x`, or `h` all on its own).
- Since a screen reader adds and removes the same sets of events often, and quickly, it may be worth allowing the assistive technology to define a map from a string key to a list of shortcuts (`a{sa(sa{sv})}`), then allow sets of bound shortcuts to be added or removed in bulk via the string keys of that map (see the sketch after this list). These changes often happen multiple times per second, and are somewhat time sensitive: if a user presses a key combination like `Capslock + a` (toggle browse mode), immediately followed by `h` (a key used in browse mode), a screen reader user would expect that `h` is already bound, even if it was not before the `Capslock + a`.
- It must have a way for an implementer to not have to show the prompt to a user at all and simply accept it outright as a default assistive technology. If a screen reader cannot start properly, bind the required keys, access the contents of the screen, etc., then someone who needs a screen reader will not be able to select an option without sighted assistance (this is very, very bad).
- All events bound via this portal must have priority over the global shortcuts portal. A currently running application should never be able to hold a shortcut that an assistive technology wants. This is also something which can cause the need for sighted assistance (again, very bad).
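To illustrate the keyed-map idea, here is a minimal sketch of what such a structure could look like, built with PyGObject. The group names, shortcut IDs, and metadata keys are all hypothetical; only the `a{sa(sa{sv})}` container type is taken from the proposal above.

```python
from gi.repository import GLib

# Hypothetical shortcut map of type a{sa(sa{sv})}: a dict from group name
# to a list of (shortcut_id, metadata) pairs. All names are made up for
# illustration; the metadata keys follow no standardized vocabulary yet.
shortcuts = GLib.Variant("a{sa(sa{sv})}", {
    "base": [
        ("toggle-browse-mode", {
            "description": GLib.Variant("s", "Toggle browse mode"),
            "preferred_trigger": GLib.Variant("s", "CapsLock+a"),
        }),
    ],
    "browse-mode": [
        ("next-heading", {
            "description": GLib.Variant("s", "Move to next heading"),
            "preferred_trigger": GLib.Variant("s", "h"),
        }),
        ("next-link", {
            "description": GLib.Variant("s", "Move to next link"),
            "preferred_trigger": GLib.Variant("s", "k"),
        }),
    ],
})

# Toggling browse mode is then one small call naming the group, e.g.
# ChangeActiveShortcuts(session, ["browse-mode"], []), rather than
# re-sending every individual binding.
```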
I'm looking for comments, implementation concerns, links to related issues, requirements and edge cases I have not yet covered, and to gauge general interest in this proposal. Once I've chatted with a few of you here on the issue page, or by email ([email protected]) if you don't have a GH account, I'll begin a portal draft, go through RFC, and see if we can get this hammered into a standard. I will help with the implementation of the portal.
(I am being paid to work on this process, including implementation; responses will be fast during UTC-6 working hours.)
As an aside, I expect many more accessibility features will want a portal implementation at some point. For example, if an application already has the permission to be an assistive technology, it may want to zoom in, or move a user's mouse around (not read the position, I'm aware of the mouse position idea). Both are things that can traditionally at least be triggered by the screen reader, even if the screen reader itself does not provide the functionality.
And, when talking about a screen reader portal, we might allow the screen reader to request a mouse event for a window at particular coordinates of that window; you unfortunately need that operation for some broken websites.
And, when talking about a screen reader portal, we might allow the screen reader to request a mouse event for a window at particular coordinates of that window; you unfortunately need that operation for some broken websites.
Mouse events at particular coordinates are best handled by AT-SPI, not portals.
The user must be asked once whether the application should be allowed as an assistive technology. The "never", "just this time", and "always" options seem appropriate here (remember what Android displays when asking for your location).
A counter proposal for this, that makes it harder for applications to "sneakily" make themselves assistive technology by applying bad design: instead of "requesting accessibility", have the accessibility utility advertise via its desktop file that it implements a certain aspect of accessibility. With that, require Settings (or the equivalent in e.g. KDE) to make these discoverable and possible to explicitly enable.
An accessibility application would, instead of nagging the user with "please let me eavesdrop on you", issue some request via the portal to open the accessibility settings, instructing the user to actively select $APP as the "screen reader", for example.
This is not new, it is how some permissions are dealt with in Android.
it may want to zoom in
Zooming in can only practically be implemented by the compositor itself. In GNOME this is possible via the a11y menu, but it could perhaps be something that can be triggered via a potential accessibility portal API.
An accessibility application would, instead of nagging the user with "please let me eavesdrop on you", issue some request via the portal to open the accessibility settings, instructing the user to actively select $APP as the "screen reader", for example.
This shouldn't require making a screen reader your default just to have it usable. I don't mind a combination of these features where:
- The default screen reader does not need to ask permission explicitly (i.e., the implementer can say "you are my default screen reader, go ahead").
- Any other assistive technology that would like to launch would be able to ask explicitly.
Zooming in can only practically be implemented by the compositor itself. In GNOME this is possible via the a11y menu, but it could perhaps be something that can be triggered via a potential accessibility portal API.
Fair. And I brought it up, but let's leave this for another portal.
This shouldn't require making a screen reader your default just to have it usable. I don't mind a combination of these features where:
Correct me if I'm wrong, but I assume one would only have one screen reader active at any point in time, as otherwise I imagine it'd be quite a noisy experience. So if you have a default bundled screen reader, it'd already be selected, thus wouldn't need to do anything to work. The only time you need to change that setting is if you have another screen reader you want to change to.
If you have non-screen-reader like assistive technology that needs to have the same level of "eaves-dropping-ness", then that would need to go into that Settings panel, but installing such a thing would mean it would be discoverable, and possible to enable.
There is a general problem with "nag" like permissions, which is that the user sooner or later "gives up" or doesn't care if there is a yes/no permission. In other words, they don't really help. Portals are, when possible, designed to avoid this issue. For access to files, a file selector is always presented; for sharing the screen, one has to select the screen to share; and so on.
For something as problematic as getting an introspected view of everything going on on the computer, we should try hard to avoid ending up with a Yes/No dialog; taking inspiration from Android and making it explicit configuration one is asked to do seems like a good way to mitigate things.
There is a general problem with "nag" like permissions, which is that the user sooner or later "gives up" or doesn't care if there is a yes/no permission. In other words, they don't really help. Portals are, when possible, designed to avoid this issue. For access to files, a file selector is always presented; for sharing the screen, one has to select the screen to share; and so on.
Agreed... but this makes my situation (where I switch back and forth between two screen readers for testing) an absolute nightmare. I agree from a normal user's perspective this makes sense, though.
EDIT: As long as it is possible for a distribution to ship with a default screen reader, then automatically run that without user interaction, that's fine by me. I'm sure as a dev, I can find a script that'll just swap this setting for me.
Agreed... but this makes my situation (where I switch back and forth between two screen readers for testing) an absolute nightmare. I agree from a normal user's perspective this makes sense, though.
A rather peculiar use case :) but I imagine this could be scriptable in one way or the other in most DEs, e.g. a GSetting in GNOME.
EDIT: As long as it is possible for a distribution to ship with a default screen reader, then automatically run that without user interaction, that's fine by me. I'm sure as a dev, I can find a script that'll just swap this setting for me.
This is critical yes, but doable with the solution I'm suggesting, I believe.
makes it harder for applications to "sneakily" make themselves assistive technology by applying bad design
I think this is really important. Android's accessibility portal-equivalent literally does get used by malware (for example), precisely because the interface is so powerful.
The first question is: is being aware of input events and emitting input events (shortcuts) a valid basis to say for sure that an application has accessibility features? If not (which is the case here), then an app certainly can't request accessibility access (whether through a dialog or by opening the accessibility settings). This would be a lie.
(if relevant) The second question is: Is it okay to let an app potentially control the system or other apps, even if asked? Example: accessibility can send auto-generated shortcuts within apps. I don't think so.
This is an attempt to summarize a rough proposal that was discussed on the GNOME a11y Matrix room yesterday:
Assistive Technology
Assistive technologies (ATs) are rather special when it comes to the type and breadth of access they need to a user's system. They need to be able to read out loud which widget is focused, in which window, and which letter is entered in which text field. From a privacy and sandbox perspective, the needs of ATs are very problematic, as for all practical purposes they need to perform total surveillance of everything the user "sees" and does. It would be disastrous if a rogue application got the same level of access that an AT gets, but at the same time, people may want to install additional ATs or replace existing ones to help them use the computer.
So, in one way or the other, if we want ATs to be distributable in a safe and relatively sandbox friendly way (e.g. Flatpak), we need a portal that can handle access to the resources the system has to make available for the AT to work. At the same time, we need to be very careful in exactly how a user can install and use an AT, without accidentally enabling malware to get the same level of access to resources regular applications shouldn't have access to. At the same time, it needs to be easy enough and discoverable how to e.g. switch to another screen reader or add additional ATs.
Access types
Initially, two access types have high priority and are critical to ATs; these are focused upon first.
Priority keyboard shortcuts
Previously, in X11, this was implemented by grabbing key events from the X server, but doing so is seen as something very undesirable in Wayland compositors, as having to round-trip key events through one or more ATs is very problematic.
Instead, a solution to this is to provide something similar to org.freedesktop.portal.GlobalShortcuts, with the difference being that the AT can freely register keyboard shortcuts that are unconditionally and immediately respected by the display server.
As with the global shortcuts portal, the display server would translate a stream of key events into triggered shortcuts that the AT would then be signalled about.
For this the shortcuts xdg-spec might need to be expanded to handle the use cases needed by ATs.
This would avoid any display-server-to-AT round trips, but still allow shortcuts for ATs to have priority over other shortcuts on the system.
Access to the accessibility bus
The accessibility bus is a dedicated D-Bus bus where applications describe what the user is currently interacting with. Access to this is the most problematic, as it allows the application to fully introspect what is going on on the computer, including reading any text, reading everything the user types, etc.
I'll leave out the details of how to practically open such a bus, but in one way or the other, e.g. by passing file descriptors, it could be done with API on an accessibility portal.
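For reference, today the a11y bus address is handed out by the org.a11y.Bus service on the session bus; a portal method could gate an equivalent handle (or a file descriptor) behind the permission check. A minimal sketch of the current mechanism, assuming PyGObject:

```python
from gi.repository import Gio

# Ask org.a11y.Bus (provided by at-spi2-core) for the a11y bus address.
bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)
reply = bus.call_sync(
    "org.a11y.Bus", "/org/a11y/bus", "org.a11y.Bus", "GetAddress",
    None, None, Gio.DBusCallFlags.NONE, -1, None)
address = reply.unpack()[0]  # e.g. "unix:path=/run/user/1000/at-spi/bus_0"

# Connect to the a11y bus as a regular message bus.
a11y_bus = Gio.DBusConnection.new_for_address_sync(
    address,
    Gio.DBusConnectionFlags.AUTHENTICATION_CLIENT
    | Gio.DBusConnectionFlags.MESSAGE_BUS_CONNECTION,
    None, None)
```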
Handling access granting requirements
As mentioned, we must try hard to avoid rogue applications that want to trick users into letting them spy on them, but we also need to make it possible for distributions to pre-install a screen reader that has access without needing to query the user, as without said screen reader, the user wouldn't be able to interact with the computer at all. What this means in practice are these things:
- We cannot grant access by default to any installed application, as that would mean any application could freely spy on the user as much as it wants.
- We should avoid designing a portal that depends on portal dialogs asking the user to grant access. The reason for this is that users that get nagged with dialogs practically asking them to "make things work" will eventually give up and just accept what the application is asking.
- We must make it possible for distributions to pre-install an AT application, that has been granted enough access to e.g. act as a screen reader.
- It'd be good if e.g. an AT settings app could still "guide" the user and help them grant access.
Access handling proposal
Make giving access to an AT an explicit operation similar to other system configuration, and not something directly requested by the AT application itself.
The way this would work is making it possible to discover, switch and add AT via the accessibility panel of the settings application used in the desktop environment.
Discovery
Discovery would be handled by AT applications adding a field to their .desktop file declaring that they provide AT. The settings app would for example show a list of discovered ATs, and add a way to exchange one AT with another (e.g. change screen reader), or add additional ATs (e.g. braille integration software). Behind the scenes, the settings app would configure e.g. the permission store with access rules. Ideally settings apps should handle "confirming" a change, to make sure one doesn't switch to a screen reader that doesn't actually work.
The desktop file and the new field would have no use other than helping with discovery.
The primary flow after installing a new AT would be to go to the accessibility panel of the settings app, and switch to or enable the newly installed AT.
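As a sketch of what such a declaration could look like (the key name X-XDG-AssistiveTechnology and its value vocabulary are entirely hypothetical and would need to be standardized):

```ini
# Hypothetical .desktop entry for a screen reader. Only the last key is
# new; its name and values are placeholders, not a spec.
[Desktop Entry]
Type=Application
Name=My Screen Reader
Exec=my-screen-reader
X-XDG-AssistiveTechnology=ScreenReader
```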
Assisted discovery
It might be desirable to allow a window used for e.g. configuring an AT to assist the user in finding the Accessibility panel in the Settings app. This could work by, for example, having an OpenAccessibilitySettings() method on the portal, or a generic portal for opening a particular Settings panel.
Note that, in theory, it would be possible for portal backends to implement "give me permission" dialogs with such a method call, but the advice would be not to, given the reasons listed earlier.
Granularity
Having granular access control might be desirable, and doing so is not necessarily more complicated. A DE might want to simplify things and e.g. give access to both unrestricted keyboard shortcuts and the accessibility bus with a single option, while others might want to give more granular control up front. Manipulating the permission store via third party applications (e.g. Flatseal) would be possible if the permission store is used in a portable manner.
Sane defaults
Distributions should be able to pre-install a screen reader, and make it possible to use without any user interaction. With this model, this would be achievable by making sure distributions can pre-populate e.g. the permission store with any installed screen reader application, while ensuring it is launched in a way that makes the portal correctly identify the app ID it has. With the permission store set up for each new user, it should not matter if the screen reader is pre-installed as a flatpak, via a traditional package, or part of an immutable OS image, as long as it is launched correctly.
Development & testing experience
A concern raised with a method like this was the development experience for developing e.g. a screen reader, or testing different ones often; having to interact with Settings in a very manual way can be quite annoying if one has to do it very often.
This can be mitigated by making sure changing permissions is possible via scripts. If permission handling is done using the permission store, this should be relatively simple. Improved xdg-desktop-portal documentation about how to run an executable from the command line such that portals correctly identify its app ID would also help; developers would not need to do much more than just run the executable.
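As a sketch of such a script, assuming permissions land in the existing xdg-desktop-portal permission store: the PermissionStore interface and its SetPermission signature are existing API, but the "accessibility" table, the entry ID, and the permission strings below are hypothetical.

```python
from gi.repository import Gio, GLib

# Grant a screen reader AT access by writing to the permission store.
# Table name, entry id, and permission strings are made up for
# illustration; SetPermission(s table, b create, s id, s app,
# as permissions) is the real method signature.
bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)
bus.call_sync(
    "org.freedesktop.impl.portal.PermissionStore",
    "/org/freedesktop/impl/portal/PermissionStore",
    "org.freedesktop.impl.portal.PermissionStore",
    "SetPermission",
    GLib.Variant("(sbssas)", (
        "accessibility",       # table (hypothetical)
        True,                  # create table/entry if missing
        "at",                  # entry id within the table (hypothetical)
        "org.gnome.Orca",      # app ID being granted access
        ["shortcuts", "bus"],  # granted permissions (hypothetical)
    )),
    None, Gio.DBusCallFlags.NONE, -1, None)
```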
Edit: added part about distribution default.
Thank you, @jadahl! This is a great encapsulation of what we discussed. I'm going to create a simple set of calls for the portal here, and see if there are further comments:
Methods
// if this fails, it is expected that the client will call AccessibilitySettings
CreateSession (IN a{sv} options,
OUT o handle);
// set all possible shortcuts this assistive technology will use;
// all shortcuts are disabled by default
SetShortcuts (IN o session_handle,
IN a{sa(sa{sv})} shortcuts,
IN s parent_window,
IN a{sv} options,
OUT o request_handle);
// change active shortcuts used by the implementation
// since shortcuts are defined as a dictionary with a string key and a list of shortcuts as a value, we can enable and disable them en masse via the keys
// this is seen as a convenience method, since ATs often change hundreds of keybindings within the span of a keystroke.
ChangeActiveShortcuts (IN o session_handle,
IN as enabled_shortcut_lists,
IN as disabled_shortcut_lists,
OUT o request_handle);
// return a list of *all* shortcuts defined via this portal
ListShortcuts (IN o session_handle,
IN a{sv} options,
OUT o request_handle);
// open an implementation defined accessibility settings panel, where additional assistive technologies can be granted permission to use this portal
// the lack of a session handle means this method may be called without the success of CreateSession, and a client will normally run this if CreateSession fails
AccessibilitySettings (IN a{sv} options,
OUT o handle);
// request the name of the global accessibility bus
AccessibilityBus (IN o session_handle,
IN a{sv} options,
OUT o request_handle);
Signals
Activated (o session_handle,
s shortcut_id,
t timestamp,
a{sv} options);
Deactivated (o session_handle,
s shortcut_id,
t timestamp,
a{sv} options);
ShortcutsChanged(o session_handle,
a{sa(sa{sv})} shortcuts);
ActivatedShortcutsChanged(o session_handle,
as activated_shortcut_groups,
as deactivated_shortcut_groups);
And finally, the standard properties: version readable u.
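To make the intended flow concrete, here is a rough client-side sketch following the simplified signatures above. Everything here is hypothetical, since the interface does not exist yet, and a real portal would likely route replies through Request/Response objects rather than returning handles directly:

```python
from gi.repository import Gio, GLib

PORTAL = "org.freedesktop.portal.Desktop"
PATH = "/org/freedesktop/portal/desktop"
IFACE = "org.freedesktop.portal.AT.Shortcuts"  # proposed, not yet real

bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)

def call(method, params):
    return bus.call_sync(PORTAL, PATH, IFACE, method, params,
                         None, Gio.DBusCallFlags.NONE, -1, None)

try:
    (session,) = call("CreateSession", GLib.Variant("(a{sv})", ({},))).unpack()
except GLib.Error:
    # Not granted AT access: point the user at the settings panel instead.
    call("AccessibilitySettings", GLib.Variant("(a{sv})", ({},)))
    raise SystemExit(1)

# Register every shortcut group up front (disabled by default)...
call("SetShortcuts", GLib.Variant("(oa{sa(sa{sv})}sa{sv})",
     (session, {"browse-mode": [("next-heading", {})]}, "", {})))

# ...then flip groups on and off in bulk as the user's context changes.
call("ChangeActiveShortcuts",
     GLib.Variant("(oasas)", (session, ["browse-mode"], [])))

# React to triggered shortcuts (a GLib.MainLoop is needed to receive these).
def on_activated(conn, sender, path, iface, signal, params):
    _session, shortcut_id, timestamp, options = params.unpack()
    print("activated:", shortcut_id)

bus.signal_subscribe(PORTAL, IFACE, "Activated", PATH, None,
                     Gio.DBusSignalFlags.NONE, on_activated)
```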
Should this be a PR at this point? Continue the discussion there?
Questions:
- If the shortcuts can be changed without user approval, are they shortcuts such as "pressing" a button?
- Are shortcuts managed by the system (i.e. the application cannot generate such an event by itself)?
- Are there any other apps that could use this feature other than just for accessibility reasons?
As for accessing the accessibility bus, if that means accessing private information, then the user should be aware of that.
As for accessing the accessibility bus, if that means accessing private information, then the user should be aware of that.
Yes. The user would be aware of that by virtue of adding the application to the list of accessibility applications (which will be opened by the AccessibilitySettings method). And yes, the accessibility bus is what would allow an AT to read the contents of the screen.
Are there any other apps that could use this feature other than just for accessibility reasons?
An application that sets up realtime macro shortcuts could use this. It could set F8 to "bind new macro", then trap all keys, get a combination, followed by a set of keys to reproduce later. Then, set up an action with this protocol that would replay the sequence of keys via some other method. Niche, but not unheard of on other operating systems (Windows).
If the shortcuts can be changed without user approval, are they shortcuts such as "pressing" a button?
I may be misunderstanding the question, so feel free to correct me. Shortcuts are redefined based on context. So for example, the simple fact that a user is inside a document (web or libreoffice) would set different shortcuts than being in a simple GUI application. Being in a text box changes the shortcuts, your focus on certain types of items changes the shortcuts. It is extremely dependent on the current context, and would not generally change because a user "pressed a button".
Are shortcuts managed by the system (i.e. the application cannot generate such an event by itself)?
The AT should not generate an input event, or something which would become a shortcut, no.
If the shortcuts can be changed without user approval, are they shortcuts such as "pressing" a button?
I may be misunderstanding the question, so feel free to correct me. Shortcuts are redefined based on context. So for example, the simple fact that a user is inside a document (web or libreoffice) would set different shortcuts than being in a simple GUI application. Being in a text box changes the shortcuts, your focus on certain types of items changes the shortcuts. It is extremely dependent on the current context, and would not generally change because a user "pressed a button".
Indeed, I was not clear.
Here's an example scenario: The "accessibility" app has access to all content, so knows when you type text, what actions you trigger, and may possibly misread things on purpose. Since it can reassign shortcuts at will, can we imagine that the app can assign the action "delete" or "press the button" (like a delete button) to a shortcut that you use frequently (but which is not the shortcut you defined)?
I suppose that would technically be possible, @Mikenux
Methods
// set all possible shortcuts this assistive technology will use;
// all shortcuts are disabled by default
SetShortcuts (IN o session_handle,
IN a{sa(sa{sv})} shortcuts,
IN s parent_window,
IN a{sv} options,
OUT o request_handle);
I think this is likely the only method needed regarding shortcuts, if the intention is for a11y shortcuts to always take precedence without any user interaction. It also means parent_window isn't needed, since there would never be any dialogs.
Might be useful to let the backend communicate what shortcuts it managed to set though, it cannot really be 100% unconditional. It'll depend on implementation abilities and a limited set of combinations (e.g. escape hatch) the compositor might want to have.
Changing between "modes" would just set new shortcuts.
As for accessing the accessibility bus, if that means accessing private information, then the user should be aware of that.
Yes, it'd be a tricky design task to somehow educate the user while they are configuring things.
Since it can reassign shortcuts at will, can we imagine that the app can assign the action "delete" or "press the button" (like a delete button) to a shortcut that you use frequently (but which is not the shortcut you defined)?
I imagine A-Z, delete, backspace and enter could perhaps be "shortcuts" that the portal backend can disallow even for an AT, but fundamentally, the possibility that an app disguising itself as an AT can use an a11y portal to do really terrible things is a real problem and hard to solve.
I imagine A-Z, delete, backspace and enter could perhaps be "shortcuts" that the portal backend can disallow even for an AT
This will not be possible. I can't say for sure about Backspace and Enter, but individual characters, and Shift plus a single character, are very common shortcuts used by a screen reader.
EDIT: I've just confirmed that Backspace and Enter are also used in some modes of operation.
the possibility that an app disguising itself as an AT can use an a11y portal to do really terrible things is a real problem and hard to solve.
Right now, any binary can just read and interact with the accessibility layer with no permissions at all. So this will still be massive progress.
It would be better to warn the user when shortcuts are assigned to delete/destructive actions. However, even if it were possible to detect such actions, these shortcuts must be stable across contexts, and the system screen reader must read them instead of the app's screen reader (or at least give a hint). The same may be true for the "push the button" action, although it could be limited to destructive actions.
The main thing is to avoid any destructive actions. Any other bad but non-destructive behavior (e.g. misreading) is something the user should notice. Therefore, a way to easily disable the problematic app is needed.
It would be better to warn the user when shortcuts are assigned to delete/destructive actions.
What exactly do you mean by destructive actions?
I think this is likely the only method needed regarding shortcuts, if the intention is for a11y shortcuts to always take precedence without any user interaction. It also means parent_window isn't needed, since there would never be any dialogs.
Ah I see. Thanks for the clarification.
Might be useful to let the backend communicate what shortcuts it managed to set though, it cannot really be 100% unconditional. It'll depend on implementation abilities and a limited set of combinations (e.g. escape hatch) the compositor might want to have.
Yes, this is probably a good idea. This would be sent by the ShortcutsChanged signal.
Changing between "modes" would just set new shortcuts.
The only reason I suggested otherwise is that changing the events would be a fairly large request, potentially in the 1-2KB+ range, since every possible shortcut, with namespaced actions attached could be quite a large list, and it needs to be updated nearly instantaneously for a good experience—I was worried about the round-trip time for such a large piece of data.
Perhaps I'm thinking a bit too low-level for a portal? I'd need input from others on what latencies would be considered acceptable for this. I'm trying to avoid a situation where a user presses two shortcuts close together, and the first one changes what shortcuts are available. Ideally this would never happen, since under the current "the AT grabs all input events" system, this is not possible, which at least has the advantage of always being correct, even if it is an order-of-magnitude less secure.
the system screen reader must read them instead of the app's screen reader (or at least give a hint).
What is your meaning here? I'm not sure I understand the distinction between a system screen reader and "an app's screen reader". Generally, a screen reader handles accessibility across all applications on a system, and it is extremely rare for individual apps to have their own "screen readers". Such apps are generally called "self-voicing" applications, since they do not require a screen reader to function, but they would still not be called screen readers.
The vast, vast majority of applications rely on an external screen reader to provide accessibility, and those that do not generally just require that a user disable their current screen reader to use them.
I used the "destructive" action just to be general, referring to the term used in GNOME. Another destructive action other than "Delete" is "Discard", for example. If it's already communicated, that's fine.
Thank you for clarifying the difference between a screen reader and a "self-voicing" application.
If it's already communicated, that's fine.
Yes, so in this case focusing a button labeled "delete", or "remove", would speak the label of the button. So the user would be aware of what they are doing. Is that what you're trying to say?
Yes. In terms of shortcuts, it also means speaking the action. And, if I'm not wrong, if your app has its own feedback (voices, use of a refreshable braille display), that's even more important. For example, an app can be marked as sandboxed (in GNOME Software) if the pulseaudio socket is used to play voices (again, if I'm not wrong). That's for a "push button" shortcut.
(if valid) If you have a shortcut for any destructive action (including deletion of a line within a document), the shortcut has to be stable across contexts and set via the portal. Is your portal supposed to work like the global shortcut portal as advertised (so by displaying a dialog box)?
(sorry, I feel like I'm a bit lost)