mediacapture-main
Broken foreground detection
This spec references "focus" in 8 places, e.g.: "The User Agent MUST wait to proceed to the next step until the relevant settings object's responsible document is fully active and has focus."
All are meant to ensure camera & microphone cannot be turned on from background tabs, but it doesn't work:
- focus is the wrong algorithm (meant for elements)
- We never intended to require iframe focus (we want the top-level browsing context instead)
- HTML doesn't detect system-level focus changes https://github.com/whatwg/html/issues/5049.
Other specs are in the same boat.
We need to fix https://github.com/whatwg/html/issues/5049 and use the following algorithm instead of "and has focus":
- "...and satisfies the has focus steps."
This may not cut it either, as 3 out of 4 browsers return false while the user is in the URL bar, which shouldn't delay camera IMHO.
Worse, only 1 (Firefox) out of 4 browsers appears to care about focus at all: https://jan-ivar.github.io/dummy/gum_visiblefocus.html
Several specs seem in need of a similar "visible and focused" step in HTML, but it may need to be a new one.
It's probably a stretch to call this editorial, since behavior varies in implementations.
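For illustration only, a page-side approximation of a "visible and focused" check might look like the sketch below. `isVisibleAndFocused` and `DocumentLike` are hypothetical names, not anything defined by HTML or this spec; the sketch only assumes the existing `document.visibilityState` and `document.hasFocus()` members, and it inherits the URL-bar problem described above.

```typescript
// Minimal structural type covering the two Document members used here,
// so the check can be exercised outside a browser.
interface DocumentLike {
  visibilityState: string; // "visible" or "hidden" in browsers
  hasFocus(): boolean;
}

// Hypothetical helper (not defined by any spec): true only when the
// document is visible AND reports focus.
function isVisibleAndFocused(doc: DocumentLike): boolean {
  return doc.visibilityState === "visible" && doc.hasFocus();
}
```

In a page this would be called as `isVisibleAndFocused(document)`. It is what a page can observe today, not the user-agent-level step the spec would need.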
@eladalon1983 wrote in https://github.com/w3c/mediacapture-screen-share/pull/192 something I think is relevant to getUserMedia:
> If the tab is visible but unfocused (for example, two browser windows visible on the screen side by side), this would produce the difference of not invoking a prompt on the unfocused browser window+tab (until focused). ... Are we sure this is really preferable?
It's overly strict in that particular case, which comes up more for getUserMedia than for getDisplayMedia, which requires user activation (and thus focus).
This spec mandates (keyboard) focus ahead of prompting, when it might suffice that the requesting document's tab is the foreground tab in that window.
When I tested this, Safari appeared to have a good solution that technically violates the spec: it prompts if the requesting document's tab is the foreground tab, regardless of focus. Its prompt is also clearly associated with the document.
The spec should probably allow this. This suggests two tests: A "foreground" visibility test ahead of prompting, and a "foreground" + focused test before resolving, to preserve the no-prompt case.
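The two tests could be sketched roughly as follows. This is a model of the proposal, not spec text or any browser's implementation; `TabState`, `mayPrompt`, `mayResolve` and `nextStep` are hypothetical names, and what exactly counts as "focused" is left as an assumption.

```typescript
// Hypothetical snapshot of the state a user agent might consult.
interface TabState {
  isForegroundTab: boolean; // requesting document's tab is the foreground tab in its window
  isFocused: boolean;       // assumption: whatever "focused" test the spec settles on
}

type Permission = "granted" | "prompt";
type Step = "wait" | "prompt" | "resolve";

// Test 1: "foreground" visibility is enough to show a prompt.
function mayPrompt(s: TabState): boolean {
  return s.isForegroundTab;
}

// Test 2: "foreground" + focused is required before resolving.
function mayResolve(s: TabState): boolean {
  return s.isForegroundTab && s.isFocused;
}

// What the user agent would do next for a pending getUserMedia() call.
function nextStep(s: TabState, permission: Permission): Step {
  if (permission === "granted") {
    // No-prompt case: still gated on focus, so an unfocused window
    // cannot turn on the camera by itself.
    return mayResolve(s) ? "resolve" : "wait";
  }
  return mayPrompt(s) ? "prompt" : "wait";
}
```

With this split, a visible but unfocused window can still prompt (the Safari behavior), while a previously granted permission does not resolve until the focus test passes.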
> - We never intended to require iframe focus (we want the top-level browsing context instead)
I guess this is the same for screen sharing. I filed https://github.com/w3c/mediacapture-screen-share/issues/203
Pinging @palak8669
> This suggests two tests: A "foreground" visibility test ahead of prompting, and a "foreground" + focused test before resolving, to preserve the no-prompt case.
I think this makes sense for getUserMedia. On desktop, it seems comforting to know that other browser windows that weren't using camera or microphone when I've left them open cannot decide to turn on camera or microphone on a whim while I'm not interacting with them.
But what would this mean for enumerateDevices? Right now, Firefox's check in enumerateDevices is:
    if (!bc->IsActive() ||                  // background tab or browser window fully obscured
        !bc->GetIsActiveBrowserWindow()) {  // browser window without focus
IOW, the same check as for getUserMedia: page visibility AND focus of the user agent window (not the document).
While a focus requirement seems defensible for getUserMedia, perhaps the visibility requirement alone is sufficient for enumerateDevices? There it's anti-fingerprint, not anti-spying. @karlt @youennf @martinthomson @jesup Thoughts?
What an operating-system-window focus requirement provides for enumerateDevices() is that there would (usually) be only one focused browsing context hierarchy on a system.
With only a visibility test, there would often be more than one "visible" top-level browsing context on a user's desktop, allowing fingerprinting across origins. Visibility is typically not strict, and so a browsing context is typically considered visible even when fully occluded by another system window, and some desktop systems do not promote minimization of inactive windows.
The disadvantage of the focus requirement is that sometimes the presence of a device is useful for displaying items that would be visible before any user interaction.
Visibility seems the preferred requirement for enumerateDevices() and "devicechange" if fingerprinting exposure can be comparable to focus.
For example, if delaying the exposure of device changes by returning an old list of devices for a long enough unpredictable period of time would reduce the correlation between origins sufficiently, then the list of devices would at least be available and accurate when the devices haven't changed recently.
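A rough sketch of that idea, to make it concrete: device changes are held back for a randomized interval, and reads return the old list until then. `DelayedDeviceList` and its parameters are hypothetical, and whether any particular delay range reduces cross-origin correlation "sufficiently" is exactly the open question, not something this sketch answers.

```typescript
// Holds back device-list changes for an unpredictable period.
// The clock and random source are passed in so the behavior can be tested.
class DelayedDeviceList {
  private exposed: string[];
  private pending: string[] | null = null;
  private releaseAt = 0;
  private minDelayMs: number;
  private maxDelayMs: number;
  private random: () => number;

  constructor(
    initial: string[],
    minDelayMs: number,
    maxDelayMs: number,
    random: () => number = Math.random,
  ) {
    this.exposed = [...initial];
    this.minDelayMs = minDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.random = random;
  }

  // Record a device change observed at time nowMs.
  update(devices: string[], nowMs: number): void {
    if (this.pending === null) {
      // Pick the release time once per burst of changes.
      const span = this.maxDelayMs - this.minDelayMs;
      this.releaseAt = nowMs + this.minDelayMs + this.random() * span;
    }
    this.pending = [...devices];
  }

  // What enumerateDevices() would be allowed to return at time nowMs.
  read(nowMs: number): string[] {
    if (this.pending !== null && nowMs >= this.releaseAt) {
      this.exposed = this.pending;
      this.pending = null;
    }
    return [...this.exposed];
  }
}
```

When devices have not changed recently there is nothing pending, so the returned list is both available and accurate.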
Does the platform already expose whether the current window has focus? I assume that it does, but want to confirm.
Other than that, I think Karl’s argument resonates with me. The fingerprinting risk exists if the information is released under any focus condition, so we are really looking at what makes the API useful.
Operating system window focus is exposed through "focus" and "blur" events iff the user-agent is directing keyboard events to the navigable. If the user-agent is taking keyboard events for its own widgets, then these "focus" and "blur" events are not dispatched.
Gecko exposes window focus, even when the user-agent is directing keyboard events to its own widgets, through :-moz-window-inactive, but I'm not aware of any standardized APIs already doing this.
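To illustrate what a page can already learn from the "focus" and "blur" events, here is a minimal sketch. `WindowFocusTracker` and `FocusEventSource` are hypothetical names; the tracker only sees events the user agent chooses to dispatch, so it is subject to the same caveat about the user agent's own widgets.

```typescript
// Structural type for anything that dispatches "focus" and "blur",
// e.g. the global window object in a browser.
interface FocusEventSource {
  addEventListener(type: "focus" | "blur", listener: () => void): void;
}

// Tracks the last focus state reported through "focus"/"blur" events.
class WindowFocusTracker {
  private focused: boolean;

  constructor(source: FocusEventSource, initiallyFocused: boolean) {
    this.focused = initiallyFocused;
    source.addEventListener("focus", () => { this.focused = true; });
    source.addEventListener("blur", () => { this.focused = false; });
  }

  get isFocused(): boolean {
    return this.focused;
  }
}
```

In a top-level page this could be constructed as `new WindowFocusTracker(window, document.hasFocus())`.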
> Does the platform already expose whether the current window has focus?
If you're in an iframe without focus, then there's no way to tell whether the current window has focus or not AFAIK.
> ...if fingerprinting exposure can be comparable to focus. ... For example, if delaying the exposure ...
Agreed. I see no time limit where the spec says: "User Agents MAY add fuzzing on the timing of events to avoid cross-origin activity correlation".