mediacapture-main Broken foreground detection

This spec references "focus" in 8 places, e.g.: "The User Agent MUST wait to proceed to the next step until the relevant settings object's responsible document is fully active and has focus."

All are meant to ensure camera & microphone cannot be turned on from background tabs, but this doesn't work, for three reasons:

Focus is the wrong algorithm (it is meant for elements).
We never intended to require iframe focus (we want the top-level browsing context instead).
HTML doesn't detect system-level focus changes: https://github.com/whatwg/html/issues/5049.

Other specs are in the same boat.

We need to fix https://github.com/whatwg/html/issues/5049 and use the following algorithm instead of "and has focus":

"...and satisfies the has focus steps."

Nov 22 '20 01:11 jan-ivar

"...and satisfies the has focus steps."

This may not cut it either, as 3 out of 4 browsers return false while the user is in the URL bar, which shouldn't delay camera IMHO.

Worse, only 1 (Firefox) out of 4 browsers appears to care about focus at all: https://jan-ivar.github.io/dummy/gum_visiblefocus.html

Several specs seem in need of a similar "visible and focused" step in HTML, but it may need to be a new one.
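
As a rough illustration of what a page-observable "visible and focused" test looks like today, here is a minimal sketch. `DocLike` and `isVisibleAndFocused` are hypothetical names introduced only for this example; `visibilityState` and `hasFocus()` are the existing `Document` members. It is not the proposed spec algorithm, and it inherits the URL-bar problem described above.

```typescript
// Minimal structural type so the sketch also runs outside a browser.
// In a page, the global `document` satisfies it.
interface DocLike {
  visibilityState: string; // "visible" or "hidden"
  hasFocus(): boolean;
}

// Hypothetical helper: true only when the document is visible AND reports focus.
function isVisibleAndFocused(doc: DocLike): boolean {
  return doc.visibilityState === "visible" && doc.hasFocus();
}

// Three illustrative states (assumed values, not measurements).
const foregroundDoc: DocLike = { visibilityState: "visible", hasFocus: () => true };
// User is typing in the URL bar: visible, but hasFocus() may report false.
const urlBarDoc: DocLike = { visibilityState: "visible", hasFocus: () => false };
const backgroundDoc: DocLike = { visibilityState: "hidden", hasFocus: () => false };
```

The `urlBarDoc` case is the one that would needlessly delay camera access if this exact test were mandated.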

Dec 09 '20 20:12 jan-ivar

It's probably a stretch to call this editorial, since behavior varies in implementations.

Jan 11 '21 15:01 jan-ivar

@eladalon1983 wrote in https://github.com/w3c/mediacapture-screen-share/pull/192 something I think is relevant to getUserMedia:

If the tab is visible but unfocused (for example, two browser windows visible on the screen side by side), this would produce the difference of not invoking a prompt on the unfocused browser window+tab (until focused). ... Are we sure this is really preferable?

It's overly strict in that particular case, which comes up more for getUserMedia than for getDisplayMedia, which requires user activation (and thus focus).

This spec mandates (keyboard) focus ahead of prompting, when it might suffice that the requesting document's tab is the foreground tab in that window.

When I tested this, Safari appeared to have a good solution that technically violates the spec: it prompts if the requesting document's tab is the foreground tab, regardless of focus. Its prompt is also clearly associated with the document.

The spec should probably allow this. This suggests two tests: A "foreground" visibility test ahead of prompting, and a "foreground" + focused test before resolving, to preserve the no-prompt case.
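
The two-test idea can be modelled as a small gate function. This is only a sketch under stated assumptions: `Stage`, `TabState` and `mayProceed` are hypothetical names, and how a user agent determines "foreground tab" or window focus is deliberately left out.

```typescript
// Which step of getUserMedia is asking for permission to proceed.
type Stage = "prompt" | "resolve";

// Hypothetical snapshot of the requesting document's top-level tab.
interface TabState {
  isForegroundTab: boolean; // tab is the selected tab in its window
  windowHasFocus: boolean;  // its browser window has system-level focus
}

// Prompting needs only the "foreground" test;
// resolving needs "foreground" + focused.
function mayProceed(stage: Stage, tab: TabState): boolean {
  if (!tab.isForegroundTab) return false;
  return stage === "prompt" ? true : tab.windowHasFocus;
}

// Two browser windows side by side: this tab is visible but its window is unfocused.
const sideBySide: TabState = { isForegroundTab: true, windowHasFocus: false };
const canPrompt = mayProceed("prompt", sideBySide);   // prompt may be shown
const canResolve = mayProceed("resolve", sideBySide); // resolution waits for focus
```

Under this model the side-by-side case gets a prompt immediately, while a previously granted (no-prompt) request still waits for focus before the camera turns on.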

Sep 17 '21 00:09 jan-ivar

We never intended to require iframe focus (we want the top-level browsing context instead)

I guess this is the same for screen sharing. I filed https://github.com/w3c/mediacapture-screen-share/issues/203

Jan 10 '22 17:01 youennf

Pinging @palak8669

Jan 24 '22 14:01 alvestrand

This suggests two tests: A "foreground" visibility test ahead of prompting, and a "foreground" + focused test before resolving, to preserve the no-prompt case.

I think this makes sense for getUserMedia. On desktop, it seems comforting to know that other browser windows that weren't using camera or microphone when I've left them open cannot decide to turn on camera or microphone on a whim while I'm not interacting with them.

But what would this mean for enumerateDevices? Right now, Firefox's check in enumerateDevices is:

    if (!bc->IsActive() ||  // background tab or browser window fully obscured
        !bc->GetIsActiveBrowserWindow()) {  // browser window without focus

IOW, the same check: page visibility AND focus of the user agent window (not the document).

While a focus requirement seems defensible for getUserMedia, perhaps the visibility requirement alone is sufficient for enumerateDevices? There it's anti-fingerprint, not anti-spying. @karlt @youennf @martinthomson @jesup Thoughts?

Oct 27 '22 16:10 jan-ivar

What an operating-system-window focus requirement provides for enumerateDevices() is that there would (usually) be only one focused browsing context hierarchy on a system. With only a visibility test, there would often be more than one "visible" top-level browsing context on a user's desktop, allowing fingerprinting across origins. Visibility is typically not strict: a browsing context is typically considered visible even when fully occluded by another system window, and some desktop systems do not promote minimization of inactive windows.
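
To make the difference concrete, the sketch below counts which origins would pass each gate on a made-up desktop. `WindowState`, `originsPassing` and the example origins are all hypothetical; the point is only that a visibility gate can admit several origins at once, while a focus gate usually admits one.

```typescript
// Hypothetical snapshot of one top-level browsing context on the desktop.
interface WindowState {
  origin: string;
  visible: boolean;   // page visibility (may be true even when occluded)
  osFocused: boolean; // its window has operating-system focus
}

// Returns the origins whose window satisfies the given gate.
function originsPassing(
  windows: WindowState[],
  gate: (w: WindowState) => boolean,
): string[] {
  return windows.filter(gate).map((w) => w.origin);
}

// Assumed example desktop, not real data.
const desktop: WindowState[] = [
  { origin: "https://a.example", visible: true, osFocused: true },
  { origin: "https://b.example", visible: true, osFocused: false },  // side by side or occluded
  { origin: "https://c.example", visible: false, osFocused: false }, // background tab
];

const visibilityOnly = originsPassing(desktop, (w) => w.visible);
const visibilityAndFocus = originsPassing(desktop, (w) => w.visible && w.osFocused);
```

Here two origins could observe a device change at the same moment under the visibility-only gate, which is the cross-origin correlation risk described above.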

The disadvantage of the focus requirement is that sometimes the presence of a device is useful for displaying items that would be visible before any user interaction.

Visibility seems the preferred requirement for enumerateDevices() and "devicechange" if fingerprinting exposure can be comparable to focus. For example, if delaying the exposure of device changes by returning an old list of devices for a long enough unpredictable period of time would reduce the correlation between origins sufficiently, then the list of devices would at least be available and accurate when the devices haven't changed recently.
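
One way to picture the "return an old list for an unpredictable period" idea is the sketch below. `DelayedDeviceList`, its delay bounds and the injected `random` function are assumptions made for illustration; nothing here says what delay range would actually be sufficient to break cross-origin correlation.

```typescript
// Hypothetical wrapper that keeps exposing the previous device list
// until a randomly chosen delay has elapsed after a change.
class DelayedDeviceList {
  private exposed: string[];
  private pending: string[] | null = null;
  private revealAtMs = 0;
  private readonly minDelayMs: number;
  private readonly maxDelayMs: number;
  private readonly random: () => number;

  constructor(
    initial: string[],
    minDelayMs: number,
    maxDelayMs: number,
    random: () => number = Math.random, // injectable for deterministic tests
  ) {
    this.exposed = [...initial];
    this.minDelayMs = minDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.random = random;
  }

  // Record a real device change; each change restarts the random delay.
  update(devices: string[], nowMs: number): void {
    this.pending = [...devices];
    const jitter = this.random() * (this.maxDelayMs - this.minDelayMs);
    this.revealAtMs = nowMs + this.minDelayMs + jitter;
  }

  // What a page would see at time nowMs: the old list until the delay expires.
  read(nowMs: number): string[] {
    if (this.pending !== null && nowMs >= this.revealAtMs) {
      this.exposed = this.pending;
      this.pending = null;
    }
    return [...this.exposed];
  }
}
```

When devices have not changed recently, `read()` returns the accurate list immediately, which matches the availability property described above.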

Oct 31 '22 05:10 karlt

Does the platform already expose whether the current window has focus? I assume that it does, but want to confirm.

Other than that, I think Karl’s argument resonates with me. The fingerprinting risk exists if the information is released under any focus condition, so we are really looking at what makes the API useful.

Oct 31 '22 09:10 martinthomson

Operating system window focus is exposed through "focus" and "blur" events iff the user-agent is directing keyboard events to the navigable. If the user-agent is taking keyboard events for its own widgets, then these "focus" and "blur" events are not dispatched.

Gecko exposes window focus, even when the user-agent is directing keyboard events to its own widgets, through :-moz-window-inactive, but I'm not aware of any standardized APIs already doing this.

Oct 31 '22 18:10 karlt

Does the platform already expose whether the current window has focus?

If you're in an iframe without focus, then there's no way to tell whether the current window has focus or not AFAIK.

...if fingerprinting exposure can be comparable to focus. ... For example, if delaying the exposure ...

Agreed. I see no time limit where the spec says: "User Agents MAY add fuzzing on the timing of events to avoid cross-origin activity correlation".

Nov 03 '22 20:11 jan-ivar
