Need hook for when a browsing context gains or loses focus

Open tobie opened this issue 8 years ago • 13 comments

This is useful for the Generic Sensor API which needs to stop providing new sensor readings to a browsing context which has lost focus in order to prevent skimming attacks (e.g. inferring a password entered in a different browsing context from device movements captured by a gyroscope).

Reacting to lost focus might already be possible right now, but it wasn't immediately obvious how. Apologies if I missed something blatant.

My understanding is WebBluetooth and NFC have similar requirements. Pinging @jyasskin and @kenchris respectively.

tobie avatar May 26 '17 22:05 tobie

NFC tries to use this in https://w3c.github.io/web-nfc/#handling-window-visibility-and-focus. WebAuthn has an issue for it in https://github.com/w3c/webauthn/issues/316, which @jcjones is handling.

jyasskin avatar May 26 '17 22:05 jyasskin

We'd very much appreciate a pull request or a sketch of the spec modifications to be done here that would work for your use cases.

domenic avatar Sep 06 '17 00:09 domenic

Could we make the hook using language from the pointerlock spec's methods? They write:

Pointer lock must not succeed unless the target is in the active document of a browsing context which is (or has an ancestor browsing context which is) in focus by a window which is in focus by the operating system's window manager. The target element and its browsing context need not be in focus.

This could be used to define something beyond just active document to capture that it's the document being actively manipulated by the user.

That'd be the preferred situation for Web Authentication, I think - we don't want an active document in a background window to start an authentication session.

jcjones avatar Nov 15 '17 17:11 jcjones

Note @mikewest - the above might be a useful distinction for CredMan, too.

jcjones avatar Nov 15 '17 17:11 jcjones

This sounds like something credential management could indeed use.

mikewest avatar Nov 15 '17 17:11 mikewest

I'm no longer editing the Generic Sensor spec, so I'm not sure what the current requirements are.

tobie avatar Nov 15 '17 18:11 tobie

Taking a stab at @domenic's request:

We'd very much appreciate a pull request or a sketch of the spec modifications to be done here that would work for your use cases.

I'm working mostly from Page Visibility's visibility states...

====

Add to Document a readonly attribute foregroundState which is an enum ForegroundState:

enum ForegroundState {
    "foreground",
    "background"
};

To Document also add an EventHandler onforegroundstatechange that is an event handler for foregroundState:

partial interface Document {
    readonly attribute ForegroundState foregroundState;
             attribute EventHandler    onforegroundstatechange;
};

Upon getting foregroundState, we'd run the algorithm from Pointerlock to determine if the window manager has this window in focus.

====

WebAuthn would then check foregroundState on the way into its methods, failing if not "foreground", and would register for onforegroundstatechange and cancel if it changes to not be "foreground" during the execution of our parallel algorithms.
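A rough sketch of how a caller such as WebAuthn might consume the proposed attribute and event handler. This only models the proposal: no browser exposes foregroundState or onforegroundstatechange, and FakeDocument, setForegroundState, and ForegroundGuardedOperation are hypothetical names made up for illustration.

```typescript
// Hypothetical model of the proposed API; no browser implements it.
type ForegroundState = "foreground" | "background";

class FakeDocument {
  private state: ForegroundState = "foreground";
  onforegroundstatechange: (() => void) | null = null;

  // Read-only from the outside, mirroring the proposed readonly attribute.
  get foregroundState(): ForegroundState {
    return this.state;
  }

  // Hypothetical helper: simulates the user agent changing the state.
  setForegroundState(next: ForegroundState): void {
    if (next === this.state) return;
    this.state = next;
    if (this.onforegroundstatechange) this.onforegroundstatechange();
  }
}

type OperationStatus = "pending" | "failed" | "cancelled" | "done";

// Hypothetical stand-in for a WebAuthn-like operation: it fails up front if
// the document is not in the foreground, and cancels itself if the document
// leaves the foreground while the operation is still pending.
class ForegroundGuardedOperation {
  status: OperationStatus = "pending";

  constructor(doc: FakeDocument) {
    if (doc.foregroundState !== "foreground") {
      this.status = "failed";
      return;
    }
    doc.onforegroundstatechange = () => {
      if (this.status === "pending" && doc.foregroundState !== "foreground") {
        this.status = "cancelled";
      }
    };
  }

  // Called when the underlying work finishes; ignored once cancelled or failed.
  complete(): void {
    if (this.status === "pending") this.status = "done";
  }
}
```

In this model, an operation started in the foreground moves to "cancelled" as soon as setForegroundState("background") is called, and an operation started while the document is already in the background is "failed" immediately.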

jcjones avatar Nov 16 '17 23:11 jcjones

Hmm, I thought the request was about a specification-level hook, not a public API that would require implementers to start exposing new stuff to JavaScript?

domenic avatar Nov 17 '17 00:11 domenic

That is true, I'm just not well versed on how to do that. Would it just be declaring a definition?

jcjones avatar Nov 17 '17 00:11 jcjones

Something like:

====

6.4.X Determining if the Document is in the Foreground of the Window Manager

To determine if a Document is in the Foreground of the Window Manager, run these steps:

{{ The algorithm from Pointerlock }}

====

In this case, Web Authentication would probably have some language like, "Monitor whether the Document is in the Foreground of the Window Manager and reject the Promise .... if not". Would that ... work?

jcjones avatar Nov 17 '17 00:11 jcjones

Thanks for putting in the effort, I think I can start to help from this. A couple issues with what you've got so far:

First, it would help if you outlined what part of what you linked to you were thinking of including. It's a lot of text, and I don't see a real algorithm in there. Is it just the sentence "the active document of a browsing context which is (or has an ancestor browsing context which is) in focus by a window which is in focus by the operating system's window manager."? That seems not great, given how it relies on undefined concepts like window manager. Also, it implies that background tabs are focused (since e.g. my Firefox window currently has focus from the OS's window manager, despite only one of 20 tabs being actually focused). I think we instead want to say something about how user agents can define the concept of the currently-focused top-level browsing context, and then give some explanation about how this ties into tabbed interfaces, window managers, popup windows, etc.

Second, we can't just use magic like "monitor whether something is true". Think about how you'd write this in software. You'd need to find the point at which something becomes true or false, and then invoke a function. That's the kind of hook we're talking about here. So we'd need to create steps like "When the user agent changes its choice of currently-focused top-level browsing context, run the following steps..." where those steps loop over all documents (browsing contexts? Windows? Which is more useful for your use cases?) that got un-focused and all documents that got newly-focused and run some hook. We'll need a good name for that hook that ties into the above concept naming.
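To make the shape of that hook concrete, here is a minimal sketch in TypeScript. It is only a model of the idea, not spec text: UserAgentFocusModel, TopLevelContext, DocumentLike, FocusHook, registerHook, and setFocusedContext are all hypothetical names, and it assumes the hook runs per document rather than per browsing context or Window.

```typescript
// Hypothetical names throughout; nothing here is defined by the HTML Standard.
interface DocumentLike {
  readonly name: string;
}

interface TopLevelContext {
  readonly id: string;
  // Documents in this context and in its nested browsing contexts.
  readonly documents: DocumentLike[];
}

// The hook other specs would plug into: called once per affected document.
type FocusHook = (doc: DocumentLike, gainedFocus: boolean) => void;

class UserAgentFocusModel {
  private focused: TopLevelContext | null = null;
  private readonly hooks: FocusHook[] = [];

  registerHook(hook: FocusHook): void {
    this.hooks.push(hook);
  }

  // Models "when the user agent changes its choice of currently-focused
  // top-level browsing context, run the following steps". Passing null
  // models focus leaving the browser's content entirely.
  setFocusedContext(next: TopLevelContext | null): void {
    const previous = this.focused;
    if (previous === next) return; // No change, so no hook runs.
    this.focused = next;
    // First the documents that lost focus, then the ones that gained it.
    for (const doc of previous?.documents ?? []) this.runHooks(doc, false);
    for (const doc of next?.documents ?? []) this.runHooks(doc, true);
  }

  private runHooks(doc: DocumentLike, gainedFocus: boolean): void {
    for (const hook of this.hooks) hook(doc, gainedFocus);
  }
}
```

A dependent spec would register one hook and react inside it, instead of polling a condition. Whether the loop should cover documents, browsing contexts, or Windows is exactly the open question above; the sketch picks documents arbitrarily.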

BTW I'm currently thinking of this as a new section underneath https://html.spec.whatwg.org/multipage/interaction.html#focus, probably at the bottom, defining and describing this new concept of "Top-level browsing context focus". Although I'm wondering if maybe we shouldn't overload the word "focus" and instead use some new word like "choice" or "foreground" or something.

domenic avatar Nov 17 '17 00:11 domenic

That all makes sense to me; spec-fu is still a foreign language to me, and I'm not well-versed on how other similar concepts work in HTML, so thanks!

So we'd probably want to iterate over all browsing contexts that change state and run a hook, I think? I'm obviously very green at this, but it seems more ambiguous to do this to all windows and then have to filter down to the actual documents.

Re: naming, while avoiding 'window manager' .... I don't have anything right now, but I'll ponder on it.

Thanks for dealing with my flailing with good humor, Domenic!

jcjones avatar Nov 17 '17 02:11 jcjones

Giving a bit more context around our generic-sensor use case. As mentioned, we want to be able to stop the sensor from reporting data as soon as the user is entering data on anything other than the web page of the active sensor. Again, this is because sensors like gyroscopes can be used relatively effectively to steal data entered elsewhere (e.g. passwords, credit card numbers, etc.). See for example this 2011 article on this topic. In practice, this means nested iframes (e.g. when using an embedded third party to carry out payment), other tabs or windows (again when using third party payment solutions), browser chrome or browser extensions (e.g. when relying on the user agent's password manager or a third party password manager), or other applications altogether.

It's worth noting that there's currently no consistency in how browsers report pages losing focus to other applications. That is, on some browsers the web page is considered to no longer have focus, while on other browsers this is not the case and the page continues to be considered as focused despite the browser having moved to the background. I'm also not sure that all browsers consistently unfocus pages when the user focuses on browser chrome and/or browser extensions. That prevents the sensor from being deactivated in such cases, which has security consequences.

So ideally, the spec should state explicitly that focus requires both the page and the app to be in focus, and that focus on privileged chrome does not count.
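As a rough illustration of that rule, the cases listed above can be written as a single predicate. This is a sketch under assumptions, not anything from the Generic Sensor spec: InputTarget and maySensorReport are hypothetical names, and it assumes the user agent can always tell where user input is currently directed.

```typescript
// Hypothetical model of where the user's input is currently going.
type InputTarget =
  | { kind: "document"; documentId: string } // a page, in any tab, window or iframe
  | { kind: "browser-chrome" }               // e.g. URL bar, built-in password manager
  | { kind: "extension" }                    // e.g. third-party password manager
  | { kind: "other-application" };           // the browser itself is in the background

// The sensor may report only while input goes to the very document that owns
// it. A nested iframe or another tab counts as a different document.
function maySensorReport(sensorDocumentId: string, target: InputTarget): boolean {
  return target.kind === "document" && target.documentId === sensorDocumentId;
}
```

With this model, a payment iframe embedded in the sensor's page is a "document" target with a different documentId, so readings stop there just as they do for browser chrome or another application.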

tobie avatar Nov 17 '17 08:11 tobie