
[focus-without-user-activation] Allow focus if a descendant has focus

Open ffiori opened this issue 5 months ago • 19 comments

This adds a step to the 'allow focus steps' that returns true if any inclusive descendant frame of the caller's frame is currently focused.

This part of the spec was left out after the resolution reached at the WHATWG meeting during TPAC 2024: https://github.com/w3c/webappsec-permissions-policy/issues/273#issuecomment-2384287101

where it was resolved that "Focus delegation should also be allowed (allow parent frame programmatically set focus into child iframe)".

Informally speaking, with this change the 'allow focus steps' end up looking like this:

algorithm allow_focus(focus_setter_frame, target, currently_focused_frame):
  if focus_setter_frame has the policy allowed:
    return true
  if the user initiated the action (target's frame has transient activation):
    return true
  if currently_focused_frame is an inclusive descendant frame of focus_setter_frame:
    return true
  return false
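As a rough, runnable reading of the informal algorithm above, here is a minimal Python sketch. The `Frame` class, its field names, and the `is_inclusive_descendant` helper are hypothetical and exist only for illustration; they are not spec terms or browser APIs. The top-level frame A is assumed to have the policy enabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Frame:
    """Hypothetical model of a frame; not a spec or browser type."""
    name: str
    parent: Optional["Frame"] = None
    policy_allowed: bool = False            # focus-without-user-activation granted
    has_transient_activation: bool = False  # user recently interacted with it


def is_inclusive_descendant(frame: Frame, ancestor: Frame) -> bool:
    """True if `frame` is `ancestor` itself or is nested anywhere below it."""
    node: Optional[Frame] = frame
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


def allow_focus(focus_setter: Frame, target: Frame, currently_focused: Frame) -> bool:
    if focus_setter.policy_allowed:
        return True
    if target.has_transient_activation:
        return True
    # The step this PR proposes: focus is already inside the setter's subtree.
    if is_inclusive_descendant(currently_focused, focus_setter):
        return True
    return False


a = Frame("A", policy_allowed=True)  # top-level page (assumed allowed)
b = Frame("B", parent=a)             # policy denied
c = Frame("C", parent=b)             # policy denied

# Focus is in A: B cannot pull focus into itself.
assert allow_focus(b, b, currently_focused=a) is False
# A may delegate focus to B.
assert allow_focus(a, b, currently_focused=a) is True
# Once focus is in B, B may move it within its own subtree, including into C.
assert allow_focus(b, c, currently_focused=b) is True
# With focus in C, B may also take it back.
assert allow_focus(b, b, currently_focused=c) is True
```

Dropping the third `if` gives the stricter behavior, under which the last two calls would return `False`.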

See the previous spec PR for this permissions policy for more details: https://github.com/whatwg/html/pull/10672.


/acknowledgements.html ( diff ) /interaction.html ( diff )

ffiori avatar Jul 31 '25 23:07 ffiori

Thanks @dandclark! @annevk could you PTAL? Or let me know who else could review this? I don't have permissions to add reviewers.

ffiori avatar Aug 05 '25 17:08 ffiori

@annevk friendly ping on this PR, I'd appreciate it if you could take a look when you have some time :)

ffiori avatar Aug 13 '25 20:08 ffiori

I don't understand this.

Let's say you have top-level page A, hosting iframe B, which in turn hosts iframe C.

Top-level page A has decided not to allow iframe B to focus. So code in iframe B which calls element.focus() should do nothing, and not steal focus.

But then iframe B can work around this, whenever it or its children have focus? Why do we let iframe B override the wishes of top-level page A in this way?

Can you give a realistic example of when this is desired? I read through both https://github.com/w3c/webappsec-permissions-policy/issues/273#issuecomment-2384287101 and https://github.com/whatwg/html/pull/10672 and cannot find any motivation for, or agreement on, this change.

The closest is the resolution to "allow parent frame programmatically set focus into child iframe", but that is not what this PR does. This PR lets the child frame override the parent frame's wishes; it doesn't allow the parent frame to focus the child.

domenic avatar Aug 20 '25 07:08 domenic

@domenic thanks for having a look! I've been reading all the old discussions, let me see if I'm misunderstanding the intended behavior:

Let's say we have top-level frame A, hosting iframe B, which hosts iframe C, and B and C have the policy denied. And let's say A moves focus to B. Once B has focus I think it makes sense for B to be able to move focus inside itself as it wants because it's not "stealing" focus from its parent or other frames anymore, right? I feel like a realistic example of this could be any webpage that moves focus from one element to another with .focus() and it's hosted in an iframe, which wouldn't be harmful.

Just to further clarify some behaviors, I have a PR in review to update the explainer here https://github.com/w3c/webappsec-permissions-policy/pull/574 in which I try to capture all corner cases and old discussions with some pseudocode:

algorithm is_allowed_to_set_focus(focus_setter_frame, currently_focused_frame):
  if focus_setter_frame has the policy allowed:
    return true
  if currently_focused_frame is an inclusive descendant frame of focus_setter_frame:
    return true
  return false

Let me know if we're more or less on the same page about this.

By the way, I also realized my change here is not right, it doesn't work for the case where A hosts iframes B and C (they're siblings), focus is on B, and A tries to focus C instead. I think A should be able to do that but, according to the spec, C is target and none of its inclusive descendants have focus. I think the spec should look at the inclusive descendants of the focus setter frame instead (in this case, A), just like the pseudocode above. I still need to figure out how to write this with spec words, but wanted to make sure we agree on the desired behaviors first.
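To make the sibling case concrete, here is a small Python sketch that isolates only the descendant check and ignores the policy and user-activation checks. The `PARENT` map and both function names are hypothetical, chosen for illustration; they do not come from the spec text.

```python
from typing import Dict, Optional

# Hypothetical frame tree: A hosts the sibling iframes B and C.
# Maps each frame name to its parent (None marks the top-level frame).
PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "A"}


def is_inclusive_descendant(frame: str, ancestor: str) -> bool:
    """True if `frame` is `ancestor` itself or is nested anywhere below it."""
    node: Optional[str] = frame
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT[node]
    return False


def target_based_check(target: str, focused: str) -> bool:
    """Is focus already inside the *target* frame's subtree?"""
    return is_inclusive_descendant(focused, target)


def setter_based_check(setter: str, focused: str) -> bool:
    """Is focus already inside the *focus setter* frame's subtree?"""
    return is_inclusive_descendant(focused, setter)


# Focus is in B and A wants to move it to the sibling frame C.
assert target_based_check(target="C", focused="B") is False  # A would be blocked
assert setter_based_check(setter="A", focused="B") is True   # A is allowed
```

Checking the setter's subtree rather than the target's is what lets A move focus between its own children.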

ffiori avatar Aug 22 '25 21:08 ffiori

> Once B has focus I think it makes sense for B to be able to move focus inside itself as it wants because it's not "stealing" focus from its parent or other frames anymore, right? I feel like a realistic example of this could be any webpage that moves focus from one element to another with .focus() and it's hosted in an iframe, which wouldn't be harmful.

I'm not sure. It depends on the original intent of the proposal. If it was to prevent malicious third-party frames from moving the user's focus around without user activation, then just the fact that it got focus once is not a good license for allowing further focus movements. But, if the intent is some sort of belief that once the user has given user activation a single time, that proves the subframe trustworthy, then maybe it is OK.

I also think there's a significant difference between allowing a frame to move focus within itself, and allowing it to move focus within child iframes. Especially child iframes which the parent frame has explicitly disallowed. That gives another workaround. E.g. consider the permissions policy "allow focus-without-user-activation from all sites except https://evil.example/". All evil.example has to do in this case to bypass the policy is create a small wrapper frame at https://evil2.example/, which then hosts the https://evil.example/ frame, and the policy has become useless. That is not how any other permission policy works, e.g., if evil.example is denied from using WebUSB, then a wrapper frame is not enough to allow the inner frame to use WebUSB.

Do you know of specific sites that need these changes to the current policy? Otherwise, I think being more conservative might make sense.

domenic avatar Aug 26 '25 06:08 domenic

@domenic thanks for your comments! I'll reply inline.

> I'm not sure. It depends on the original intent of the proposal. If it was to prevent malicious third-party frames from moving the user's focus around without user activation, then just the fact that it got focus once is not a good license for allowing further focus movements. But, if the intent is some sort of belief that once the user has given user activation a single time, that proves the subframe trustworthy, then maybe it is OK.

I think the main idea for the policy was to "prevent frames from stealing focus without the user noticing or without the user's consent". https://github.com/w3c/webappsec-permissions-policy/issues/273#issue-404248888

> I also think there's a significant difference between allowing a frame to move focus within itself, and allowing it to move focus within child iframes. Especially child iframes which the parent frame has explicitly disallowed. That gives another workaround. E.g. consider the permissions policy "allow focus-without-user-activation from all sites except https://evil.example/". All evil.example has to do in this case to bypass the policy is create a small wrapper frame at https://evil2.example/, which then hosts the https://evil.example/ frame, and the policy has become useless. That is not how any other permission policy works, e.g., if evil.example is denied from using WebUSB, then a wrapper frame is not enough to allow the inner frame to use WebUSB.

Is this example actually possible? Permissions policies work with whitelists instead of forbidden lists, right? As in allow="focus-without-user-activation a.com b.com". So if the author is whitelisting a site, an iframe hosting this site should be able to "steal" focus and move it around itself as it wants, even to subframes. I understand that if you whitelist a site, you trust this site is not gonna act as a wrapper for a malicious one. Let me know if I'm missing a way to set a policy to be allowed for all sites except evil.com.

> Do you know of specific sites that need these changes to the current policy? Otherwise, I think being more conservative might make sense.

I'm not aware of specific sites that I could cite, but it seems to me that this could break any site that moves focus from one element to another and is hosted in an iframe. Anyway, I filed an issue in the WebAppSecWG hoping to bring it to the attention of developers or people who might have more info on this question. Moving this discussion there is probably better for visibility than continuing it in this PR.

Also let me know if WebAppSecWG is the right place to file an issue about this policy and discuss it. I've also seen some issues filed in WHATWG/html, so wasn't super sure which one is more suitable.

ffiori avatar Sep 03 '25 19:09 ffiori

> Is this example actually possible? Permissions policies work with whitelists instead of forbidden lists, right?

You're right, my exact example is not possible. However, my larger point stands, even with an allowlist approach:

> That is not how any other permission policy works, e.g., if evil.example is denied from using WebUSB, then a wrapper frame is not enough to allow the inner frame to use WebUSB.

domenic avatar Sep 04 '25 00:09 domenic

> However, my larger point stands, even with an allowlist approach:

> > That is not how any other permission policy works, e.g., if evil.example is denied from using WebUSB, then a wrapper frame is not enough to allow the inner frame to use WebUSB.

Hmm, I feel like we might be thinking of the policy in different ways. I think the main idea for the policy was to "prevent frames from stealing focus from other frames" (w3c/webappsec-permissions-policy#273 (comment)) instead of "prevent frames from using focus APIs". With the former in mind, the wrapper frame counterexample doesn’t really apply because the outer frame already passed focus to evil2.example, which then passes focus to evil.example, so no frame is stealing focus from other frames there. Even if evil.example is denied from the policy, the wrapper frame is not really enabling evil.example to use the policy since it's not stealing focus from other places.

As a similar example, if you go to outlook.com and click on the To Do icon, it loads an iframe with the To Do app, the top frame passes focus to this iframe, but then this iframe focuses the input field "Add a task". Under the "prevent frames from using focus APIs" model, this wouldn’t work unless the iframe had the policy explicitly allowed. (Pasting a screenshot below to describe this better)

Also, now that we're discussing this, the policy name might be misleading. The last TPAC resolution “Focus delegation should also be allowed” means that a parent frame should be able to programmatically set focus into a child iframe even without user activation. And that behavior should be preserved even when the policy is disabled. So maybe something like focus-steal-without-user-activation would better capture the intent? I'm open to discussing more suitable names here.

[Screenshot: the To Do app loaded in an iframe on outlook.com]

ffiori avatar Sep 08 '25 21:09 ffiori

To expand on this:

> Do you know of specific sites that need these changes to the current policy?

Besides the example for the To Do app in Outlook I mentioned above this comment, I also confirmed with a customer of ours (Microsoft Teams) that this less conservative behavior (the one this PR proposes) is needed for them to be able to use the permissions policy.

As an example of this, I'm pasting a screenshot below where the user opens the Microsoft Copilot app (loaded in iframe B) inside Teams (iframe A). Teams wants to focus on the app (B), which in turn wants to move focus to the input element at the bottom of it (the one that says "Message Copilot"). Teams wants to deny the permissions policy on B so it doesn't steal focus if let's say the user starts typing something in the search bar at the top while B is still loading, and then when it finishes loading it tries to focus its input bar. If the policy prevented all use of programmatic focusing APIs inside it even when it's focused, then this experience would break.

[Screenshot: the Microsoft Copilot app loaded in an iframe inside Teams]

I'm pasting another screenshot below where there are 3 frames, A hosting B hosting C. Teams would like to disable the policy on B for similar reasons as the previous example. If the user just clicks on the Engage icon on the left and waits, the app loads B and A moves focus to B. Now if B wants to focus the video that's in iframe C as soon as it gets focus, it should be able to do that. If we choose the stricter behavior, this experience is broken as well.

[Screenshot: the Engage app in Teams, with frame A hosting B hosting C]

There are more examples like these in other M365 products like Outlook (the one above this comment), OneNote, Word/PowerPoint/Excel and more. Choosing the stricter behavior would break lots of these sites, and I'm pretty sure the same applies to similar ones outside of this ecosystem.

ffiori avatar Sep 24 '25 23:09 ffiori

Discussed in https://github.com/whatwg/html/issues/11696, feel free to re-add agenda+ when ready to discuss again.

cwilso avatar Sep 26 '25 20:09 cwilso

@annevk, thanks for your comments so far. I'd like to make sure we're on the same page about behaviors before continuing the discussion on the technical details of the spec.

So far there's high level agreement on the Permissions Policy: there's support from WebKit and a satisfied TAG review. There's also a merged spec PR on this repo. So there's only this piece of behavior that would need to be resolved before the feature is in a good state for finishing implementations and proposing shipping.

As I mentioned in this comment, we got back to our customer Microsoft Teams and talked about the corner case that came up during 2025-09-25 WHATNOT https://github.com/whatwg/html/issues/11696: A hosting B hosting C, B and C have the policy denied, C is focused, B tries to move focus somewhere else. This PR would allow that to happen. Teams supports this behavior too, arguing that there might be apps relying on this, and that this wouldn't really constitute a security concern because:

  1. B could have other mechanisms to regain focus (deleting C for example).
  2. B could trick the user into typing inside an element that belongs to B (for example with a transparent div on top of C's input element).
  3. C could prevent this by using CSP frame-ancestors to avoid being embedded by B.

The fact that some webpages might be counting on behaviors like the case discussed here is the original motivation for this PR. It would try to avoid breaking existing sites that are embedded with this policy denied so it can be more easily adopted, while still fulfilling the market need for a policy that prevents frames from stealing focus.

Just to further clarify the proposal, I added this pseudo algorithm to the description, trying to capture all possible cases of the 'allow focus steps':

algorithm allow_focus(focus_setter_frame, target, currently_focused_frame):
  if focus_setter_frame has the policy allowed:
    return true
  if the user initiated the action (target's frame has transient activation):
    return true
  if currently_focused_frame is an inclusive descendant frame of focus_setter_frame:
    return true
  return false

(the cases currently being discussed fall into the third 'if' statement above; the rest of the algorithm matches what is currently spec'd)

cc @ydogandjiev @taylore-msft

ffiori avatar Oct 23 '25 18:10 ffiori

Wouldn't the proposal be rather problematic with fullscreen? First the user triggers fullscreen on C and the browser indicates that C is in fullscreen. Then B steals focus from C and all the keyboard events go to B? Or am I missing something (I very well could be)?

smaug---- avatar Nov 21 '25 13:11 smaug----

> Wouldn't the proposal be rather problematic with fullscreen? First the user triggers fullscreen on C and the browser indicates that C is in fullscreen. Then B steals focus from C and all the keyboard events go to B? Or am I missing something (I very well could be)?

Hey @smaug----, the intent of this feature is not to protect child frames from their parent frames. There are existing mechanisms that websites/webapps can use to prevent themselves from being iframed by untrusted origins (e.g. CSP frame-ancestors, X-Frame-Options). If C didn't trust B then it wouldn't allow itself to be iframed by it. The intent of this feature is to give website/webapp developers full control over focus when choosing to render subsets of the experience using embedded frames (e.g. Teams Platform Apps, ChatGPT Apps, etc.).

ydogandjiev avatar Nov 24 '25 16:11 ydogandjiev

Where is it documented that the currently spec'ed behavior is not the intent of the feature (other than in this PR's proposal to change the behavior)? What is requested here isn't focus delegation, but focus stealing from a descendant.

smaug---- avatar Nov 25 '25 09:11 smaug----

Hey @smaug----, this change has been discussed several times in WHATNOT meetings and it's intended to address an edge case not considered in the original spec/implementation. Once @ffiori is back from his break, he will bring it up again in the next one to ensure there is alignment with all stakeholders.

As far as I can tell, the original intent of this feature was to protect apps running in the top-level window from child frames stealing their focus. It does not prevent parents from taking that focus back. As currently implemented, an app running in the top-level window can always take focus back from a child frame. Now if this same app running in the top-level window gets embedded in an iframe, its focus logic will break because it can no longer take focus back from its children. That is what we are trying to fix here and ensure consistency. Ultimately, a parent window/frame can always force focus back to itself by either destroying the child iframe or even using an overlay to capture user input (i.e. click-jacking) so I don't believe we should be trying to prevent that with this feature.

ydogandjiev avatar Nov 26 '25 17:11 ydogandjiev

Yes, I've attended probably all those WHATNOT meetings ;). I brought up a possible issue here and I expect that someone will either explain why it is not a problem, or tweak the PR. Fullscreen is a special case and we need to be careful with it.

smaug---- avatar Nov 26 '25 21:11 smaug----

I'm back :)

Thanks @ydogandjiev for summarizing the context, and thanks @smaug---- for your interest in the policy.

Regarding the fullscreen case you mentioned: if the user triggers fullscreen on C, then B is considered to have user activation because it’s an ancestor of C in the same activation chain. As a result, B can take focus since the policy allows focus when there’s user activation (see item 2 in https://html.spec.whatwg.org/#allow-focus-steps).

Also, I’d like to point you to https://github.com/whatwg/html/issues/11839, where I explain the reasoning behind this approach and the problems it addresses. There’s support from different developers there as well. @smaug----, would you change anything in the proposal?

ffiori avatar Nov 26 '25 23:11 ffiori

Closing and reopening to see if the PR preview makes a new version.

zcorpan avatar Dec 11 '25 14:12 zcorpan

Hmm, fullscreen is possibly still an issue. allow-focus-steps uses transient user activation. So with the proposed new behavior the user might happily watch an embedded video for a while, and then at some point, even though the user hasn't interacted with the page at all, focus gets moved to some ancestor document, no? (I think fullscreen might per spec have other issues too, because of that transient user activation propagation, unrelated to this PR, but this PR is possibly making them worse.) Again, please correct me if I've missed something about fullscreen handling.

And yes, I do understand why folks might want behavior similar to what the PR is proposing, but as a reviewer part of my job is to ensure this doesn't cause new issues.

Also, note that the (diff) link in the PR has been pointing to the older version of the patch while "Files changed" shows the new version, and that certainly confused me, though it doesn't affect my worry about fullscreen. (See the discussion in the #whatwg Matrix channel.)

(Unfortunately I can't attend Dec 11 WHATNOT)

smaug---- avatar Dec 11 '25 14:12 smaug----

@smaug---- thanks, I think I understand the concern.

In the A->B->C scenario with the policy denied for B and C, this proposal intentionally allows a frame to move focus within its own subtree when focus is already there, even after transient activation expires (the goal being compatibility/ergonomics). That means B could retake focus from fullscreen C without a new user gesture.
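A minimal Python sketch of that timeline, with and without the proposed descendant step. Everything here is a simplified model: the `Frame` class, the `descendant_rule` flag, and the 5-second `TRANSIENT_ACTIVATION_SECONDS` value are assumptions made for illustration (the actual duration is up to the user agent), and activation is assumed to have propagated from C to its ancestor B.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration only; the real duration is chosen by the browser.
TRANSIENT_ACTIVATION_SECONDS = 5.0


@dataclass(eq=False)
class Frame:
    """Hypothetical model of a frame; not a spec or browser type."""
    name: str
    parent: Optional["Frame"] = None
    policy_allowed: bool = False
    last_activation: float = float("-inf")  # time of the last user gesture

    def has_transient_activation(self, now: float) -> bool:
        return 0 <= now - self.last_activation < TRANSIENT_ACTIVATION_SECONDS


def is_inclusive_descendant(frame: Frame, ancestor: Frame) -> bool:
    node: Optional[Frame] = frame
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


def allow_focus(setter: Frame, target: Frame, focused: Frame,
                now: float, descendant_rule: bool) -> bool:
    if setter.policy_allowed:
        return True
    if target.has_transient_activation(now):
        return True
    # Only applied when modelling the behavior proposed in this PR.
    if descendant_rule and is_inclusive_descendant(focused, setter):
        return True
    return False


a = Frame("A", policy_allowed=True)
b = Frame("B", parent=a)
c = Frame("C", parent=b)

# t=0: the user clicks in C to enter fullscreen; B is assumed to be activated too.
b.last_activation = c.last_activation = 0.0

# t=1: B focuses one of its own elements; allowed either way via activation.
assert allow_focus(b, b, focused=c, now=1.0, descendant_rule=False) is True

# t=600: activation has expired while the user watches the fullscreen video.
assert allow_focus(b, b, focused=c, now=600.0, descendant_rule=False) is False
assert allow_focus(b, b, focused=c, now=600.0, descendant_rule=True) is True
```

The last two assertions are the difference under discussion: only the descendant step lets B retake focus from fullscreen C without a new user gesture.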

If that’s undesirable, it feels more like a fullscreen+focus invariant question than something to encode as a special case in this permissions policy. Do you have a preferred invariant (e.g. moving focus away from the fullscreen element should exit fullscreen / be disallowed while fullscreen is active)? I didn’t find explicit guidance in the fullscreen specs or the HTML standard, and this open issue seems related: https://github.com/whatwg/fullscreen/issues/108.

> And yes, I do understand why folks might want behavior similar to what the PR is proposing, but as a reviewer part of my job is to ensure this doesn't cause new issues.

Of course, I understand and I appreciate you digging into these edge cases. Thanks!

ffiori avatar Dec 15 '25 22:12 ffiori