Transient activation and focus

Open eladalon1983 opened this issue 7 months ago • 9 comments

What is the issue with the HTML Standard?

A question about transient activation came up in the WebRTC WG during a discussion about screen-sharing. getDisplayMedia() returns a promise, and if the user chooses to share a tab/window/screen, that promise is resolved. We have recently specified that when resolving that promise, the UA will also consider the document to have regained transient activation.
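For readers less familiar with the API, here is a minimal TypeScript sketch of what this looks like from the page's side. The helper name `captureThenCheckActivation` and the two small interfaces are hypothetical, introduced only so the snippet can run outside a browser; in a browser one would pass `navigator.mediaDevices` and `navigator.userActivation`.

```typescript
// Hypothetical structural types (not from any spec) so the sketch is testable
// without a browser. navigator.mediaDevices and navigator.userActivation are
// expected to satisfy them.
interface DisplayMediaSource {
  getDisplayMedia(): Promise<unknown>;
}

interface ActivationState {
  readonly isActive: boolean;
}

// Waits for the user to pick a surface, then reports whether the document
// currently has transient activation. If the user declines, the promise from
// getDisplayMedia() rejects and this function rejects with it.
async function captureThenCheckActivation(
  source: DisplayMediaSource,
  activation: ActivationState,
): Promise<boolean> {
  await source.getDisplayMedia();
  // Under the behavior described above, this would be true here even if the
  // user spent a long time in the picker.
  return activation.isActive;
}
```

In a browser, the call would be `captureThenCheckActivation(navigator.mediaDevices, navigator.userActivation)`, made from a click handler or similar.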

  • In the case of Chromium’s implementation, it is virtually guaranteed that at the critical moment, the document “has focus.” (I use “focus” imprecisely here, as a layman.) This is because Chromium implements the dialog as attached to the tab from which the call originated.
  • In the case of Safari’s implementation, if the user chooses to share a window, then Safari might be obscured by the chosen window at that final, critical moment.

Our question is: Do you see any problem in Safari’s case here? Would conferring transient activation on a document that is not “focused” pose any issues? If so, how may we mitigate them in the spec?

Thanks, Elad

eladalon1983 avatar May 22 '25 21:05 eladalon1983

CC @jan-ivar @youennf @marcoscaceres

eladalon1983 avatar May 22 '25 21:05 eladalon1983

@mustaqahmed should probably take a look.

Overall I don't think other specifications should set the last activation timestamp directly because then the concept will be very hard to understand. Instead we probably need to come up with some abstraction that WebRTC could adopt. (Which might come down to a similar thing, but then at least the entirety of the flow is clear from HTML.)

annevk avatar May 23 '25 06:05 annevk

I agree with @annevk here: any new/renewed/extended transient activation through anything other than a user interaction would make user activation gated APIs abusable from JS!

mustaqahmed avatar May 23 '25 20:05 mustaqahmed

When getDisplayMedia is called, the UA displays a prompt so that the user can select the surface to capture. Activation would be based on this user selection, and would happen just before the promise is resolved.

youennf avatar May 25 '25 19:05 youennf

As Youenn has mentioned, and as I have commented to @mustaqahmed on the PR - there appears to be a misunderstanding here. Calling the JS-exposed API does not in itself update the timestamp. Rather, calling the JS-exposed API leads the UA to show the user a permission prompt, and the user's interaction with that prompt is what could update the timestamp. (But only if the user accepted the prompt; not if the user rejected or dismissed the prompt.)
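To make the distinction concrete, here is a toy TypeScript model of that flow. It is a sketch only, not spec text and not any browser's implementation: the class and function names are hypothetical, and the 5-second window is an assumed value, since the actual duration is up to the UA.

```typescript
// Hypothetical model of a document's activation state. All names are invented
// for illustration.
type PromptOutcome = "accepted" | "rejected" | "dismissed";

// Assumed value for illustration; the real duration is UA-defined.
const TRANSIENT_ACTIVATION_DURATION_MS = 5000;

class DocumentActivationModel {
  // Positive infinity stands for "never activated".
  private lastActivationTimestamp: number = Number.POSITIVE_INFINITY;

  // Records a user interaction attributed to this document.
  notifyUserInteraction(nowMs: number): void {
    this.lastActivationTimestamp = nowMs;
  }

  // True while nowMs falls inside the window opened by the last interaction.
  hasTransientActivation(nowMs: number): boolean {
    return (
      nowMs >= this.lastActivationTimestamp &&
      nowMs < this.lastActivationTimestamp + TRANSIENT_ACTIVATION_DURATION_MS
    );
  }
}

// Models the UA finishing the picker flow. Only an accepted prompt counts as
// a user interaction for the document; the JS call itself never does.
// Returns true if the modeled promise would resolve, false if it would reject.
function finishPrompt(
  doc: DocumentActivationModel,
  outcome: PromptOutcome,
  nowMs: number,
): boolean {
  if (outcome !== "accepted") {
    return false; // timestamp left untouched
  }
  doc.notifyUserInteraction(nowMs);
  return true;
}
```

In this model, a click at time 0 followed by 20 seconds in the picker leaves the document without transient activation unless `finishPrompt` is called with `"accepted"`.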

Further, note that the user accepting the prompt is not a common occurrence. It implies the calling Web application is now capturing the user's screen (or another surface like a window) - a much more alarming outcome than getting transient activation. That is, we are already assuming some trust here.

eladalon1983 avatar May 26 '25 09:05 eladalon1983

Thanks for clarifying (here and through GVC): we essentially want to "forward" the user interaction on a browser surface (which is "the permission prompt" in this case) to the related renderer here. I missed the user interaction on the permission prompt itself! I am taking back my main concern.

@annevk's concern above still applies, and I would suggest adding to this list something like "a trusted ping" representing user interaction on a connected browser-dialog.

@annevk:

  • Does it sound like the abstraction you suggested?
  • Is there a link to the WebRTC case you mentioned?

mustaqahmed avatar May 26 '25 21:05 mustaqahmed

we essentially want to "forward" the user interaction on a browser surface (which is "the permission prompt" in this case) to the related renderer here.

Yes, though the last user interaction may be on the captured surface itself, as with the macOS picker for instance. See the screenshot below: the capturing surface is on the right and the captured surface (a terminal window) is on the left.

Image

The two questions we have:

  1. An editorial question: which algorithm should we refer to in order to trigger user activation when resolving the getDisplayMedia promise?
  2. Is there a corner case where triggering user activation would be problematic? For instance, the capturing context may be fully occluded when the captured surface is selected or when the getDisplayMedia promise is resolved.

youennf avatar May 27 '25 08:05 youennf

Thanks for clarifying, @youennf. To try and further clarify - there is a difference between Chrome and Safari here, and those who are only familiar with one might not be aware of how the other works. When sharing another window:

In Chrome, the user's last interaction is with a Chrome prompt clearly associated with the relevant Chrome tab (the one which called getDisplayMedia). When the user clicks the "share" button, right before the UA confers transient activation on the Web app, that Web app is visible to the user, who has just interacted with something clearly associated with it (the app). See image below.

Image

In Safari, the user's last interaction is with an OS-level overlay, and this means that other windows - especially the to-be-captured window - might occlude the capturing Web application from the user's view, either partially or fully. See image below.

Image

I believe the question boils down to - should the spec note that transient activation should not be conferred if the capturing Web app becomes too occluded? Or just unfocused? Or is this fine?
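To frame the three options, here is a small TypeScript sketch of how a UA-side check might be parameterized. Everything in it is hypothetical: the policy names, the `CapturingDocumentState` shape, and the 0.5 visibility threshold are placeholders for discussion, not proposals for spec text.

```typescript
// Hypothetical policies corresponding to the three options in the question.
type ActivationPolicy = "always" | "require-focus" | "require-visibility";

// Hypothetical snapshot of the capturing document when the promise resolves.
interface CapturingDocumentState {
  hasFocus: boolean;
  visibleFraction: number; // 0 = fully occluded, 1 = fully visible
}

// Arbitrary placeholder; what counts as "too occluded" is the open question.
const MIN_VISIBLE_FRACTION = 0.5;

// Decides whether the UA would confer transient activation under a policy.
function shouldConferActivation(
  policy: ActivationPolicy,
  state: CapturingDocumentState,
): boolean {
  switch (policy) {
    case "always":
      return true;
    case "require-focus":
      return state.hasFocus;
    case "require-visibility":
      return state.visibleFraction >= MIN_VISIBLE_FRACTION;
  }
}
```

With a Chrome-like state (focused, fully visible) all three policies agree; with a fully occluded, unfocused document only `"always"` confers activation.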

eladalon1983 avatar May 27 '25 09:05 eladalon1983

I believe the question boils down to - should the spec note that transient activation should not be conferred if the capturing Web app becomes too occluded? Or just unfocused? Or is this fine?

We need to be careful with this. "Forwarding" user activation means allowing access to all user activation-gated APIs, which can confuse users when the gated actions (e.g., opening a popup or a file picker) are triggered from a background window.

EdgarChen avatar Jun 16 '25 20:06 EdgarChen