mediacapture-screen-share Conditional Focus (When Display-Capture Starts)

Problem

When an application starts capturing a display-surface, the user agent faces a decision - should the captured display-surface be brought to the forefront, or should the capturing application retain focus.

The user agent is mostly agnostic of the nature of the capturing and captured applications, and is therefore ill-positioned to make an informed decision.

In contrast, the capturing application is familiar with its own properties, and is therefore better suited to make this decision. Moreover, by reading displaySurface and/or using Capture Handle, the capturing application can learn about the captured display-surface, driving an even more informed decision.

Sample Use Case

For example, a video conferencing application may wish to:

Focus a captured application that users typically interact with during the call, like a text editor.
Retain for itself focus when the captured display-surface is non-interactive content, like a video.
- (Using Capture Handle, the capturing application may even allow the user to remotely start/pause the video.)

Sample Code

const stream = await navigator.mediaDevices.getDisplayMedia();
track.focus(ShouldFocus(stream) ? "focus-captured-surface" : "no-focus-change");

function ShouldFocus(stream) {
  const [track] = stream.getVideoTracks();
  if (sampleUsesCaptureHandle) {
    // Assume logic discriminating focusability by origin,
    // for instance focusing anything except https://collaborator.com.
    const captureHandle = track.getCaptureHandle();
    return ShouldFocusOrigin(captureHandle && captureHandle.origin);
  } else {  // Assume Capture Handle is not a thing.
    // Assume the application is only interested in focusing tabs, not windows.
    return track.getSettings().displaySurface == 'browser';
  }
}

Security Concerns

One noteworthy security concerns is that allowing switching focus at an arbitrary moment could allow clickjacking attacks. The suggested spec addresses this concern by limiting the time when focus-switching may be triggered/suppressed - the application may only decide about focus immediately[*] upon the resolution of the Promise<MediaStream>. (See the spec-draft for more details about what "immediately" means and how I suggest various edge-cases be handled.)

Sep 06 '21 22:09 eladalon1983

This looks amazing, been waiting for this for a long time! In tella.tv we have the following flow:

User goes to recorder
User selects a screen/window/tab they want to record
When picking window or tab, they get jumped to the window or tab, but need to go back to tella first to start the recording.
Start recording in tella
Countdown
Manually go to the tab/window they were recording.

If i'm looking at this proposal right, our new flow could be:

User goes to recorder
User selects a screen/window/tab they want to record (and does not get jumped to the window or tab)
User clicks start recording
Countdown
We fire .focus on the stream to bring them where they need to be automatically.

A much nicer flow. Would love to be in the Origin Trial when it's available :)

Sep 07 '21 08:09 happylinks

Ah, reading the spec again, missed this point: If this call to focus() happens more than one second after the start of the capture, the user agent SHOULD have already made its own decision. The user agent SHOULD silently ignore this call focus().

That means we would not be able to do "We fire .focus on the stream to bring them where they need to be automatically." after countdown. Still better to keep them in Tella though! What is the reason for the 1 second if I may ask ?

Sep 07 '21 08:09 happylinks

What is the reason for the 1 second if I may ask ?

Please note that 1s is the hard-limit, and normally the window of opportunity to calling focus() would close even earlier - as soon as the microtask on which the Promise was resolved is completed. (The 1s hard-limit prevents the application from keeping that microtask running for N seconds and then switching focus.)

The rationale for these limitations is preventing attacks where the capturing application induces the user to click at it chosen location in the captured application by switching focus at just the right time. As a somewhat contrived example, imagine the capturing application presenting at just the right location a button labelled "double-click here to confirm" and then switching focus after it registers the first click.

Are these concerns realistic? I'm not yet sure, but it should be easier to reach consensus if we err on the side of caution.

Sep 07 '21 11:09 eladalon1983

Thanks for explaining! Preventing the focus will already help us a lot.

Sep 07 '21 11:09 happylinks

@jan-ivar and @youennf, I'd love to carry on from where we stopped at the end of the WebRTC WG Interim meeting yesterday.

@youennf IIUC, your main concern was around the 1s hard-limit, but I think that you later mentioned this was due to a misunderstanding? Could you PTAL at my proposed spec and let me know what you think? Please see this part:

If focus() was not called on MT, raise an InvalidStateError. Otherwise, proceed.

If this call to focus() happens more than one second after the start of the capture, the user agent SHOULD have already made its own decision. The user agent SHOULD silently ignore this call focus().

Please note that the intention here is that if focus() is not called by the time MT terminates, then the browser immediately switches focus - without a 1s delay. The 1s delay is a backup in case MT takes an inordinately long time to finish.

@jan-ivar

IIUC, your concerns where over subclassing MediaStreamTrack vs. expsosing an API as navigaotr.mediaDevices.focus. Could you please elaborate on why you think one is superior to the other?
I believe you also suggested scheduling a task to close the focus-window-of-opportunity. My concern here is that tasks are more likely than microtasks to experience long delays, and the hard-limit of 1s would be encountered more often than we'd want. (I think we'd want it as a security restriction that's hardly ever encountered by a non-malicious app.) Could you please elaborate your thoughts on this topic?

Please let me know if I've forgotten any other open issues.

Sep 21 '21 11:09 eladalon1983

Btw, following the interim it became clearer to me which parts of my proposed spec were in need of clarification. I have made adjustments, mostly around (a) explaining the window of opportunity and (b) providing examples. PTAL.

Sep 21 '21 12:09 eladalon1983

IIUC, your concerns where over subclassing MediaStreamTrack vs. expsosing an API as navigaotr.mediaDevices.focus. Could you please elaborate on why you think one is superior to the other?

A user's focus is a global property, so a per-track API simply isn't necessary, and we can avoid the subclassing headache.

I believe you also suggested scheduling a task to close the focus-window-of-opportunity. My concern here is that tasks are more likely than microtasks to experience long delays, and the hard-limit of 1s would be encountered more often than we'd want. (I think we'd want it as a security restriction that's hardly ever encountered by a non-malicious app.) Could you please elaborate your thoughts on this topic?

I think you're looking at this the wrong way. How does a tighter window of opportunity benefit slow apps?

What matters is the amount of code that runs up to the point of invoking the focus method. In your model, JS gets a whole second as long as it stays synchronous, but a single microtask switch (e.g. async undefined) and it's out.

That seems too harsh to me, making things harder on apps, not easier. It also seems biased against apps that use promises.

Microtask-switching is sometimes unavoidable when writing promise code (and arguably why microtasks exist — their whole point is that they're not tasks, and operate within a task). For example:

async function getUserMedia(constraints) {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
  // store track ids in localStorage
  return stream;
}

This introduces a microtask switch on code calling this function as a drop-in for the real thing — but it's on the same task.🤔

From a security perspective, tasks and microtasks are equally vulnerable to busy-looping.

Sep 22 '21 23:09 jan-ivar

@alvestrand asked me to outline the API shape I propose. I'd prefer:

console.log(navigator.mediaDevices.displayMediaFocusMode); // "focus"

const stream = await navigator.mediaDevices.getDisplayMedia();
navigator.mediaDevices.displayMediaFocusMode = "no-focus"; // stream's target won't be focused

It's a simple read-write enum attribute with two values. It never changes value except when JS changes it.

UA reads it in a task queued from getDisplayMedia's success task, to determine whether to focus what was just resolved.

The app can set it ahead of time as well, if it doesn't care what the user picks:

navigator.mediaDevices.displayMediaFocusMode = "no-focus"; // stream's target won't be focused
const stream = await navigator.mediaDevices.getDisplayMedia();

TL;DR: it affects subsequent calls, but if you set it immediately, it affects what just resolved as well.

Sep 23 '21 00:09 jan-ivar

...navigator.mediaDevices.displayMediaFocusMode = "no-focus";

I don't think this is going to "play nice" if the document is engaged in multiple concurrent captures.

it affects subsequent calls, but if you set it immediately, it affects what just resolved as well

I don't think that's simple, because the association between what gets captured and what gets focused is not there, and concurrent captures become clouded by questions of timing.

A user's focus is a global property, so a per-track API simply isn't necessary, and we can avoid the subclassing headache.

How is navigator.mediaDevices.someFocusApi different than FMST.focus() with respect to "globality"? FMST.focus() allows feature discovery; it'd be impossible for an application to attempt to call it on a non-supporting track. So that's a benefit of this approach. What benefits would the other approach confer?

What matters is the amount of code that runs up to the point of invoking the focus method.

One does not need a lot of code to place a "Click here to win a million dollars" button and then guess how long to wait before switching focus to the other tab and getting the user to click something there. (Moreover, if you use tasks rather than microtasks, I think it's easier to wait for an on-over event, making guesses about when the click is coming easier. But I could be wrong there. Let's call this a secondary argument.)

I don't think we can measure and base a decision off of how much code runs, regardless of whether the window-of-opportunity is closed by a task or a microtask. Either could run an arbitrary amount of code. The user's reaction time, though...

It also seems biased against apps that use promises.

Some of my best friends are apps that use Promises. I have found them to be capable of switching between using Promises and doing things synchronously, as the need arises.

Sep 23 '21 13:09 eladalon1983

I don't think this is going to "play nice" if the document is engaged in multiple concurrent captures.

It should: There's only one user, who can only accept one prompt at a time. The association is with the prompt — the user interaction — not the track.

Example: Imagine a hypothetical navigator.mediaDevices.displayMediaFocus(track) method that must be called between a pair of tasks queued sequentially by the UA, where the first resolves with a new screen-sharing track: Context tells us what to focus, and the track argument is entirely redundant.

FMST.focus() allows feature discovery

This should suffice:

if ('displayMediaFocusMode' in navigator.mediaDevices) { /* UA feature detected */ }

All browsers today automatically focus all focusable surfaces ("monitor" isn't focusable), so the default is "focus".

I'd recommend we strongly encourage UAs to allow apps to override this automatic focus for all focusable surfaces.

This removes any need for per-surface-type detection of policies.

One does not need a lot of code to place a "Click here to win a million dollars" button and then guess how long to wait before switching focus to the other tab and getting the user to click something there. (Moreover, if you use tasks rather than microtasks, I think it's easier to wait for an on-over event, making guesses about when the click is coming easier.

We discussed offline that gDM can queue two tasks A and B, guaranteeing no other task runs in between, so events aren't an issue. The promise is resolved in A, creating microtask A₁ where JS runs. Your proposal is to limit calls to between A₁-A₂, whereas I'm saying A₁-B is fine and superior as it avoids tripping up promise-use. The 1 second timeout is orthogonal.

Sep 23 '21 18:09 jan-ivar

I also prefer an attribute over an explicit call for each track. This allows an application to set its default value easily (never focus for instance) while providing enough flexibility for web applications that want per-capture decisions.

As of event task vs. micro task, there are preexisting examples. For instance, the fetch event is fired and some properties are read synchronously after the fetch event listeners, thus before the microtasks that would have been scheduled during the fetch event listeners.

The main advantage I see is that it is a clear web developer contract: the boundary to call focus is the next await. I believe this approach also allows shiming since the read-focus callback would be executed after the already registered promise resolution callback.

From a pure spec/developer point of view, it should be also easier to specify and implement: queue a task where the promise is resolved and a promise resolution callback to read the value and apply focus is registered. If we would like to leave the whole task to change the focus value, we might actually not want to read the value in the next task but at the point in the current task where the microtask queue gets emptied after the task steps are executed. I am not sure how doable that is right now in terms of spec editing.

Sep 24 '21 13:09 youennf

I also prefer an attribute over an explicit call for each track.

👍

As of event task vs. micro task, there are preexisting examples. For instance, the fetch event is fired and some properties are read synchronously after the fetch event listeners, thus before the microtasks that would have been scheduled during the fetch event listeners.

Event listeners must respond synchronously to prevent bubbling/avoid default for historical reasons.

Those "properties" appear to be fetch trying to support promise code with event.respondWith(async () => {}). So my takeaway from fetch is we should make affordances for promise use. ("[In fetch] we use promises heavily").

I believe this approach also allows shiming since the read-focus callback would be executed after the already registered promise resolution callback.

I'll try to explain the shimming problem in more detail: Say a JS library (e.g. adapter.js) needs to shim gDM for some reason:

const nativeGDM =  window.navigator.mediaDevices.getDisplayMedia;

window.navigator.mediaDevices.getDisplayMedia = async function getDisplayMedia(constraints) {
  // Here we're on Task A
  const stream = await nativeGDM.apply(this, arguments);
  // Here we're on Task B, Microtask B₁
  stream.newFeatureX = 3;
  return stream; // the implicit promise returned by this async function is resolved with stream
}

The shim is careful not to await anything else, yet a microtask checkpoint is unavoidable, because the promise returned by the shim is not the same as the one from nativeGDM. Thus the JS library will unintentionally break all existing code attempting to suppress focus. E.g.:

// Here we're on Task A
const stream = await navigator.mediaDevices.getDisplayMedia({video: true});
// Here we'd be on Task B, Microtask B₁ without the shim, but with the shim we're on Microtask B₂ instead
navigator.mediaDevices.displayMediaFocusMode = "no-focus"; // focuses with the shim but not without!

Sep 24 '21 18:09 jan-ivar

Global or per-surface controls

(@youennf:) I also prefer an attribute over an explicit call for each track.

These are two distinct preferences:

Global or per-surface. (Discussed in this section.)
Attribute or method. (Discussed in the next section.)

(@jan-ivar:) There's only one user, who can only accept one prompt at a time.

The browser can operate in modes which skip the prompt. Mechanisms to trigger these include extensions, enterprise policies and command-line arguments. The spec may be agnostic of these, but the fact remains that the user can accept multiple different captures within a very short span of time.

At any rate, if the application fires off two calls to getDisplayMedia and wants to focus exactly one of these, then it's a lot more ergonomic to call focus() on the right track, than to to manipulate a global attribute at just the right time, ensuring it's the intended value when the UA reads it for the one display-surface and the other value when the UA reads it for another display-surface. It requires of the Web-developer much more in-depth understanding.

Method vs. Attribute

Assume, for the sake of argument, that my previous section convinced you to use per-surface controls. Do we want a method or attribute then?

An application that can read the value may just as well set its own preferred value. An attribute for focus would make sense if global, but not if per-surface.

However, before even calling getDisplayMedia, the application might already wish to know whether it can influence the decision. I would not object to adding a global attribute that reads (and potentially writes?) the default behavior - that which applies if the per-surface API is not invoked. The presence of this global attribute also informs the application that the per-surface control will be exposed if the user chooses to share a focusable surface.

Btw, one challenge with writable attributes is that Web-developers would not as readily expect setting of attributes to potentially raise an exception.

Subclassing MediaStreamTrack

I think we have seen multiple cases where subclassing MediaStreamTrack would have confered benefits, but each time a discussion arose over whether it's enough to sub-class just for that. The results of having everything on MediaStreamTrack is sub-optimal. Some immediate beneficiary APIs of a decision to sub-class would be:

The focus API - only focusable surfaces (browser-surfaces and windows; in the future potentially applications and/or isolated-browser-surfaces).
Capture Handle - only browser-surfaces (maybe more in the future).
Cropping - only self-capture (SelfCaptureMST -> BrowserSurfaceMST -> FocusableMST -> DisplayCaptureMST -> MST).

IMHO, this list is sufficiently long and the benefits are sufficient. When someone calls getViewportMedia(), the result is inherently different than when someone calls getUserMedia(), and it makes sense for the APIs exposed to reflect that.

Tasks vs. Microtasks

IIANM, the only argument for tasks is that they are shim-friendly. (Please correct me if I'm wrong.)

An argument against tasks is that in addition to shimming, it allows an application to await somePromise. This is an anti-pattern, as the results would be flaky. If somePromise is already resolved, it would work; if it will only by resolved by a later task, it would not work.

This trade-off is easy to reason about (IMHO) because we can have both. If we use microtasks, shimming is possible with an adapter:

const nativeGDM =  window.navigator.mediaDevices.getDisplayMedia;

function focusCallback(stream) {
  // Return "no-focus-change" or "focus-captured-surface"
}

window.navigator.mediaDevices.getDisplayMedia = async function getDisplayMedia(constraints, focusCallback) {
  const stream = await nativeGDM.apply(this, arguments);
  const [track] = stream.getVideoTracks();
  if (!!focusCallback && !!track.focus) {
    const shouldFocus = focusCallback(stream);
    track.focus(shouldFocus);
  }
  return stream;
}

The code outside the shim just plugs their callback. Note that there are natural limits on what the app can do anyway until the window-of-opportunity closes, so I expect the code would easily and naturally fit inside of a synchronous callback.

Sep 27 '21 10:09 eladalon1983

The shim is careful not to await anything else, yet a microtask checkpoint is unavoidable

It is avoidable by having the shim returning the native promise returned by getDisplayMedia. The only downside is that you cannot change the object being resolved, but you can update the object itself: add tracks, remove tracks...

Looking at the HTML event loop, we can also see that after the micro task checkpoint, there are additional tasks that need to be done, some of which might trigger firing events.

These are two distinct preferences:

Global or per-surface. (Discussed in this section.)

Attribute or method. (Discussed in the next section.)

I was meaning a global attribute, somewhere in navigator. With the current focus API design, in particular the restriction to read the focus-or-not value synchronously in the task resolving the promise, I do not see a need to have a per-track method.

Let's say the application wants a behavior for all getDisplayMedia calls. It will set the attribute once. Let's say the application wants to select the behavior for each getDisplayMedia call, it will update the attribute value each time within the promise resolution callback.

A method is less intuitive to me.

one challenge with writable attributes is that Web-developers would not as readily expect setting of attributes to potentially raise an exception.

I do not think the plan for setting this attribute is to raise an exception.

Sep 27 '21 13:09 youennf

Let's say the application wants to select the behavior for each getDisplayMedia call, it will update the attribute value each time within the promise resolution callback.

A Web-developer would need non-trivial understanding of the feature in order to become convinced that there'd be no carry-over effect of focusing the wrong surface. They might explain it to themselves as "only the last capture gets focus or not" - but then, why does it sometimes not seem to work? There is no exception raised when they set the new value too late (e.g. await on an yet-unresolved-Promise). With a method, there would be, and that exception would not be unexpected.

Sep 27 '21 13:09 eladalon1983

It is avoidable by having the shim returning the native promise returned by getDisplayMedia. The only downside is that you cannot change the object being resolved, but you can update the object itself: add tracks, remove tracks...

@youennf How? Please be specific.

one challenge with writable attributes is that Web-developers would not as readily expect setting of attributes to potentially raise an exception.

I do not think the plan for setting this attribute is to raise an exception.

Agreed (not beyond WebIDL's TypeError for invalid enum values).

Sep 27 '21 16:09 jan-ivar

I do not think the plan for setting this attribute is to raise an exception.

Agreed (not beyond WebIDL's TypeError for invalid enum values).

Contrast:

1.

const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
await someOtherPromisesThatResolvesMuchLater;
track.focus("no-focus-change");

2.

const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
await someOtherPromisesThatResolvesMuchLater;
navigator.mediaDevices.focusPolicy = "no-focus-change";

3.

const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
await immediatelyResolvedPromise;
navigator.mediaDevices.focusPolicy = "no-focus-change";
const otherStream = await navigator.mediaDevices.getDisplayMedia();

The first can throw an exception that explains to Web-developers why what they're doing is not having the desired effect.
The second has to accommodate the option that the Web-developer actually intended to set the new policy, going forward, completely independently of the call to gDM that has already resolved. The UA cannot raise an exception, because no error was made - but we human beings can tell there's a good chance it's a bug in the Web-application.
The third affects the gDM call that just resolved. Maybe it's what the developer wanted; or maybe they actually intended to only affect otherStream, and this is actually the inverse bug of the second example. Who knows?!

Only the first option produces consistent results (and raises a clear exception when used incorrectly).

Sep 27 '21 17:09 eladalon1983

The browser can operate in modes which skip the prompt.

That is out of scope for this working group.

The spec may be agnostic of these, but the fact remains that the user can accept multiple different captures within a very short span of time.

@eladalon1983 Not a problem. JS is single-threaded, and getDisplayMedia implicitly queues a task to resolve a promise from in parallel steps. My proposal would be to queue two tasks, guaranteed to happen in succession. There is no "span of time" short enough for two getDisplayMedia calls resolving to be a problem.

it's a lot more ergonomic to call focus() on the right track, than to to manipulate a global attribute at just the right time,

You have to call focus() at just the right time as well. Also, if ergonomics is the issue, why isn't it stream.focus()?

Sep 27 '21 17:09 jan-ivar

The browser can operate in modes which skip the prompt.

That is out of scope for this working group.

I have foreseen this response and added the text which you quoted immediately below (agnosticism etc.). The topic is not whether we can handle it spec-wise, but rather whether it produces ergonomic results for the application. Consider:

const stream1 = await navigator.mediaDevices.getDisplayMedia();
doSomething();  // Maybe ends the task, maybe doesn't.
navigator.mediaDevices.focusPolicy = "no-focus"; // Who is affected? s1? s2? Both? Neither?
doSomethingElse();  // Maybe ends the task, maybe doesn't.
const stream2 = await navigator.mediaDevices.getDisplayMedia();

The global-attribute API produces code which is hard to reason about. Does it affect stream1? Does it affect stream2? Both? Neither? If flaky for either - what is the bug? It's definitely not simple.

Not a problem. JS is single-threaded,

My assertion is not that it's impossible to spec this properly. I argue that the result is not ergonomic and not simple.

Also, if ergonomics is the issue, why isn't it stream.focus()?

Well, if you think ergonomics would be improved by moving focus() to MediaStream, we can discuss. My motivation for not putting it there was that MediaStreams can get tracks added/removed, and then focus() becomes... un-simple.

Sep 27 '21 17:09 eladalon1983

@youennf How? Please be specific.

const nativeGDM =  window.navigator.mediaDevices.getDisplayMedia;
window.navigator.mediaDevices.getDisplayMedia = function getDisplayMedia(constraints) {
  const promise = nativeGDM.apply(this, arguments);
  promise.then(stream => {
    stream.newFeatureX = 3;
  });
  return promise;
}

Sep 27 '21 17:09 youennf

Only the first option produces consistent results (and raises a clear exception when used incorrectly).

Of the three listed, which is a false choice. We don't need to subclass anything to throw. See below:

I would not object to adding a global attribute that reads (and potentially writes?) the default behavior ...

So you'd introduce an attribute and a method?

Contrast:

navigator.mediaDevices.defaultFocusPolicy = "focus-change";
const stream = await navigator.mediaDevices.getDisplayMedia();
await someOtherPromisesThatResolvesMuchLater;
navigator.mediaDevices.focus("no-focus-change"); // throws InvalidStateError, focus already happened

vs.

navigator.mediaDevices.focusPolicy = "focus-change";
const stream = await navigator.mediaDevices.getDisplayMedia();
await someOtherPromisesThatResolvesMuchLater;
navigator.mediaDevices.focusPolicy = "no-focus-change"; // Doesn't throw, focus already happened

Sep 27 '21 17:09 jan-ivar

const nativeGDM =  window.navigator.mediaDevices.getDisplayMedia;
window.navigator.mediaDevices.getDisplayMedia = function getDisplayMedia(constraints) {
  const promise = nativeGDM.apply(this, arguments);
  promise.then(stream => {
    stream.newFeatureX = 3;
  });
  return promise;
}

@youennf Sorry, but each then on the same promise introduces a microtask checkpoint.

Sep 27 '21 17:09 jan-ivar

So you'd introduce an attribute and a method?

No. We're not on the same page. Maybe a quick summary of my position would help. I am only interested in a per-track method, but I am analyzing all options in my attempt to convince you that the option I favor is superior.

When I discuss inability to raise an exception, I do not mean it's because the attribute is an attribute. Rather, the problem arises because that global-attribute API is always valid to manipulate. Namely, an application could always be setting the value for the next gDM to follow.

I claim that inability to reliably raise an exception is a drawback. It leaves Web-developers in the dark when their applications behave inconsistently. It fails silently. That's a problem.

Sep 27 '21 17:09 eladalon1983

I am only interested in a per-track method,

Why, when it is less ergonomic than stream.focus()?

Rather, the problem arises because that global-attribute API is always valid to manipulate.

I showed an option 1 above that would throw outside the same envelope, to show there's no need to subclass MST here.

From my end, the conversation has led to the options I show (1 and 2) which I think capture the remaining open question:

Would we rather offer JS a default value, or a clear exception when JS misses an (obvious) time window, or both (at the cost of double the API surface)?

Sep 27 '21 17:09 jan-ivar

I am only interested in a per-track method,

Why, when it is less ergonomic than stream.focus()?

I have answered that question. Please see the bottom of my comment there.
Citation needed for one being more ergonomic than the other. (One being less code is not such a significant change in ergonomics. The ergonomics I refer to when I propose MST.focus are clarity, not brevity.)
I remain (cf. comment) open to discussing a per-stream method . The context in which I mentioned "only per-track" was when you asked if I was suggesting "an attribute and a method". It was not a blanket "only" for anything one could conceive of. It was a clarification that I was not at the time proposing two APIs, as you seemed to think.

I showed an option 1 above that would throw outside the same envelope, to show there's no need to subclass MST here.

From my end, the conversation has led to the options I show (1 and 2) which I think capture the remaining open question:

The original motivation I heard for using a global attribute, was that it allows to:

Probe the UA's capability before calling gDM.
Set a global default that would apply when gDM is called in the future.

So:

I agree that (1) is achieved by the global attribute.
I claimed that (2) loses the ability to raise instructive exceptions.

Do you now propose a version of the global that does raise exceptions? If so - that loses benefit (2). The UA cannot simultaneously, with a single API, allow the Web-developer to influence the future as well as warn them when they're too late to influence the past.

So my question is - which API are you now proposing, and what are its benefits over MST.focus?

Sep 27 '21 18:09 eladalon1983

doSomething();  // Maybe ends the task, mabye doesn't.

Synchronous functions don't end a task, because JS is run to completion. Did you mean await doSomething() ?

The global-attribute API produces code which is hard to reason about. Does it affect stream1? Does it affect stream2? Both? Neither? If flaky for either - what is the bug? It's definitely not simple.

I claim it is super simple, outside of contrived examples. Here's all users need to know:

Await navigator.mediaDevices.getDisplayMedia()
Set navigator.mediaDevices.focusPolicy before you do anything else.

All promise composition is allowed, including shims, try/catch(), breaking things into async sub-functions, Promise.resolve(p) and Promise.race(p, timeout). This I claim is POLA.

If for some reason, you wish to explicitly not affect the just resolved method, then await new Promise(r => setTimeout(r))

Sep 27 '21 19:09 jan-ivar

Given how simple this is, I personally wouldn't bother with an exception.

Sep 27 '21 19:09 jan-ivar

All promise composition is allowed, including shims, try/catch(), breaking things into async sub-functions, Promise.resolve(p) and Promise.race(p, timeout). This I claim is POLA.

In contrast, here's a list of benign things that'd cause action-at-a-distance focus-failure if we go with a microtask cutoff:

getDisplayMedia shimming with any synchronous post-success code at all
Promise-branching like @youennf's getDisplayMedia shimming workaround won't work either
Putting gDM into a sub function with any synchronous post-success code at all

Try-catch in a sub-function or shim

async function getDisplayMedia(c) {
  try {
   await navigator.mediaDevices.getDisplayMedia(c); // 💥 microtask checkpoint
 } catch (e) {
   log.addTo(e); // log all gDM errors
   throw e;
 }
}

Promise.resolve()

function foo(mediaDevices) {
  return Promise.resolve(mediaDevices?.getDisplayMedia(c));  // 💥 microtask checkpoint
}

Application timeout

const wait = ms => new Promise(r => setTimeout(r, ms));

function timeout(p, ms) {
  return Promise.race([p, wait(ms)]);  // 💥 microtask checkpoint
}
await timeout(navigator.mediaDevices.getDisplayMedia(c), 10000); // 10s timeout

These are all things people expect to work when writing promise code, and there's no reason it should break some new API.

Sep 27 '21 20:09 jan-ivar

@youennf Sorry, but each then on the same promise introduces a microtask checkpoint.

Each then returns a new promise, but this is fine. The idea is for the microtask used by User Agent to check for focus-or-not to be executed within the same event loop task but after the JS shim getDisplayMedia promise resolution callbacks. I think this is doable with the updated shim I provided, can you explain why it would not work? I tried to express my understanding/POC in https://jsfiddle.net/f0uqx8jg/.

Sep 28 '21 09:09 youennf

stream.focus() is a red herring. A stream is not a reference to a focusable object; the fact that a stream returned from GetDisplayMedia is guaranteed to have just one video track is not a property of MediaStream.

Sep 28 '21 09:09 alvestrand

mediacapture-screen-share
mediacapture-screen-share copied to clipboard

Conditional Focus (When Display-Capture Starts)

Problem

Sample Use Case

Suggested Solution and Demo

Sample Code

Security Concerns

Global or per-surface controls

Method vs. Attribute

Subclassing MediaStreamTrack

Tasks vs. Microtasks

1.

2.

3.

mediacapture-screen-share mediacapture-screen-share copied to clipboard

Conditional Focus (When Display-Capture Starts)

Problem

Sample Use Case

Suggested Solution and Demo

Sample Code

Security Concerns

Global or per-surface controls

Method vs. Attribute

Subclassing MediaStreamTrack

Tasks vs. Microtasks

1.

2.

3.

mediacapture-screen-share
mediacapture-screen-share copied to clipboard