Multi-capture

Open eladalon1983 opened this issue 2 years ago • 3 comments

Introduction

Some applications wish to concurrently capture multiple surfaces.

Capturing multiple surfaces is already possible with existing APIs: an application can call getDisplayMedia() multiple times. However, this is not ergonomic, and it creates serious friction for the user:

The user has to interact with the browser's media-picker multiple times.
The user has to interact with the application multiple times, signaling that they want to capture yet another surface, and providing a new transient activation each time.
The user is liable to make mistakes when trying to remember which surfaces they've already started capturing, and which surfaces remain for them to capture.
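For reference, the current workaround can be sketched roughly as follows. This is a minimal sketch, not taken from any real application: `createSequentialCapture` and the `DisplayCapturer` type are hypothetical names, and the devices object is passed in so the sketch does not depend on DOM typings. In a browser, `navigator.mediaDevices` would play that role.

```typescript
// Minimal structural type for the one method this sketch needs.
interface DisplayCapturer<S> {
  getDisplayMedia(constraints?: { video?: boolean; audio?: boolean }): Promise<S>;
}

// Hypothetical helper: collects one stream per user gesture.
function createSequentialCapture<S>(devices: DisplayCapturer<S>) {
  const streams: S[] = [];
  return {
    streams,
    // Meant to be called from a click handler. Every call opens the
    // picker again and needs its own transient activation.
    async addSurface(): Promise<number> {
      const stream = await devices.getDisplayMedia({ video: true });
      streams.push(stream);
      return streams.length; // number of surfaces captured so far
    },
  };
}
```

Capturing three surfaces this way means three clicks in the application and three trips through the picker, and nothing in the picker tells the user which surfaces are already being captured.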

Ideally, a single transient activation could be used for a single API invocation, providing the user with a media-picker with functionality akin to checkboxes (mentioned here by way of example; we don't need to mandate specific UX elements). The user would be allowed to choose all of the display surfaces that they want to capture, then click OK once. It is clear from context that these are all of the surfaces the user was aiming to capture, and that no additional calls to getDisplayMedia() or the like are necessary.

Illustration: mock

Use Cases

Use-case 1: Streamers presenting multiple surfaces (dynamic receivers)

Consider an instructor presenting multiple tabs to several students.

Instructor streams multiple tabs to an SFU.
Individual students independently choose which tab to view at any given moment.

With a single click, the instructor can start capturing all the relevant tabs.

Use-case 2: Streamers presenting a changing surface (dynamic sender)

Video conferencing software asks the user to choose all the tabs the user wishes to share. The application captures all surfaces, but, at any given moment, it only relays to the SFU a single tab. Which tab is relayed depends on app-specific logic. (For instance, maybe only the last-active tab.)
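The "last-active tab" rule mentioned above could look something like this. It is only a sketch under assumptions: `CapturedSurface`, `lastActiveAt`, and `pickSurfaceToRelay` are hypothetical names, and how the application learns when a tab was last active is app-specific and not covered by this proposal.

```typescript
// One entry per surface the user picked.
interface CapturedSurface<S> {
  stream: S;
  // Hypothetical: millisecond timestamp maintained by the application.
  lastActiveAt: number;
}

// Returns the most recently active surface, or undefined if none are captured.
function pickSurfaceToRelay<S>(
  surfaces: readonly CapturedSurface<S>[],
): CapturedSurface<S> | undefined {
  let best: CapturedSurface<S> | undefined;
  for (const surface of surfaces) {
    if (best === undefined || surface.lastActiveAt > best.lastActiveAt) {
      best = surface;
    }
  }
  return best;
}
```

In a WebRTC application, the chosen stream's video track could then be swapped onto the outgoing sender (for instance with RTCRtpSender.replaceTrack()), so the SFU only ever receives one tab.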

Use-case 3: Record N screens/windows/tabs

Recording for compliance/training/billing reasons.

Use-case 4: Record and compose

Record multiple windows. Redraw them to a canvas to produce a video of a virtual desktop. This virtual desktop contains only the captured windows, which is a substantial privacy improvement over what users do today: sharing the entire (real) desktop in order to share a handful of windows. (An additional, orthogonal API for learning the position and size of windows is needed to make this truly powerful.)
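Since window positions and sizes are not available today, a composing application has to invent its own layout. The sketch below assumes a plain grid. `gridLayout` and `Rect` are hypothetical names, and the grid itself is an illustrative choice rather than part of the proposal.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Splits the canvas into a near-square grid with one cell per captured window.
function gridLayout(count: number, canvasWidth: number, canvasHeight: number): Rect[] {
  if (count <= 0) return [];
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);
  const cellWidth = canvasWidth / cols;
  const cellHeight = canvasHeight / rows;
  const rects: Rect[] = [];
  for (let i = 0; i < count; i++) {
    rects.push({
      x: (i % cols) * cellWidth,           // column index times cell width
      y: Math.floor(i / cols) * cellHeight, // row index times cell height
      width: cellWidth,
      height: cellHeight,
    });
  }
  return rects;
}
```

On each animation frame, the application would draw every captured window (playing in its own video element) into its cell with drawImage(), and canvas.captureStream() would yield the composed video. Note that this sketch stretches each window to fill its cell; preserving aspect ratio would need extra letterboxing logic.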

Goals

Provide an API that allows multiple screen captures to be initiated with a single transient activation. Ideally, the user agent should present a UX that makes certain user mistakes impossible (e.g. capturing the same surface multiple times).

Proposed Solution

Possible API 1: New method (getDisplayMediaSet)

partial interface MediaDevices {
  Promise<sequence<MediaStream>> getDisplayMediaSet(
    optional MediaStreamConstraints constraints = {});
};
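From the application's side, calling this method might look roughly like the sketch below. The method is only proposed here, so it is declared as optional and feature-detected. `ProposedMediaDevices` and `captureAll` are hypothetical names, and falling back to a single getDisplayMedia() call is one possible choice, not something the proposal prescribes.

```typescript
// Structural type covering the existing method plus the proposed one.
interface ProposedMediaDevices<S> {
  getDisplayMedia(constraints?: object): Promise<S>;
  // Proposed in this issue; availability is not guaranteed.
  getDisplayMediaSet?(constraints?: object): Promise<S[]>;
}

// Always resolves to a list of streams, whichever method is available.
async function captureAll<S>(devices: ProposedMediaDevices<S>): Promise<S[]> {
  const constraints = { video: true };
  if (typeof devices.getDisplayMediaSet === "function") {
    // One transient activation, one picker, many streams.
    return devices.getDisplayMediaSet(constraints);
  }
  // Fallback: a single surface through the existing API.
  return [await devices.getDisplayMedia(constraints)];
}
```

In a browser, `navigator.mediaDevices` would be passed in (with a cast, since the proposed method is not in the standard DOM typings). Returning an array in both branches keeps the calling code identical whether or not the new method exists.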

Possible API 2: Overloaded return type for getDisplayMedia

Add an optional parameter to getDisplayMedia called maxSurfaces. Its default value is 1. With that value, the existing behavior is preserved. For values greater than 1, the new behavior applies (multi-picker), and the return type changes to Promise<sequence<MediaStream>>.
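One consequence of this option is that callers have to cope with two result shapes. A rough sketch of what that could look like follows. It assumes that maxSurfaces is passed as a member of the options dictionary, which the text above does not specify. `StreamLike`, `OverloadedDevices`, `toStreamList`, and `captureUpTo` are hypothetical names.

```typescript
// Stand-in for MediaStream so the sketch does not depend on DOM typings.
interface StreamLike {
  id: string;
}

// Assumption: maxSurfaces travels in the options dictionary.
interface OverloadedDevices {
  getDisplayMedia(options?: {
    video?: boolean;
    maxSurfaces?: number;
  }): Promise<StreamLike | StreamLike[]>;
}

// Normalizes either result shape to a list.
function toStreamList(result: StreamLike | StreamLike[]): StreamLike[] {
  return Array.isArray(result) ? result : [result];
}

async function captureUpTo(
  devices: OverloadedDevices,
  maxSurfaces: number,
): Promise<StreamLike[]> {
  const result = await devices.getDisplayMedia({ video: true, maxSurfaces });
  return toStreamList(result);
}
```

The need for a helper like `toStreamList` is arguably a point in favor of API 1, where the return type never depends on an argument value.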

Examples

See mock.

Let’s Discuss

Which of the APIs proposed above is preferable? (Or anything else...?)
Any unforeseen issues with allowing multiple display surfaces of different types?
Audio - global or per-surface?

Feb 28 '22 20:02 eladalon1983

Great idea! At Tella we've had multiple users ask for the ability to record multiple windows at the same time, without sharing their full screen. We haven't implemented this yet, partly because like you said in the original post, the UX currently is not ideal for a user. Picking a screen is already a hard task for a lot of users (giving OS permissions, knowing the implications of picking a screen (mirror effect), etc) so we didn't want to make it more complex with multiple prompts. However, if they would be able to just multiselect windows this would make the experience a lot better.

We would indeed also want a way to make sure they don't select too many streams (like with maxSurfaces in your example), since recording a lot of streams has a performance impact.

Partly related, it would be great if we could say they can only capture windows, but I know there’s already a discussion about that here: https://github.com/w3c/mediacapture-screen-share/issues/184.

So summarizing: I like the idea of allowing selection of multiple windows/tabs/screens and I think it will improve the UX for the screen picker and will make it nicer to implement recording/streaming apps.

Also, one advantage I can see over prompting multiple times: we don't know at the start how many streams they want to share. "Add another stream/window" is something that could be added in our own UI, but it could also be more confusing to the user than handling it in context, in the screen picker.

Feb 28 '22 20:02 happylinks

This would be awesome. At Appblit, we develop a safer screen sharing experience by letting users share several windows over a clean virtual desktop image (so they don't have to show their entire desktop, or only share one window at a time). We currently have a native-only app, Screegle, but our users want a web-based solution for broader adoption without having to install a native app (which some of our potential customers cannot do because internal policies treat it as a security issue).

As detailed for use-case #4, it is cognitively hard for users to remember which window was already chosen. It's also hard to know which Chrome UI to click on to stop sharing one (which isn't usually an issue if they only share one at a time, but if they shared several, they just can't find which button corresponds to which).

The suggested picker would solve both issues in our opinion.

Mar 01 '22 09:03 ldenoue

https://github.com/WICG/multicapture is now live! Happy incubation!! :)

Mar 02 '22 13:03 yoavweiss
