Summary

Pre-Summary: Status

There is a detailed spec draft, and Chrome is implementing this for an origin trial.

Problem Overview

Recall that applications may currently obtain a capture of the tab in which they run using getDisplayMedia, either with or without preferCurrentTab. Moreover, soon another API will allow similar functionality - getViewportMedia. In either case, the application may then also wish to crop the resulting video track so as to remove some content from it (typically before sharing it remotely). We introduce a performant and robust API for cropping a self-capture video track.

Core Challenges

Layout can change asynchronously when the user scrolls, zooms or resizes the window. The application cannot robustly react to such changes without risking mis-cropping the video track on occasion. The browser therefore needs to step in and help.
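
To make the difficulty concrete, here is a minimal sketch of what an application would have to do on its own: map an element's bounding box (CSS pixels) to a crop rectangle in captured-frame pixels. The helper name computeCropRect and its parameters are hypothetical, and the sketch assumes the captured frame covers exactly the viewport. The result is only valid for the layout at the moment of measurement, so a frame produced after a scroll, zoom or resize would be cropped with a stale rectangle.

```javascript
// Hypothetical helper, not part of any proposed API.
// elementRect: {left, top, width, height} in CSS pixels (e.g. from getBoundingClientRect()).
// viewport:    {width, height} in CSS pixels.
// frame:       {width, height} in captured-frame pixels.
function computeCropRect(elementRect, viewport, frame) {
  const scaleX = frame.width / viewport.width;
  const scaleY = frame.height / viewport.height;
  // Clamp to the frame so a partially off-screen element still yields a valid rectangle.
  const left = Math.max(0, Math.round(elementRect.left * scaleX));
  const top = Math.max(0, Math.round(elementRect.top * scaleY));
  const right = Math.min(
      frame.width, Math.round((elementRect.left + elementRect.width) * scaleX));
  const bottom = Math.min(
      frame.height, Math.round((elementRect.top + elementRect.height) * scaleY));
  return {
    x: left,
    y: top,
    width: Math.max(0, right - left),
    height: Math.max(0, bottom - top),
  };
}
```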

Sample Use Case

Consider a combo-application consisting of two major parts - a video-conferencing application and a productivity-suite application co-existing in a single tab. Assume the video-conferencing uses existing/upcoming APIs such as getDisplayMedia and/or getViewportMedia and captures the entire tab. Now it needs to crop away everything other than a particular section of the productivity-suite. It needs to crop away its own video-conferencing content, any speaker notes and other private and/or irrelevant content in the productivity-suite, before transmitting the resulting cropped video remotely.

Moreover, consider that it is likely that the two collaborating applications are cross-origin from each other. They can post messages, but all communication is asynchronous, and it's easier and more performant if information is transmitted sparingly between them. That precludes solutions involving posting of entire frames, as well as solutions which are too slow to react to changes in layout (e.g. scrolling, zooming and window-size changes).

Goals and Non-Goals

Goals

The new API we introduce allows an application which is already in possession of a self-capture video track, to crop that track to the contours of its desired element.
The API allows this to be done performantly, consistently and robustly.

Non-Goals

This API does not introduce new ways to obtain a self-capture video track.
This API does not introduce mechanisms by which a captured document may control what the capturing document can see.

Solution

Solution Overview

A two-pronged solution is presented:

Crop-ID production: A mechanism for tagging an HTMLElement as a potential target for the cropping mechanism.
Cropping mechanism: A mechanism for instructing the user agent to start cropping a video track to the contours of a previously tagged HTMLElement, or to stop such cropping and revert a track to its uncropped state.

Crop-ID production

We introduce navigator.mediaDevices.produceCropId().

partial interface MediaDevices {
  Promise<DOMString>
  produceCropId((HTMLDivElement or HTMLIFrameElement) target);
};

Given an HTMLElement, produceCropId() produces a UUID that can uniquely identify that element to our second mechanism - the cropping mechanism. (The Promise returned by produceCropId() is only resolved when the ID is ready for use, allowing the browser time to set up prerequisites and propagate state cross-process.)

Cropping mechanism

We introduce a cropTo() method, which we expose on all video tracks derived from tab-capture.

[Exposed = Window]
interface BrowserCaptureMediaStreamTrack : FocusableMediaStreamTrack {
  Promise<undefined> cropTo(DOMString cropTarget);
};

Given a UUID, cropTo() starts cropping the video track to the contours of the referenced HTMLElement. Given an empty string, cropTo() reverts a video track to its uncropped state. "On-the-fly" changing of crop-targets is possible.
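
As a rough illustration of the behavior described above, the following sketch switches a track between two crop-targets and then reverts it to its uncropped state. It assumes the draft shape of cropTo() shown here (a DOMString argument, with the empty string meaning "uncrop"); the helper name cycleCropTargets is hypothetical.

```javascript
// Hypothetical helper illustrating the draft semantics of cropTo().
// `track` is assumed to be a tab-capture video track that exposes cropTo().
async function cycleCropTargets(track, firstCropId, secondCropId) {
  // Start cropping to the first tagged element.
  await track.cropTo(firstCropId);
  // Change the crop-target on the fly, without uncropping in between.
  await track.cropTo(secondCropId);
  // Empty string: revert the track to its uncropped state (per this draft).
  await track.cropTo('');
}
```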

Code Samples

/////////////////////////////////
// Code in the capture-target: //
/////////////////////////////////

const mainContentArea = document.getElementById('mainContentArea');
const cropId = await navigator.mediaDevices.produceCropId(mainContentArea);
sendCropId(cropId);

function sendCropId(cropId) {
  // Can send the crop-ID to another document in this browsing context
  // using postMessage() or using any other means.
  // Possibly there is no other document, and this is just consumed locally.
}

/////////////////////////////////////
// Code in the capturing-document: //
/////////////////////////////////////

async function startCroppedCapture(cropId) {
  const stream = await navigator.mediaDevices.getDisplayMedia();
  const [track] = stream.getVideoTracks();
  if (!track.cropTo) {  // cropTo() is not available on this track.
    handleError(stream);
    return;
  }
  await track.cropTo(cropId);
  transmitVideoRemotely(track);
}
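
One possible way to fill in sendCropId() when the two documents are cross-origin is postMessage(). This is a sketch only: the message shape ({type: 'crop-id', cropId}) and the function names sendCropIdTo and makeCropIdListener are assumptions made for illustration, not part of the proposal.

```javascript
// In the capture-target (e.g. an embedded iframe): forward the crop-ID to the capturer.
// `targetWindow` would typically be window.parent; `targetOrigin` is the capturer's origin.
function sendCropIdTo(targetWindow, targetOrigin, cropId) {
  // Hypothetical message shape; any agreed-upon format would do.
  targetWindow.postMessage({ type: 'crop-id', cropId }, targetOrigin);
}

// In the capturing document: build a 'message' listener that accepts crop-IDs
// only from the expected origin and hands them to `onCropId`.
function makeCropIdListener(expectedOrigin, onCropId) {
  return (event) => {
    if (event.origin !== expectedOrigin) return;  // Ignore unexpected senders.
    const data = event.data;
    if (data && data.type === 'crop-id' && typeof data.cropId === 'string') {
      onCropId(data.cropId);
    }
  };
}
```

The capturing document could then register the listener with window.addEventListener('message', makeCropIdListener(origin, startCroppedCapture)), where origin is the capture-target's origin.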

Spec draft

Please take a look at the proposed spec. (Easily missed, so repeated.)

Oct 05 '21 20:10 eladalon1983

Given an empty string, cropTo() reverts a video track to its uncropped state.

Passing an empty string to a function to make it release a crop feels dirty.

Can we not have a removeCrop() function on the track? Or have cropTo return a promise function in order to uncrop when called - that's already a pretty established pattern in other places.

I'd still love to see an extra constraint to getDisplayMedia() in order for the browser to do the cropping before it gets to JavaScript land - this still doesn't solve the problem of the underlying application having access to an entire tab (in this case) - but that's completely separate to this.

Oct 06 '21 15:10 danjenkins

Can we not have a removeCrop() function on the track?

Since it'd be functionally equivalent, I'd not object. But my own subjective preference is to have a single method here. Can we perhaps find an objective measure to determine the better approach?

or have cropTo return a promise function in order to uncrop when called - that's already a pretty established pattern in other places.

The API currently allows seamless transition from one crop-target to another. What happens in that case?

const uncropCallback1 = track.cropTo(cropId1);
const uncropCallback12 = track.cropTo(cropId2);
uncropCallback1();

What do you suggest we do in this case? Uncrop the track? No-op? Raise an exception?

I'd still love to see an extra constraint to getDisplayMedia() in order for the browser to do the cropping before it gets to JavaScript land - this still doesn't solve the problem of the underlying application having access to an entire tab (in this case) - but that's completely separate to this.

I agree that it's separate. Consider also the complicating factor that one document can draw on top of another - and cropping catches that. What you'd want here is element-level capture, with a div/iframe capturing itself, without capturing occluding content. It's a useful API that's under discussion. My opinion is that such an API serves different needs than cropping, and the Web needs both.

Oct 06 '21 16:10 eladalon1983

The API currently allows seamless transition from one crop-target to another. What happens in that case?
const uncropCallback1 = track.cropTo(cropId1);
const uncropCallback12 = track.cropTo(cropId2);
uncropCallback1();
What do you suggest we do in this case? Uncrop the track? No-op? Raise an exception?

I'd expect uncropCallback1 to become invalid/cancelled once uncropCallback12 was made... so I would expect calling uncropCallback1() to throw an error, as you're trying to uncrop something that's no longer valid.

Oct 06 '21 16:10 danjenkins

You've mentioned precedents for this pattern. Could you please specify one or two precedents, so that I might examine the reasoning that led to that pattern being adopted there, and see if the rationale applies here too?

Oct 06 '21 21:10 eladalon1983

I had React hooks in my head but now I can't find an example... but they definitely do exist... I've used them; a function (in this case cropTo) returning a pre-bound function to cancel it. Kinda like setTimeout returning a ref to the timeout that you then cancel with clearTimeout... why not just return a pre-bound clearTimeout function bound with the ID? I'll find an example tomorrow.

Oct 06 '21 22:10 danjenkins

I'd second Dan here on empty string.

The API currently allows seamless transition from one crop-target to another. What happens in that case?

Another example is if you just call the uncropper more than once.

What do you suggest we do in this case? Uncrop the track? No-op? Raise an exception?

No-op seems a good choice. I think the resolve/reject functions of Promise are quite comparable to this: The first call wins, any subsequent calls are ignored and both are tied to the same thing.

What Dan is presumably referring to is this quite common pattern:

function repeat(callback, intervalMs) {
    const id = setInterval(callback, intervalMs);
    return () => clearInterval(id);
}

As one can see, calling this twice would also be a no-op.
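
Applied to cropTo(), the same pattern might look like the sketch below. It is a hypothetical wrapper (makeCropper and cropWithUncrop are names invented here), not something the current proposal offers, and it assumes the draft's cropTo('') for uncropping. An uncrop callback that has been superseded by a later crop, or that was already used, becomes a no-op.

```javascript
// Hypothetical wrapper; cropTo() itself does not return a callback in the current proposal.
function makeCropper(track) {
  let generation = 0;  // Incremented on every crop or uncrop request.
  return async function cropWithUncrop(cropId) {
    const mine = ++generation;
    await track.cropTo(cropId);
    // The returned callback only acts if no later request has superseded this one.
    return async function uncrop() {
      if (mine !== generation) return false;  // Stale or already used: no-op.
      generation++;
      await track.cropTo('');  // Assumes the draft's "empty string means uncrop".
      return true;
    };
  };
}
```

With this wrapper, the uncropCallback1() call from the earlier example would do nothing, because the second cropTo() had already replaced the first crop-target.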

Personally, I don't mind cropTo()/cropTo(undefined) or cropTo(null) or an explicit method to uncrop. Just not empty string. :slightly_smiling_face:

Oct 07 '21 07:10 lgrahl

I believe setInterval and setTimeout return a handle[*], and clearInterval accepts that handle, for a practical reason - there'd otherwise be no way to know which interval to clear. This is not a limitation of the current cropping API design, which exposes cropTo on the track, so there is no need to employ the same pattern[*].

Please examine what is returned by cropTo in my current proposal - a Promise which resolves after the browser can guarantee to the application that all of the next frames produced will be cropped/uncropped according to the application's latest request. I believe that's vital. Without it, the application would be consuming a few frames that are mis-cropped, and have no way of knowing how long to wait until the new crop is applied.

For the issue of ''/null/undefined, I am fine with any/all of these. Maybe anything that evaluates to false?

--
[*] This example returns a handle, not a callback, btw.[**]
[**] And returning a callback which itself returns a Promise is not very elegant in comparison with the current solution, IMHO.

Oct 07 '21 09:10 eladalon1983

There are benefits to doing cropTo(null) in the fact you'd not need to keep track of that returned function - as long as you had access to the track you'd be able to clear the crop. In that sense I'd expect cropTo() to be a throwable thing and cropTo(null) to clear the existing crop on a track if it exists and a noop if one doesn't exist.

Oct 07 '21 09:10 danjenkins

There are benefits to doing cropTo(null) in the fact you'd not need to keep track of that returned function - as long as you had access to the track you'd be able to clear the crop. In that sense I'd expect cropTo() to be a throwable thing and cropTo(null) to clear the existing crop on a track if it exists and a noop if one doesn't exist.

Then you'll be very pleased by the spec draft, which specifies exactly that, modulo the difference between null and ''.

Oct 07 '21 09:10 eladalon1983

Then you'll be very pleased by the spec draft, which specifies exactly that, modulo the difference between null and ''.

My issue was primarily with an empty string :D No harm in thinking out loud about other APIs :)

Oct 07 '21 09:10 danjenkins

My issue was primarily with an empty string :D No harm in thinking out loud about other APIs :)

Absolutely no harm indeed. I'm very glad to hear your feedback. Please keep it coming!

Oct 07 '21 09:10 eladalon1983

Speaking of feedback, I am wondering - any objections to making a repeated call to cropTo(null)/removeCrop() not throw an error, but rather be a no-op? It simplifies both the implementation in the browser[*] as well as Web-apps.

Oct 08 '21 13:10 eladalon1983

In that sense I'd expect cropTo() to be a throwable thing and cropTo(null) to clear the existing crop on a track if it exists and a noop if one doesn't exist.

That's what I said further up :) cropTo() without a parameter should be throwable as it's not a valid use case of the API, but repeated calls to cropTo(null), even if there's no crop to release, should be a no-op.

Oct 08 '21 13:10 danjenkins

That's what I said further up :)

I appear to have misread your original message. Specifically, I believe I misread "and a noop if one doesn't" as "and an exception if one doesn't." My apologies. So we're now two votes for no-op.
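
To pin down the behavior we now seem to agree on, here is a toy in-memory model. ModelTrack is a hypothetical stand-in, not the browser's implementation, and rejecting a missing argument with a TypeError is an assumption about what "throwable" would mean for a Promise-returning method.

```javascript
// Toy model of the semantics discussed in this thread; NOT the real track object.
class ModelTrack {
  constructor() {
    this.cropTarget = null;  // null means "uncropped".
  }

  async cropTo(target) {
    // Assumption: calling cropTo() with no argument is an error.
    if (target === undefined) {
      throw new TypeError('cropTo() requires a crop-ID, or null to uncrop');
    }
    // null (or '' in the current draft) clears the crop.
    // Clearing an already-uncropped track simply leaves it uncropped: a no-op.
    this.cropTarget = (target === null || target === '') ? null : target;
  }
}
```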

Oct 08 '21 13:10 eladalon1983

@eladalon1983 I remembered where this cancel-a-listener idea came from in my mind... Firestore from Google.

var unsubscribe = db.collection("cities")
    .onSnapshot(() => {
      // Respond to data
      // ...
    });

// Later ...

// Stop listening to changes
unsubscribe();

https://firebase.google.com/docs/firestore/query-data/listen#web-version-8_5

(not saying let's use it... just adding for completeness)

Oct 13 '21 15:10 danjenkins

partial interface MediaDevices {
  Promise<DOMString>
  produceCropId((HTMLDivElement or HTMLIFrameElement) target);
};

Any reason why this would be limited to HTMLDivElement or HTMLIFrameElement? What if one wanted to share just an HTMLVideoElement, for example? The use case could be to share a video stream via a video call.

Oct 18 '21 10:10 tomayac

Any reason why this would be limited to HTMLDivElement or HTMLIFrameElement? What if one wanted to share just an HTMLVideoElement, for example? The use case could be to share a video stream via a video call.

This was an early attempt at a compromise with Mozilla, whom I understood as somewhat open to iframe, but initially skeptical of general elements. My own preference is indeed for any element, and I hope we can come to an agreement over that. Given that any element we wish to share can be "stuffed" into a div, I believe div to be sufficient for the feature to be useful, but it is indeed a somewhat arbitrary restriction, and I hope we can move past it.

Oct 18 '21 10:10 eladalon1983

Should've commented here sooner, but the repo was adopted, then migrated to the W3C, and currently lives under https://github.com/w3c/mediacapture-region/

Sep 26 '22 02:09 yoavweiss

Region Capture: Cropping API for Video Tracks
