Region Capture: Cropping API for Video Tracks
Summary
Pre-Summary: Status
There is a detailed spec draft, and Chrome is implementing this for an origin trial.
Problem Overview
Recall that applications may currently obtain a capture of the tab in which they run using getDisplayMedia, either with or without preferCurrentTab. Moreover, soon another API will allow similar functionality - getViewportMedia. In either case, the application may then also wish to crop the resulting video track so as to remove some content from it (typically before sharing it remotely). We introduce a performant and robust API for cropping a self-capture video track.
Core Challenges
Layout can change asynchronously when the user scrolls, zooms or resizes the window. The application cannot robustly react to such changes without risking mis-cropping the video track on occasion. The browser therefore needs to step in and help.
Sample Use Case
Consider a combo-application consisting of two major parts - a video-conferencing application and a productivity-suite application co-existing in a single tab. Assume the video-conferencing uses existing/upcoming APIs such as getDisplayMedia and/or getViewportMedia and captures the entire tab. Now it needs to crop away everything other than a particular section of the productivity-suite. It needs to crop away its own video-conferencing content, any speaker notes and other private and/or irrelevant content in the productivity-suite, before transmitting the resulting cropped video remotely.
[Image: DocsSidebar]
Moreover, consider that it is likely that the two collaborating applications are cross-origin from each other. They can post messages, but all communication is asynchronous, and it's easier and more performant if information is transmitted sparingly between them. That precludes solutions involving posting of entire frames, as well as solutions which are too slow to react to changes in layout (e.g. scrolling, zooming and window-size changes).
Goals and Non-Goals
Goals
- The new API we introduce allows an application which is already in possession of a self-capture video track, to crop that track to the contours of its desired element.
- The API allows this to be done performantly, consistently and robustly.
Non-Goals
- This API does not introduce new ways to obtain a self-capture video track.
- This API does not introduce mechanisms by which a captured document may control what the capturing document can see.
Solution
Solution Overview
A two-pronged solution is presented:
- Crop-ID production: A mechanism for tagging an HTMLElement as a potential target for the cropping mechanism.
- Cropping mechanism: A mechanism for instructing the user agent to start cropping a video track to the contours of a previously tagged HTMLElement, or to stop such cropping and revert a track to its uncropped state.
Crop-ID production
We introduce navigator.mediaDevices.produceCropId().
partial interface MediaDevices {
  Promise<DOMString> produceCropId((HTMLDivElement or HTMLIFrameElement) target);
};
Given an HTMLElement, produceCropId() produces a UUID that uniquely identifies that element to our second mechanism - the cropping mechanism.
(The Promise returned by produceCropId() is only resolved when the ID is ready for use, allowing the browser time to set up prerequisites and propagate state cross-process.)
Cropping mechanism
We introduce a cropTo() method, which we expose on all video tracks derived from tab-capture.
[Exposed=Window]
interface BrowserCaptureMediaStreamTrack : FocusableMediaStreamTrack {
  Promise<undefined> cropTo(DOMString cropTarget);
};
Given a UUID, cropTo() starts cropping the video track to the contours of the referenced HTMLElement.
Given an empty string, cropTo() reverts a video track to its uncropped state.
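The two behaviours of cropTo() described above can be sketched as a small helper. This is only an illustration under assumptions: setCrop is a hypothetical name that does not appear in the proposal, and `track` is assumed to be a tab-capture video track that exposes cropTo() as drafted here.

```javascript
// Hypothetical helper, not part of the proposal.
// `track` is assumed to be a tab-capture video track exposing cropTo().
async function setCrop(track, cropId) {
  if (typeof track.cropTo !== 'function') {
    throw new Error('cropTo() is not supported on this track');
  }
  // A crop-ID starts cropping to the referenced element.
  // Per the draft, an empty string reverts the track to its uncropped
  // state, so null/undefined are mapped to '' here.
  await track.cropTo(cropId ?? '');
}
```

Under these assumptions, `await setCrop(track, someCropId)` starts cropping and `await setCrop(track, null)` reverts the track to its uncropped state.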
"On-the-fly" changing of crop-targets is possible.
Code Samples
/////////////////////////////////
// Code in the capture-target: //
/////////////////////////////////
const mainContentArea = document.getElementById('mainContentArea');
const cropId = await navigator.mediaDevices.produceCropId(mainContentArea);
sendCropId(cropId);

function sendCropId(cropId) {
  // Can send the crop-ID to another document in this browsing context
  // using postMessage() or using any other means.
  // Possibly there is no other document, and this is just consumed locally.
}
/////////////////////////////////////
// Code in the capturing-document: //
/////////////////////////////////////
async function startCroppedCapture(cropId) {
  const stream = await navigator.mediaDevices.getDisplayMedia();
  const [track] = stream.getVideoTracks();
  // Bail out if this track does not support cropping.
  if (!track.cropTo) {
    handleError(stream);
    return;
  }
  await track.cropTo(cropId);
  transmitVideoRemotely(track);
}
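The hand-off of the crop-ID between two cross-origin documents, left open in sendCropId() above, could be sketched with postMessage() roughly as follows. The message shape `{ type: 'crop-id', cropId }`, the function extractCropId, the three-argument form of sendCropId and the example origins are all assumptions made for illustration; none of them come from the proposal.

```javascript
// Assumption: this message shape is hypothetical, not part of the proposal.
const CROP_ID_MESSAGE = 'crop-id';

// In the capture-target: post the crop-ID to the capturing document.
// `capturerWindow` is any object with a postMessage(message, targetOrigin)
// method, e.g. window.parent or an iframe's contentWindow.
function sendCropId(capturerWindow, cropId, capturerOrigin) {
  capturerWindow.postMessage({ type: CROP_ID_MESSAGE, cropId }, capturerOrigin);
}

// In the capturing document: pull a crop-ID out of a message event,
// ignoring messages from unexpected origins or with an unexpected shape.
function extractCropId(event, trustedOrigin) {
  if (event.origin !== trustedOrigin) return null;
  const data = event.data;
  if (!data || data.type !== CROP_ID_MESSAGE || typeof data.cropId !== 'string') {
    return null;
  }
  return data.cropId;
}
```

The capturing document might then register `window.addEventListener('message', (e) => { const id = extractCropId(e, 'https://docs.example'); if (id) startCroppedCapture(id); })`. Only the short ID string crosses the origin boundary, which fits the goal of transmitting information sparingly.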
Spec draft
Please take a look at the proposed spec. (Easily missed, so repeated.)
Given an empty string, cropTo() reverts a video track to its uncropped state.
Passing an empty string to a function to make it release a crop feels dirty.
Can we not have a removeCrop() function on the track? Or have cropTo return a promise function in order to uncrop when called - that's already a pretty established pattern in other places.
I'd still love to see an extra constraint to getDisplayMedia() in order for the browser to do the cropping before it gets to JavaScript land - this still doesn't solve the problem of the underlying application having access to an entire tab (in this case) - but that's completely separate to this.
Can we not have a removeCrop() function on the track?
Since it'd be functionally equivalent, I'd not object. But my own subjective preference is to have a single method here. Can we perhaps find an objective measure to determine the better approach?
or have cropTo return a promise function in order to uncrop when called - that's already a pretty established pattern in other places.
The API currently allows seamless transition from one crop-target to another. What happens in that case?
const uncropCallback1 = track.cropTo(cropId1);
const uncropCallback12 = track.cropTo(cropId2);
uncropCallback1();
What do you suggest we do in this case? Uncrop the track? No-op? Raise an exception?
I'd still love to see an extra constraint to getDisplayMedia() in order for the browser to do the cropping before it gets to JavaScript land - this still doesn't solve the problem of the underlying application having access to an entire tab (in this case) - but that's completely separate to this.
I agree that it's separate. Consider also the complicating factor that one document can draw on top of another - and cropping catches that. What you'd want here is element-level capture, with a div/iframe capturing itself, without capturing occluding content. It's a useful API that's under discussion. My opinion is that such an API serves different needs than cropping, and the Web needs both.
The API currently allows seamless transition from one crop-target to another. What happens in that case?
const uncropCallback1 = track.cropTo(cropId1);
const uncropCallback12 = track.cropTo(cropId2);
uncropCallback1();
What do you suggest we do in this case? Uncrop the track? No-op? Raise an exception?
I'd expect uncropCallback1 to become invalid/cancelled once uncropCallback12 was made... so I would expect calling uncropCallback1() to throw an error as you're trying to uncrop something that's no longer valid.
You've mentioned precedents for this pattern. Could you please specify one or two precedents, so that I might examine the reasoning that led to that pattern being adopted there, and see if the rationale applies here too?
I had React hooks in my head but now I can't find an example... but they definitely do exist... I've used them; a function, in this case cropTo, returning a pre-bound function to cancel it. Kinda like setTimeout returning a ref to the timeout... that you then cancel with clearTimeout... why not just return a pre-bound clearTimeout function bound with the ID? I'll find an example tomorrow.
I'd second Dan here on empty string.
The API currently allows seamless transition from one crop-target to another. What happens in that case?
Another example is if you just call the uncropper more than once.
What do you suggest we do in this case? Uncrop the track? No-op? Raise an exception?
No-op seems a good choice. I think the resolve/reject functions of Promise are quite comparable to this: The first call wins, any subsequent calls are ignored and both are tied to the same thing.
What Dan is presumably referring to is this quite common pattern:
function repeat(callback, intervalMs) {
const id = setInterval(callback, intervalMs);
return () => clearInterval(id);
}
As one can see, calling this twice would also be a no-op.
Personally, I don't mind cropTo()/cropTo(undefined) or cropTo(null) or an explicit method to uncrop. Just not empty string. 🙂
I believe setInterval and setTimeout return a handle[*], and clearInterval accepts that handle, for a practical reason - there'd otherwise be no way to know which interval to clear. This is not a limitation of the current cropping API design, which exposes cropTo on the track, so there is no need to employ the same pattern[*].
Please examine what is returned by cropTo in my current proposal - a Promise which resolves after the browser can guarantee to the application that all of the next frames produced will be cropped/uncropped according to the application's latest request. I believe that's vital. Without it, the application would be consuming a few frames that are mis-cropped, and have no way of knowing how long to wait until the new crop is applied.
For the issue of ''/null/undefined, I am fine with any/all of these. Maybe anything that evaluates to false?
--
[*] This example returns a handle, not a callback, btw.[**]
[**] And returning a callback which itself returns a Promise is not very elegant in comparison with the current solution, IMHO.
There are benefits to doing cropTo(null) in the fact you'd not need to keep track of that returned function - as long as you had access to the track you'd be able to clear the crop. In that sense I'd expect cropTo() to be a throwable thing and cropTo(null) to clear the existing crop on a track if it exists and a noop if one doesn't exist.
There are benefits to doing cropTo(null) in the fact you'd not need to keep track of that returned function - as long as you had access to the track you'd be able to clear the crop. In that sense I'd expect cropTo() to be a throwable thing and cropTo(null) to clear the existing crop on a track if it exists and a noop if one doesn't exist.
Then you'll be very pleased by the spec draft, which specifies exactly that, modulo the difference between null and ''.
Then you'll be very pleased by the spec draft, which specifies exactly that, modulo the difference between null and ''.
My issue was primarily with an empty string :D No harm in thinking out loud about other APIs :)
My issue was primarily with an empty string :D No harm in thinking out loud about other APIs :)
Absolutely no harm indeed. I'm very glad to hear your feedback. Please keep it coming!
Speaking of feedback, I am wondering - any objections to making a repeated call to cropTo(null)/removeCrop() not throw an error, but rather be a no-op? It simplifies both the implementation in the browser[*] as well as Web-apps.
In that sense I'd expect cropTo() to be a throwable thing and cropTo(null) to clear the existing crop on a track if it exists and a noop if one doesn't exist.
That's what I said further up :) cropTo() without a parameter should be throwable as it's not a valid use case of the API, but repeated calls to cropTo(null), even if there's no crop to release, should be a no-op.
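To make the semantics discussed here concrete, the following toy model sketches them. It is only a stand-in object for illustration, not the real track interface and not how a browser would implement it; the class name FakeCroppableTrack and its currentCropId field are hypothetical.

```javascript
// Toy model only: a stand-in object, NOT the real BrowserCaptureMediaStreamTrack.
class FakeCroppableTrack {
  constructor() {
    this.currentCropId = null; // null means "not cropped"
  }

  async cropTo(...args) {
    // Calling with no argument at all is treated as misuse and rejects.
    if (args.length === 0) {
      throw new TypeError('cropTo() requires a crop-ID or null');
    }
    const [cropId] = args;
    // null clears the crop; clearing when nothing is cropped is a no-op.
    this.currentCropId = cropId === null ? null : String(cropId);
  }
}
```

In this model, `await track.cropTo(null)` can be called any number of times without error, while `await track.cropTo()` rejects with a TypeError.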
That's what I said further up :)
I appear to have misread your original message. Specifically, I believe I misread "and a noop if one doesn't" as "and an exception if one doesn't." My apologies. So we're now two votes for no-op.
@eladalon1983 I remembered where this cancel-a-listener idea came from in my mind... Firestore from Google.
var unsubscribe = db.collection("cities")
    .onSnapshot(() => {
      // Respond to data
      // ...
    });

// Later ...
// Stop listening to changes
unsubscribe();
https://firebase.google.com/docs/firestore/query-data/listen#web-version-8_5
(not saying let's use it... just adding for completeness)
partial interface MediaDevices {
  Promise<DOMString> produceCropId((HTMLDivElement or HTMLIFrameElement) target);
};
Any reason why this would be limited to HTMLDivElement or HTMLIFrameElement? What if one wanted to share just an HTMLVideoElement for example? The use case could be to share a video stream via a video call.
Any reason why this would be limited to HTMLDivElement or HTMLIFrameElement? What if one wanted to share just an HTMLVideoElement for example? The use case could be to share a video stream via a video call.
This was an early attempt at a compromise with Mozilla, whom I understood as somewhat open to iframe, but initially skeptical of general elements. My own preference is indeed for any element, and I hope we can come to an agreement over that. Given that any element we wish to share can be "stuffed" into a div, I believe div to be sufficient for the feature to be useful, but it is indeed a somewhat arbitrary restriction, and I hope we can move past it.
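The "stuffing" workaround mentioned above could look roughly like this. It is a sketch under assumptions: wrapInDiv is a hypothetical helper name, and the optional `doc` parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: wraps an element that is already in the DOM in a
// new <div>, so that the div can be passed to produceCropId() while only
// div and iframe elements are accepted.
function wrapInDiv(element, doc = document) {
  const wrapper = doc.createElement('div');
  // Put the wrapper where the element currently sits...
  element.parentNode.insertBefore(wrapper, element);
  // ...then move the element inside it.
  wrapper.appendChild(element);
  return wrapper;
}
```

With this in place, a page could call `await navigator.mediaDevices.produceCropId(wrapInDiv(videoElement))` to obtain a crop-ID for the area occupied by a video element, assuming the wrapper's box matches the video's box in the page's layout.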
Should've commented here sooner, but the repo was adopted, then migrated to the W3C, and currently lives under https://github.com/w3c/mediacapture-region/