mediacapture-region What makes CropTarget special to require an asynchronous creation?

We started discussing this topic (whether CropTarget creation should use promise or not) as part of https://github.com/w3c/mediacapture-region/issues/11 and it seems best to use a dedicated thread for this issue.

@eladalon1983 mentioned implementation issues that make using promises desirable, in particular security and potential race conditions if creating synchronously a CropTarget since CropTarget has to keep a link to the process it was created in. The following case was provided:

create CropTarget in process A
postMessage it synchronously to another process B
(optionally) postMessage it yet again to another process C
Use CropTarget to crop the track. @eladalon1983, please correct this description if it is not aligned with your thoughts.

Existing objects like MessagePort, WHATWG Streams, RTCDataChannel or MediaStreamTrack can be created-then-postMessaged synchronously and UAs are implementing this today, hopefully without the security/race condition issues. AIUI, it seems consistent to use the same approach to CropTarget (synchronous creation), except if we find anything specific to CropTarget that actually prevents this existing pattern.

Feb 02 '22 09:02 youennf

I think it's easiest to answer with Chrome as the concrete example, thereby keeping the discussion simpler. This is generalizable to other browsers.

Chrome has a central "browser process," and documents are hosted in "render processes." (For simplicity, let's pretend every document has a dedicated render process.) Let's examine multiple documents embedded together in another document, and all living together in the same tab.

Agains for simplicity, we'll call the document where the crop-target lives SLIDE, and the document which holds the track we'll call VC. This I find easier than talking about D1, D2 etc., as we can have a practical example in our mind's eye. If necessary, map (SLIDE, VC) to (D1, D2).

CropTarget is essentially a token. That token is produced in SLIDE and passed elsewhere. It may be passed directly to VC or indirectly. A design that allows for it to be safely be passed through other documents is preferable, as it requires less care of developers. To be safely passed through other documents (and therefore processes), it should encode the minimum amount of information. This is mostly true for JS-exposed information, but non-JS-exposed information that lives in the render process that holds the token, is also theoretically accessible to malicious documents under certain conditions.

So, to keep the minimum amount of information, the token should not actually encode the information that it originates in SLIDE. Instead, this knowledge will be recorded in the trusted browser process as a mapping of T<->SLIDE.

When the token is minted, this mapping has to be recorded in the browser process, which requires IPC, which means that minting the token should be asynchronous. (Minting can fail if the browser process refuses to record more mappings.)

To generalize away from Chrome, other UA-implementers will either run into similar implementation constraints, or else they can just return a pre-resolved Promise<CropTarget> and not worry about it.

Feb 02 '22 11:02 eladalon1983

When transferring a MessagePort or a RTCDataChannel from SLIDE to VC, there is a need to identify that the transferred object is originating from SLIDE, just like CropTarget. How is it possible to be synchronous for those objects but not for CropTarget?

Feb 02 '22 11:02 youennf

There could be multiple documents in between SLIDE and VC. Each hop only exposes its own origin. So if the token is sent SLIDE->TOP_LEVEL->VC, then VC would not know the origin of SLIDE, only of TOP_LEVEL.

Feb 02 '22 11:02 eladalon1983

Again, the same could be said about transferring a RTCDataChannel from SLIDE->TOP_LEVEL->VC. Either this is a new security hardening rule that we cannot enforce it in old APIs but should in new APIs or there is something specific to CropTarget. I am unclear which one it is from your explanations.

Feb 02 '22 14:02 youennf

I don't know how common it is to transfer RTCDataChannel objects. (I suspect somewhat uncommon...?) CropTarget is designed for transfer, and would have been wholly unnecessary otherwise. Given the cheap cost of ruggedizing CropTarget against being utilized by an attacker, I see it as desirable. I don't know enough about RTCDataChannel in order to say whether this was necessary there too but prohibitively complicated, or any other reason.

Feb 02 '22 14:02 eladalon1983

What about MessagePort then? MessagePort are very common and designed specifically for being transferred. From what I can see so far, the problems you are describing have been solved for those objects in a secure and efficient way without using promises.

Feb 02 '22 14:02 youennf

Does MessagePort remain bound in any way, or encode to some degree, the document in which it was originally instantiated? If I create a MessagePort in D1, then transfer it to D2, then to D3, will D3 know that this MessagePort is originally from D1?

Feb 02 '22 14:02 eladalon1983

Does MessagePort remain bound in any way, or encode to some degree, the document in which it was originally instantiated?

It remains bound to the MessagePort it was created jointly through MessageChannel.

If I create a MessagePort in D1, then transfer it to D2, then to D3, will D3 know that this MessagePort is originally from D1?

For RTCDataChannel, WHATWG Streams and MediaStreamTrack yes, always, the 'source' remains in D1.

MessagePort are a bit specific in that they are created as a pair through MessageChannel that can keep communicating with each other. MessageChannel.port1 can be synchronously transferred to another process, as well as MessageChannel.port2. port1 and port2 remain bound together in any case.

Feb 02 '22 14:02 youennf

Another example that might be closer to CropTarget is OffscreenCanvas. OffscreenCanvas can be created from a canvas element, in which case it remains tied to the canvas element like CropTarget would be to its element. Such OffscreenCanvas can be created-then-transferred synchronously.

Feb 02 '22 15:02 youennf

@eladalon1983 said in https://github.com/w3c/mediacapture-region/issues/11#issuecomment-1023100726:

When D2 calls cropTo(X), the UA has to validate that X is a valid CropTarget.

It is undesirable to query all documents and check if any of them have produced X

cropTo already returns a promise, so querying "all" documents in the single viewport captured to identify the crop target, seems reasonable to me.

Cost to implementers is low on the priority of constituencies when it comes to shaping APIs.

Feb 03 '22 19:02 jan-ivar

cropTo already returns a promise, so querying "all" documents in the single viewport captured to identify the crop target, seems reasonable to me.

I think you mean "browsing context" where you said "viewport". Do you have a reasonable limit on how many different documents could be embedded in that browsing context?

Cost to implementers is low on the priority of constituencies when it comes to shaping APIs.

Cost to implementers is low priority, but non-zero. It's only a problem if accommodating implementers comes at a non-trivial cost to a higher-priority constituency. Does it?

Feb 03 '22 21:02 eladalon1983

I think you mean "browsing context" where you said "viewport".

No, each iframe has its own browsing context, which is nested (immediately or multiple levels under) the top-level browsing context.

I loosely mean all documents in this capture, which maybe translates to the top-level browsing context's document and all documents in its nested browsing contexts of iframes that intersect the viewport.

Do you have a reasonable limit on how many different documents could be embedded in that browsing context?

Looking up the CropTarget shouldn't be the bottleneck in extreme cases, so this should scale fine.

Feb 03 '22 22:02 jan-ivar

I think you mean "browsing context" where you said "viewport".

No, each iframe has its own browsing context, which is nested (immediately or multiple levels under) the top-level browsing context.

I loosely mean all documents in this capture, which maybe translates to the top-level browsing context's document and all documents in its nested browsing contexts of iframes that intersect the viewport.

I've been carrying that mistake around for a while. Thanks for enlightening me.

Do you have a reasonable limit on how many different documents could be embedded in that browsing context?

Looking up the CropTarget shouldn't be the bottleneck in extreme cases, so this should scale fine.

IPC with multiple processes is neither simple, nor performant, nor robust. The cost to implementers is greatly reduced when avoiding this. What's the downside to any other constituency?

Feb 03 '22 22:02 eladalon1983

IPC with multiple processes is neither simple, nor performant, nor robust. The cost to implementers is greatly reduced when avoiding this.

This is a known problem that is solved in modern browsers. A transferred WritableStream should not do multiple IPCes to locate the process of its sink when writing new values. As I said before, I'd like to understand why it would be more difficult with CropTarget than with all these other existing APIs.

What's the downside to any other constituency?

It is more costly to both web developers and web engines. It is not consistent with existing Web APIs AFAIK.

Feb 04 '22 09:02 youennf

Given this is a solved problem for other APIs and given these solutions are applicable to CropTarget as well, can we converge on moving away from Promises?

Mar 08 '22 08:03 youennf

IPC with multiple processes is neither simple, nor performant, nor robust. The cost to implementers is greatly reduced when avoiding this.

This is a known problem that is solved in modern browsers. A transferred WritableStream should not do multiple IPCes to locate the process of its sink when writing new values. As I said before, I'd like to understand why it would be more difficult with CropTarget than with all these other existing APIs.

I believe I have explained why we have implemented things this way in Chrome. This is a real issue.

What's the downside to any other constituency?

It is more costly to both web developers and web engines. It is not consistent with existing Web APIs AFAIK.

The cost to Web developers is negligible. Crop-target production is a rare occurrence, it does not matter to the Web developer if it complete asynchronously. I can pull in Web-developers currently using Region Capture (in origin trial) for major products with a high level of polish, and they could comment as much. Would you find that convincing? (If not - please pull in a similarly qualified Web developer who could comment to the contrary.)

Given this is a solved problem for other APIs and given these solutions are applicable to CropTarget as well, can we converge on moving away from Promises?

Let's converge towards Promises, given that it's an important implementation issue for Chrome. (And I believe that when the time comes for Safari and Firefox to implement this, they'll find it equally problematic.)

Mar 08 '22 19:03 eladalon1983

I believe I have explained why we have implemented things this way in Chrome. This is a real issue.

You explained a real issue, that I would classify as an optimization problem (though at some point you alluded to security concerns as well). Is that correct?

The argument is that the same optimization issue exists for already deployed APIs, and was solved without making use of promises. If we do not want to follow this preexisting pattern, we need a clear justification.

To move forward, let's try with more narrowly focused questions:

Do you think that other existing web APIs enumerated in this thread have the same issue? Or do you think this is new problem specific to region?
If it is a new problem, can you detail what exactly is new?
If it is not a problem specific to region, what makes you think the solutions used for implementing those APIs are not good enough for region?

As a recap, here are some APIs that I think face the same issue:

Create a MessageChannel and transfer one of its MessagePort synchronously after creation. The transferred port needs to be able to locate in an optimal manner where is living its related channel port.
Create a MediaStreamTrack from a canvas and transfer it synchronously after creation. The transferred track needs to be able to locate in an optimal manner where is living the canvas element.
Create a RTCDataChannel from a RTCPeerConnection and transfer it synchronously after creation. The transferred data channel needs to be able to locate in an optimal manner where is living the peer connection.

I believe that when the time comes for Safari and Firefox to implement this, they'll find it equally problematic.

I implemented in WebKit some of the APIs that I think are facing the issue you are describing. For that purpose, existing techniques were reused so that we do not introducing delay in creation of the objects/delay in transferring the objects/delay in the object algorithms/race conditions.

Answering the above questions might help figuring out which problems I might be overlooking.

Mar 11 '22 08:03 youennf

Note: When we started trying to transfer MediaStreamTracks in Chrome, the synchronous nature of the transfer gave us major problems in implementation. So the idea that synchronous = solved problem is not universally true.

Mar 18 '22 14:03 alvestrand

It is great to hear Chrome implementation on transferring tracks is making progress. It is also great to hear Chrome implemented the transfer of synchronously-created MediaStreamTracks in a secure and efficient manner. Can you validate whether the same approach can be used for CropTarget?

The idea that synchronous = solved problem is not universally true.

The point is more about create-then-transfer-synchronously is a solvable problem (do we all agree?), it was solved multiple times already. And that the web platform never cared about this particular implementation difficulty when designing new APIs.

To break this existing pattern, compelling motivations seem necessary.

Another reason not to use promises: what happens in case the element goes transferred to another document, which then gets destroyed (and the element gets reattached to another document), all of this during creation of the CropTarget. Should we reject the promise? A synchronous API is simpler for edge cases as well as for web developers.

Mar 21 '22 14:03 youennf

what happens in case the element goes transferred to another document

I had the same concern. I lost that concern when I learned that Elements are not transferable. (But do correct me if I am wrong.)

which then gets destroyed

It is generally possible for a CropTarget to outlive its Element, and that's OK. The document discusses what happens then. The summary is:

If cropped to a target that is in the DOM, has >0 pixels, etc., frames are produced.
Otherwise, no further frames are produced. The track is effectively "muted" (but without the muting - as per our earlier agreement).
Elements can be garbaged collected even! That's not special. It falls into the category set by the previous bullet-point.
It's always possible to get a track "unstuck" by calling cropTo again, either to uncrop it or to re-crop it to a new target.

Mar 21 '22 15:03 eladalon1983

I had the same concern. I lost that concern when I learned that Elements are not transferable.

The point I am making is during the creation of the CropTarget, i.e. when the promise is not yet settled.

Mar 21 '22 16:03 youennf

I had the same concern. I lost that concern when I learned that Elements are not transferable.

The point I am making is during the creation of the CropTarget, i.e. when the promise is not yet settled.

The thing I am still not getting is - what's going to happen to the Element during that time? The worst that could happen is that it gets garbage collected. I don't think that's a problem. It doesn't seem to matter if the Element is GCed before/after its CropTarget is produced. (And getting GCed after CropTarget-production should be a normal occurrence.)

Mar 21 '22 16:03 eladalon1983

I want to settle this discussion about Promises. And I don't want to leave your message unanswered. Let's briefly examine the three other APIs you've brought up:

MessageChannel:
- If implementing with direct communication between the processes, the risk involved is a necessary evil. This cannot be said for CropTarget. Discoverability in either direction is not a requirement here, and confers little to no benefit.
- If implementing with mediation via another process, the story gets more complicated. A valid implementation can hide that it's asynchronically minting identifiers, behind the moment of posting the MessageChannel to the other document. (Some compromises are required, though.) But I don't want to discuss this because it would lose track of the topic - see below.
MediaStreamTrack:
- These are not currently transferrable in Chrome - not synchronously, not at all. To claim it's possible one needs to present an implementation as proof, not a specification. (Does it work on Safari or Firefox yet? This demo suggests that it's not, as of 2022-03-31.)
- My colleauges working on mst-transferability tell me that they are running into a lot of issues precisely because of the requirement that they be "synchronously transferable".
RTCDataChannel:
- As of the time of this writing, these don't appear to be transferable in neither the specification nor Chrome's implementation.

I hope we can proceed without trifurcating the discussion. I did not want to leave your points unanswered, but deep-diving into these three examples will be unwise. We have an independent engineering question here, and it can be resolved on its own merits. These precedents do not seem applicable. Nor should we assume that mistakes and compromises were not made in the design of these other APIs. Let's discuss our own case on its own merits.

I believe I've made a compelling case for why produceCropTarget() should be asynchronous.

It allows CropTarget to avoid potentially-risk identifiers.
It's the conservative design choice to make.
It imposes negligible costs on all constituencies.
There's nothing stopping implementations from returning a pre-resolved Promise if they so wish.

Let's go with an asynchronous produceCropTarget().

Mar 31 '22 20:03 eladalon1983

I should have posted https://github.com/w3c/mediacapture-region/issues/11#issuecomment-1126528615 here. To summarize it: At least two highly skilled technical people were confused by the current API into thinking it does more than it does.

That's a cost to web developers that we should and do avoid on the regular, as @youennf shows.

May 15 '22 02:05 jan-ivar

(Minting can fail if the browser process refuses to record more mappings.)

This is an incorrect implementation since produceCropTarget is infallible.

May 15 '22 03:05 jan-ivar

I should have posted #11 (comment) here. To summarize it: At least two highly skilled technical people were confused by the current API into thinking it does more than it does.

It's unclear to me who was confused and in what way, and how making the API synchronous would solve their confusion.

May 15 '22 16:05 eladalon1983

I would first like to get consensus on one API design point. Hopefully, we can all agree on something like:

For any new API, we try to make it synchronous if we can.
If the algorithm requires some hopping to other threads/processes/environments, we switch to async.
If synchronous implementations are overly complex, we switch to async.

AIUI, Chrome current implementation is asynchronous, and @eladalon1983 is stating that doing a synchronous implementation is overly complex. @jan-ivar and @youennf think that such a synchronous implementation is not complex. Some implementation strategies and already implemented APIs that deal with the same issue have been suggested. The rest of the message (sorry for its length) is describing that.

MessageChannel:

If implementing with direct communication between the processes, the risk involved is a necessary evil. This cannot be said for CropTarget. Discoverability in either direction is not a requirement here, and confers little to no benefit.

If implementing with mediation via another process, the story gets more complicated. A valid implementation can hide that it's asynchronically minting identifiers, behind the moment of posting the MessageChannel to the other document. (Some compromises are required, though.) But I don't want to discuss this because it would lose track of the topic - see below.

@eladalon1983, I think we do agree MessagePort can be created and transferred cross process synchronously. Can you validate Chrome implementation of MessagePort creation/transfer, in particular whether Chrome is minting asynchronously identifiers for MessagePorts?

FWIW, given we now have such MessagePort, we can reuse MessagePort to implement CropTarget. Below algorithm is taking the hypothesis that CropTarget is transferable and can be used only once (these limitations can be easily fixed by recreating MessageChannels as required):

At CropTarget creation, create a MessageChannel, set port1 to a slot of the element, and port2 to a CropTarget slot.
When transferring CropTarget to another environment, transfer CropTarget.port2, and recreate a new CropTarget with the transferred port2
When calling cropTo with the new CropTarget, transfer CropTarget.port2 to the process doing the capture.
In the capture process, use port2 to communicate with the element (through port1) to gather the necessary states to start the actual cropping.

Steps 1 and 2 can be done synchronously. Steps 3 and 4 are asynchronous which is fine since they are run as part of cropTo which returns a promise.

MediaStreamTrack:

These are not currently transferrable in Chrome - not synchronously, not at all. To claim it's possible one needs to present an implementation as proof, not a specification. (Does it work on Safari or Firefox yet? This demo suggests that it's not, as of 2022-03-31.)

My colleauges working on mst-transferability tell me that they are running into a lot of issues precisely because of the requirement that they be "synchronously transferable".

It would be nice to hear about the exact MST issues. I would bet this is due to MST complex state handling. CropTarget has no changing state, which makes it a much simpler object to transfer. That said, video MST are probably already transferable synchronously in Chrome by doing the following:

Create a video MediaStreamTrack from canvas
Get the ReadableStream from MediaStreamTrack using MediaStreamTrackProcessor
Transfer the ReadableStream
Recreate the MediaStreamTrack from the transferred ReadableStream using VideoTrackGenerator

Again step 1, 2 and 3 are all synchronous, from creation of a MediaStreamTrack to transferring the MediaStreamTrack via ReadableStream.

RTCDataChannel:

As of the time of this writing, these don't appear to be transferable in neither the specification nor Chrome's implementation.

It is specified in https://w3c.github.io/webrtc-extensions/#rtcdatachannel-extensions. It is implemented in Safari by reusing pre-existing mechanisms. One reason why it might not be as difficult to implement as MST is that we restrict when RTCDataChannel can be transferred. This simplifies a lot the state handlings. CropTarget is much simpler to transfer than RTCDataChannel since it has no changing state.

The fourth implementation that was brought to the discussion is the following:

CropTarget stores an identifier of the process where lives the element (or the document environment ID which is generated for each document before hand, so can be retrieved synchronously).
CropTarget stores a locally generated identifier of the element.
Element is identified by the pair of these two identifiers
CropTarget is serialised by serializing these two IDs.
When calling cropTo, the capture process identifies the element process through the first ID and gathers element information through the second ID.

Steps 1 to 4 can be done synchronously.

May 16 '22 08:05 youennf

I stopped agreeing at 1.

The tendency in WebRTC that I see is to make new APIs asynchronous, because we've had too many instances where we specified a synchronous interface and found that there were cases where they could not be done reliably in a synchronous manner (getCapabilities being the most recent).

I would only design a synchronous interface if it was obvious that any reasonable implementation would never have a need for a step involving interactions with entities outside the renderer process hosting the API.

May 16 '22 13:05 alvestrand

I don't understand the claim that "transferring is synchronous". The "Message port post message steps" (https://html.spec.whatwg.org/multipage/web-messaging.html#message-port-post-message-steps) contains an "Add a task that runs the following steps to the port message queue of targetPort".

How can an algorithm that posts a task to a different task queue be considered synchronous?

May 16 '22 13:05 alvestrand

I would only design a synchronous interface if it was obvious that any reasonable implementation would never have a need for a step involving interactions with entities outside the renderer process hosting the API.

I do not think creating a cropTarget requires any "interactions with entities outside the renderer process", depending of course of your definition of interactions. Contrary to actually cropping a track for instance. Hence my conclusion that creating a CropTarget does not need to be asynchronous while cropTo should be asynchronous.

How can an algorithm that posts a task to a different task queue be considered synchronous?

Sorry, this was misleading. What I tried to say is that the call to postMessage happens synchronously with the creation of the object. The whole postMessage algorithm is of course asynchronous.

@eladalon1983 mentioned several times that it has to be asynchronous between the time the web app wants to create the object and the time the web app wants to transfer the object. Performances and potential race conditions were mentioned.

The point I am making is that this issue has been solved multiple times and for objects more complex than a CropTarget. This makes me think asynchronicity is not needed by reasonable UA implementations, they will just leverage existing UA patterns.

May 16 '22 19:05 youennf

mediacapture-region mediacapture-region copied to clipboard

What makes CropTarget special to require an asynchronous creation?

mediacapture-region
mediacapture-region copied to clipboard