webrtc-nv-use-cases
Section 3.10.1: Live encoded non-WebRTC media use case not compelling
§ 3.10.1 Live encoded non-WebRTC media ... An example is traffic cameras that serve video over HTTP ...
The surveillance use case of a traffic camera is not compelling to Mozilla. We like to measure how end-users stand to benefit from technology, and this use case feels dystopian, and therefore fails on that account.
The use case also seems technically strange, because if this is just a box sitting somewhere then it's an odd use of web tech. And if it's not, and has a user interface, then wouldn’t the web app want to decode the traffic video anyway to play it locally to the user?
We think the traffic camera should support WebTransport, which was justified precisely on this rationale, making this use case moot.
Lastly, the use case seems like an optimization of what is possible today using VideoTrackGenerator. For example:
// worker.js
onmessage = async () => {
  const generator = new VideoTrackGenerator();
  self.postMessage({track: generator.track}, [generator.track]);
  const response = await fetch("https://example.com/traffic-cam-api");
  await response.body
    .pipeThrough(new TransformStream({transform: encodedDataToEncodedVideoFrame}))
    .pipeTo(generator.writable);
};
// main.html
const worker = new Worker("worker.js");
worker.postMessage("go");
const {data: {track}} = await new Promise(r => worker.onmessage = r);
const pc = new RTCPeerConnection();
pc.addTrack(track);
It also seems premature to conclude that optimizing out an encoding step justifies significant API rework, or that this is the bottleneck for e.g. sending 4K over WebRTC.
Not sure why you think the use case is 'dystopian'. It's routine for PSAPs (Public Safety Answering Points) to access traffic cams to keep abreast of weather conditions or to dispatch police, medical or fire resources.
Here is a picture of the Fairfax County, Virginia PSAP from 2010:
[image: Fairfax County, Virginia PSAP dispatch floor, 2010]
In the picture above, the traffic cam video is displayed on a large monitor, but the video is not accessible on the individual dispatcher workstations. Having multiple workstations retrieving video directly from the traffic cams would likely overload the traffic cams or the PSAP Internet link or both. So it is desirable for the PSAP to retrieve a single video stream and then send it to requesting workstations P2P. The desire for P2P relay applies regardless of how the traffic cams provide the video: HTTP/HTTPS (typical today) or WebTransport (possible tomorrow).
Similar considerations apply if instead of traffic cams, the video stream came from a company meeting, a developer conference, a boxing match, or the Metropolitan Opera.
To use VideoTrackGenerator, you'd first need to decode the incoming encodedChunks, producing VideoFrames to feed into VTG, to produce a MediaStreamTrack, which you would then encode using WebRTC-PC. That's an extra decode and encode operation for every frame.
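Roughly, that detour looks like the sketch below (assumptions: WebCodecs in a worker, an H.264 codec string, and some encodedChunks source that de-containerizes the fetched body):
// Sketch only: decode incoming chunks so VideoTrackGenerator can consume them;
// RTCPeerConnection then encodes the resulting track a second time before sending.
const generator = new VideoTrackGenerator();
const frameWriter = generator.writable.getWriter();

const decoder = new VideoDecoder({
  // The generator's sink takes ownership of each decoded VideoFrame.
  output: frame => frameWriter.write(frame),
  error: e => console.error(e),
});
decoder.configure({codec: "avc1.42E01E"}); // assumed H.264 baseline stream

// encodedChunks: whatever turns the fetched body into EncodedVideoChunks (assumed).
for await (const chunk of encodedChunks) {
  decoder.decode(chunk);
}
// generator.track is then passed to pc.addTrack(), where it gets re-encoded.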
So it is desirable for the PSAP to retrieve a single video stream and then send it to requesting workstations P2P.
If overloading is the issue, using HTTP streaming seems like a good approach compared to RTP?
The desire for P2P relay applies regardless of how the traffic cams provide the video
In those cases, why is RTP needed? It seems data channel would be very well suited. If we use RTP, we might have packet loss, which might be hard to recover for instance.
We think the traffic camera should support WebTransport, which was justified precisely on this rationale, making this use case moot.
Delivery of encoded media over WebTransport is one of the options for encoded media delivery. I deliberately made the use case neutral in what method was used to get frames from the camera to the browser - with the only stipulation being that the input was encoded, and not coming over WebRTC.
This argument is not relevant to whether the use case is appropriate or not.
Not sure why you think the use case is 'dystopian'. It's routine ...
Traffic cameras enable location tracking of the public through the automated reading of license plates of traveling cars, and therefore constitute a form of mass surveillance that goes against Mozilla principles on privacy. Our Manifesto principle 4 states: "Individuals’ security and privacy on the internet are fundamental and must not be treated as optional."
it is desirable for the PSAP to retrieve a single video stream and then send it to requesting workstations P2P
You haven't explained why this involves a user agent. Is this the same workstation driving the large monitor? If so, wouldn't the video need to be decoded anyway? If this is a box in a corner, why would web tech need to be involved?
We think the traffic camera should support WebTransport, ...
Delivery of encoded media over WebTransport is one of the options for encoded media delivery. ... This argument is not relevant to whether the use case is appropriate or not.
WebTransport has its own W3C WG with its own use cases, so WebTransport seems out of scope for the WebRTC WG. That a use case is already covered by existing APIs seems like a relevant argument to me.
That a use case is already covered by existing APIs seems like a relevant argument to me.
I think that documenting a use case that we (accidentally) support with existing APIs has value. (see also a more general comment on the list)
We have an interesting example of this: a WebRTC-enabled camera in a moving device (a drone, etc.), piloted by a local user via a web interface. The user wants to forward the live video to a remote expert for their opinion and guidance. They do not want the drone to talk directly to the expert, but they also don't want to re-encode the media, since they want the remote expert to see exactly what they see through the camera. (Think remote inspection of roofs for insurance, etc.) Currently this is done by bouncing everything through a cloud media server, which adds latency and costs.
@steely-glint, can you clarify your scenario setup? AIUI, it would be something like:
- drone -> operator via RTP link (RTCPeerConnection).
- operator -> monitor via reliable link (as operator wants monitor to see exactly what operator sees). It would be TCP (HTTP relay) or UDP (data channel).
AIUI, the live encoded non-WebRTC media use case would be:
- drone -> operator via HTTP link
- operator -> monitor via RTP link (RTCPeerConnection)
The scenario I have in mind (and we have prototyped):
- drone -> operator via RTP link (RTCPeerConnection).
- operator -> monitor via RTP link (RTCPeerConnection), including NACK etc.
(There is also an operator <-> monitor audio link for 'guidance'.)
The 'exactly' was to try to avoid artefacts due to decode-re-encode cycles.
We don't want the remote expert inspecting 'shadows' that are in fact artefacts of an h264 -> VP8 re-encode.
(In general the drone stores a local copy (possibly at higher resolution), and this is treated as the 'video of record' and sent to the expert later. The live video link is mostly to allow the expert to guide the pilot to capture the necessary footage.)
This seems like a different use case. Yours seems closer to the SFU-in-a-browser use case that we discussed in past meetings than to the live encoded non-WebRTC media use case we are discussing here.
We don't want the remote expert inspecting 'shadows' that are in fact artefacts of an h264 -> VP8 re-encode.
That seems unavoidable if the monitor only supports VP8; otherwise H264 would be negotiated for both links. H264-decode-to-H264-encode is what you would like to avoid. I wonder how feasible that is in practice if, say, the operator -> monitor link has a much lower bandwidth than the drone -> operator link. In that case, do we want to reduce the quality of the video seen by the operator so that it matches what the monitor sees?
The video can arrive on the operator screen as RTP over a proprietary radio link to a USB dongle for example - but we have only prototyped the sfu-in-the-browser architecture.
In that case, do we want to reduce the quality of the video seen by operator so that it matches what the monitor sees?
Probably yes. (Although that may depend on the license they have - most operators are required to have 'line-of-sight' of their drones. But this usage applies equally to pipe/tunnel inspection tech etc)
Probably yes. (Although that may depend on the license they have - most operators are required to have 'line-of-sight' of their drones. But this usage applies equally to pipe/tunnel inspection tech etc)
To me, this seems worth exploring as its own use case.
"If overloading is the issue, using HTTP streaming seems like a good approach compared to RTP?"
[BA] One approach would be to set up an HTTP cache (or work with a CDN vendor), so that the dispatcher workstations could retrieve frames streamed over HTTP from the webcams. However, that might require more IT resources or cost than desired. So it would be desirable if a browser application could handle the HTTP-to-P2P translation.
"In those cases, why is RTP needed? It seems data channel would be very well suited. If we use RTP, we might have packet loss, which might be hard to recover for instance."
[BA] Data channel has been used for this (e.g. the low-latency streaming with fanout use case). Typically, the datachannel is used to transfer the CMAF-encoded chunks transported over HTTP, with rendering done via MSE. With datachannel in workers and MSEv2 the receive pipeline could run in a worker, and there wouldn't be a need to de-containerize and packetize the media. The major advantage of WebRTC is that it is better supported on mobile devices (MSE isn't supported on iOS iPhone, for example). So in practice we have seen applications that don't care about DRM move from datachannel to WebRTC.
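For concreteness, a minimal sketch of that datachannel path follows (the stream URL, codec string, and peer connection setup are assumptions; backpressure, SCTP message-size limits, and SourceBuffer 'updating' checks are omitted):
// Relay side: fetch CMAF chunks over HTTP and forward them unchanged on a datachannel;
// no de-containerizing or RTP packetization is needed.
const dc = pc.createDataChannel("video-relay");
dc.onopen = async () => {
  const response = await fetch("https://example.com/live.cmaf"); // illustrative URL
  for await (const chunk of response.body) {
    dc.send(chunk);
  }
};

// Workstation side: append received chunks to an MSE SourceBuffer for rendering.
const mediaSource = new MediaSource();
videoElement.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener("sourceopen", () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
  pc.ondatachannel = ({channel}) => {
    channel.binaryType = "arraybuffer";
    channel.onmessage = ({data}) => sourceBuffer.appendBuffer(data);
  };
});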
"Traffic cameras enable location tracking of the public through the automated reading of license plates of traveling cars and therefore constitutes a form of mass surveillance that goes against Mozilla principles on privacy."
[BA] That's an argument for constraints on the use of ML algorithms, not an argument against cameras.
Virginia has a history of major snowstorms stranding drivers on the roads, sometimes in life threatening conditions. So it's important to plan for blizzards/extreme cold.
"WebTransport has its own W3C WG with its own use cases"
[BA] I don't think this use-case relates to the streaming leg, but to P2P transport downstream of the original stream receiver. So it doesn't matter whether the initial stream came in over HTTP, WebTransport, or something else.
Traffic cameras enable location tracking of the public through the automated reading of license plates of traveling cars, and therefore constitute a form of mass surveillance that goes against Mozilla principles on privacy. Our Manifesto principle 4 states: "Individuals’ security and privacy on the internet are fundamental and must not be treated as optional."
Mozilla's principles are laudable, but their application here seems questionable. This argument would be equally (in)applicable in a discussion about IPv6. Both technologies could be used to both save lives and to infringe on privacy.
This issue was mentioned in WEBRTCWG-2023-05-16 (Page 15)
I've made a related attempt before, pulling FLV streams on the web and pushing them to RTC. Design intent: parse the FLV into H.264 NALU data, then push it into the encodedStream and then to RTC. Sample code is as follows:
const localTransceiver = local.addTransceiver("video", { direction: "sendonly" });
const streams = localTransceiver.sender.createEncodedStreams();
const writer = streams.writable.getWriter();
// encodedVideoFrame ?
writer.write(encodedVideoFrame);
I know that the encodedVideoFrame data is H.264 NALU data, but I don't have a way to create an encodedVideoFrame.
Of course, I could use insertableStream to decode and re-encode, but that adds a lot of extra overhead.
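For reference, the NALU data can already be wrapped for WebCodecs decoding (a sketch; the framing, timestamp, and codec string are assumptions); what's missing is a way to hand such an encoded frame to the RTCRtpSender without the decode/re-encode detour:
// Sketch: wrap parsed H.264 NALU data in an EncodedVideoChunk and decode it with WebCodecs.
const decoder = new VideoDecoder({
  // In the decode/re-encode workaround, each VideoFrame would be written to a track
  // generator and the resulting track passed to pc.addTrack() for a second encode.
  output: frame => frame.close(),
  error: e => console.error(e),
});
decoder.configure({codec: "avc1.42E01E"}); // assumed profile/level from the FLV metadata

const chunk = new EncodedVideoChunk({
  type: isKeyFrame ? "key" : "delta", // assumed flag from the FLV parser
  timestamp: timestampMicros,         // assumed timestamp in microseconds
  data: naluBytes,                    // NALU framing must match the decoder configuration
});
decoder.decode(chunk);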