
How to handle varying pixel formats

Open youennf opened this issue 2 years ago • 3 comments

A transform exposes video frames that can come in various pixel formats (https://w3c.github.io/webcodecs/#enumdef-videopixelformat). Depending on the OS and/or camera, camera frames are currently I420 or NV12. The same probably holds for tracks exported from a video element, and canvas capture tracks may use RGBA.

It seems this can lead to interop issues, especially for camera tracks: applications will expect a given format and will break whenever that assumption is wrong. I see a few options:

  • Let the web app deal with it: it can implement its own conversion in JS (though this is computationally expensive)
  • Let the web app easily convert video frames to another format.
  • Let the web app prescribe the pixel format it wants via the API when creating the transform.
  • Let UAs select pixel formats consistently (specs recommend or require a particular pixel format, possibly per source type)
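As an illustration of the first option, conversion in JS boils down to repacking the frame's pixel data. A minimal sketch, assuming tightly packed planes with no row padding and even dimensions (names are illustrative, not part of any API):

```javascript
// Hypothetical sketch: repack NV12 pixel data into I420 in pure JS.
// NV12: Y plane (w*h bytes) followed by one interleaved UV plane (w*h/2 bytes).
// I420: Y plane, then a U plane (w*h/4 bytes), then a V plane (w*h/4 bytes).
function nv12ToI420(nv12, width, height) {
  const ySize = width * height;
  const uvSize = ySize / 4;
  const i420 = new Uint8Array(ySize + 2 * uvSize);

  // The Y plane is identical in both layouts.
  i420.set(nv12.subarray(0, ySize), 0);

  // De-interleave the UV pairs into separate U and V planes.
  for (let i = 0; i < uvSize; i++) {
    i420[ySize + i] = nv12[ySize + 2 * i];              // U sample
    i420[ySize + uvSize + i] = nv12[ySize + 2 * i + 1]; // V sample
  }
  return i420;
}
```

The loop over every chroma sample is exactly the computational cost the first option refers to; for real frame sizes this runs per frame on the main or worker thread.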

youennf avatar Apr 21 '22 09:04 youennf

Ditto for other characteristics such as color space (e.g. full range vs. limited range).
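To make the range difference concrete: limited ("video") range 8-bit luma nominally occupies [16, 235] while full range uses [0, 255], so an app that guesses wrong gets washed-out or crushed output. A sketch of the standard expansion formula for luma (function name is illustrative):

```javascript
// Hypothetical sketch: expand limited-range 8-bit luma (nominal [16, 235])
// to full range [0, 255]. Chroma would use the [16, 240] nominal range instead.
function limitedToFullLuma(y) {
  const scaled = Math.round(((y - 16) * 255) / 219);
  return Math.min(255, Math.max(0, scaled)); // clamp out-of-range inputs
}
```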

youennf avatar Apr 21 '22 09:04 youennf

This was also discussed a bit under https://github.com/webmachinelearning/webnn/issues/226#issuecomment-1031518141

dontcallmedom avatar Apr 21 '22 13:04 dontcallmedom

Regarding conversion by the web app, a relatively easy and efficient way of converting to RGBA is through WebGPU (well, "relatively easy" provided you're familiar with a few WebGPU concepts, and "efficient" when the underlying data of the VideoFrame is on the GPU). The external texture sampler returns pixels in RGBA (or BGRA) with the specified color space, regardless of the pixel format of the external texture.

Here is a code example of a transformer function that converts a VideoFrame to RGBA with WebGPU. Most lines are "boilerplate" code to use WebGPU. The actual conversion is done by the fragment shader and does not require specific knowledge of pixel formats and conversion formulas:

@group(0) @binding(0) var mySampler : sampler;
@group(0) @binding(1) var myTexture : texture_external;

@fragment
fn frag_main(@location(0) uv : vec2<f32>) -> @location(0) vec4<f32> {
  return textureSampleBaseClampToEdge(myTexture, mySampler, uv);
}

tidoust avatar Feb 02 '23 10:02 tidoust