standards-positions WebXR Raw Camera Access API

Request for position on an emerging web specification

WebKittens who can provide input: @grorg as a WebKitten involved in Immersive Web CG/WG

Information about the spec

Spec Title: WebXR Raw Camera Access Module
Spec URL: https://immersive-web.github.io/raw-camera-access/
GitHub repository: https://github.com/immersive-web/raw-camera-access
Explainer (if not README.md in the repository): https://github.com/immersive-web/raw-camera-access/blob/main/explainer.md

Design reviews and vendor positions

TAG Design Review: https://github.com/w3ctag/design-reviews/issues/652
Mozilla standards-positions issue: https://github.com/mozilla/standards-positions/issues/667

Bugs tracking this feature

WebKit Bugzilla: https://bugs.webkit.org/show_bug.cgi?id=208988

Anything else we need to know

The WebXR Raw Camera Access Module extends the core WebXR Device API by allowing sites to request the "camera-access" feature when creating XR sessions. The feature gives sites access to camera pixels through an integration with WebGL, which exposes the pixels as WebGLTextures. Developers have requested this capability for a long time.
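
For readers unfamiliar with how the feature is requested, here is a minimal TypeScript sketch. It assumes an "immersive-ar" session; the helper names `cameraAccessInit` and `requestCameraSession` and the `...Like` interfaces are hypothetical, introduced only so the snippet compiles without WebXR typings. Only the "camera-access" feature descriptor and the standard `isSessionSupported` / `requestSession` calls come from the specs.

```typescript
// Minimal structural types (hypothetical) mirroring only the parts of the
// WebXR Device API that this sketch touches.
interface XRSessionInitLike {
  requiredFeatures?: string[];
  optionalFeatures?: string[];
}

interface XRSystemLike<S> {
  isSessionSupported(mode: string): Promise<boolean>;
  requestSession(mode: string, init?: XRSessionInitLike): Promise<S>;
}

// Build the session options. "camera-access" is the feature descriptor
// defined by the Raw Camera Access Module.
function cameraAccessInit(required: boolean): XRSessionInitLike {
  return required
    ? { requiredFeatures: ["camera-access"] }
    : { optionalFeatures: ["camera-access"] };
}

// Hypothetical helper: request an AR session with camera access.
// Assumption: the page wants an "immersive-ar" session.
async function requestCameraSession<S>(
  xr: XRSystemLike<S>,
  required: boolean = true
): Promise<S> {
  if (!(await xr.isSessionSupported("immersive-ar"))) {
    throw new Error("immersive-ar sessions are not supported");
  }
  return xr.requestSession("immersive-ar", cameraAccessInit(required));
}
```

In a supporting browser this would be called as `requestCameraSession(navigator.xr)` from a user gesture. Listing the feature under `optionalFeatures` instead lets the session start even if camera access is declined.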

Jul 15 '22 18:07 bialpio

Uninformed question: Why is this different than any other camera access API on the web?

Jul 15 '22 18:07 litherum

It allows the sites to obtain camera images that are synchronized with WebXR's XRPoses. If a site were to obtain camera images in some other manner, it would be unable to correlate them with spatial data that it can get by integrating with WebXR.
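
To make the synchronization point concrete, here is a rough TypeScript sketch of pairing each camera image with the view it belongs to inside a frame callback. The `...Like` interfaces, `PosedCameraImage`, and `collectPosedCameraImages` are hypothetical names used so the snippet is self-contained; the calls they mirror (`getViewerPose`, `view.camera`, `getCameraImage`) are the ones described in the WebXR and Raw Camera Access specs. It also assumes the texture is consumed within the same frame callback.

```typescript
// Hypothetical structural stand-ins for XRCamera, XRView, XRFrame and
// XRWebGLBinding, limited to the members used below.
interface CameraLike { width: number; height: number; }

interface ViewLike {
  camera?: CameraLike | null;           // present when camera access was granted
  projectionMatrix: Float32Array;
  transform: { matrix: Float32Array };  // view pose in the reference space
}

interface FrameLike<Space> {
  getViewerPose(space: Space): { views: ReadonlyArray<ViewLike> } | null | undefined;
}

interface BindingLike<Tex> {
  getCameraImage(camera: CameraLike): Tex | null;
}

// One camera image together with the spatial data from the same frame.
interface PosedCameraImage<Tex> {
  texture: Tex;
  width: number;
  height: number;
  viewTransform: Float32Array;
  projectionMatrix: Float32Array;
}

// Collect, for the current frame, every camera texture with its view data.
function collectPosedCameraImages<Space, Tex>(
  frame: FrameLike<Space>,
  space: Space,
  binding: BindingLike<Tex>
): PosedCameraImage<Tex>[] {
  const pose = frame.getViewerPose(space);
  if (!pose) return [];                 // tracking lost: nothing to correlate
  const result: PosedCameraImage<Tex>[] = [];
  for (const view of pose.views) {
    if (!view.camera) continue;         // this view has no camera image
    const texture = binding.getCameraImage(view.camera);
    if (texture === null) continue;
    result.push({
      texture,
      width: view.camera.width,
      height: view.camera.height,
      viewTransform: view.transform.matrix,
      projectionMatrix: view.projectionMatrix,
    });
  }
  return result;
}
```

Because the texture and the matrices come from the same frame, a computer vision result computed on the image can be projected back into the scene without guessing which pose the image corresponds to.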

Jul 15 '22 18:07 bialpio

It allows the sites to obtain camera images that are synchronized with WebXR's XRPoses. If a site were to obtain camera images in some other manner, it would be unable to correlate them with spatial data that it can get by integrating with WebXR.

Decently accurate poses can be deduced from arbitrary/in-the-wild monocular videos quite easily at this point. In a few months this may even be possible in real time, using open source code.

I do hope that we get this API soon. This point from the explainer sums up my needs:

Run custom computer vision algorithms on the data obtained from the camera texture. It may for example enable applications to semantically annotate regions of the image, for example to provide features related to accessibility.

I am probably biased in my thinking due to my specific use case, but this seems essential for any AR experience that integrates AI capabilities.

The permission modal should obviously be very clear about what is being granted, and all the usual stuff to ensure ongoing consent/understanding (e.g. red dot recording indicator type stuff).

Jan 04 '24 18:01 josephrocca

Having discussed this with colleagues, it's not clear to us why this needs to be a separate API from getUserMedia(). At the least, it seems that getUserMedia() could work as well with some small changes. In addition, some of the use cases this is intended for may be better served by new higher-level WebXR APIs.

Jan 16 '24 09:01 AdaRoseCannon

Given https://github.com/WebKit/standards-positions/issues/37#issuecomment-1893366444, it's my intention to mark this as "opposed" as WebKit's position. However, I'll give folks a week to provide more information and/or object.

May 18 '24 04:05 marcoscaceres