model-element Model extraction?

Something I hinted at in the explainer demo was the idea of 'extracting' a model out into the space of a scene - but still addressable by / subordinate to the page context.

How might we represent this? Obviously it's not super meaningful in a non-spatial context, but it's conceptually similar to "picture in picture" as it applies to iOS etc. Is that a sensible way to pursue it?

Dec 09 '24 18:12 zachernuk

/agenda to discuss appetite, possible representations

Dec 09 '24 18:12 zachernuk

What are the use cases you envision for the addressability?

One use case that could be useful is the ability to update a model after extraction. The benefit would be to have the user only have to position something accurately once, with model updates becoming much quicker.

1. User clicks on webpage to extract couch model A
2. User positions model at position X (potentially a somewhat lengthy operation to position just so)
3. User clicks on a webpage button for couch B
4. Couch B replaces couch A, at X

<model id="couch">
  <source src="modern-couch.gs" type="model/splat">
</model>

<button id="couch-b-btn">Antique Couch</button>

const modelEl = document.getElementById('couch');
// `let`, not `const`: the reference is reassigned when extraction starts and ends
let extractedModelRef = null;

const modelBButton = document.getElementById('couch-b-btn');
modelBButton.addEventListener('click', (e) => {
  // nothing to update until the user has extracted a model
  if (extractedModelRef === null) return;
  modelEl.src = 'antique-couch.gs';
  extractedModelRef.src = 'antique-couch.gs';
});

modelEl.addEventListener('startextraction', (e) => {
  extractedModelRef = e.target; // some kind of reference to an extracted model
});

// user manually removes model from environment
modelEl.addEventListener('endextraction', (e) => {
  extractedModelRef = null;
});

The above would be an implementation where the model on the webpage is disconnected from the extracted model. It could probably also be done so that the two models remain linked.
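
A minimal sketch of what a linked variant might look like, assuming the same hypothetical `startextraction` / `endextraction` events and `src` property as the example above. `ModelStandIn`, `linkModels` and the `extractedModel` event property are illustrative names only, used so the sketch runs outside a browser; none of them are a proposed API.

```javascript
// Stand-in for a <model> element so the sketch runs without a DOM.
// `src` is a hypothetical property, mirroring the example above.
class ModelStandIn extends EventTarget {
  constructor(src) {
    super();
    this.src = src;
  }
}

// Links a page model to its extracted copy (if any).
// Returns a function that swaps the source on both at once.
function linkModels(pageModel) {
  let extracted = null; // null while nothing is extracted

  pageModel.addEventListener('startextraction', (e) => {
    // Assumption: the event carries a reference to the extracted model.
    extracted = e.extractedModel;
    extracted.src = pageModel.src;
  });

  pageModel.addEventListener('endextraction', () => {
    extracted = null;
  });

  return function setSource(src) {
    pageModel.src = src; // the page copy always updates
    if (extracted !== null) {
      extracted.src = src; // the extracted copy follows; its position is untouched
    }
  };
}
```

With this shape, the button handler only needs to call `setSource('antique-couch.gs')`: the page model updates whether or not anything has been extracted, and the page never reads or writes the extracted model's position.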

Dec 10 '24 05:12 m-blix

That's right - the extracted model would remain subordinate to the page context but be able to exist at a position of the user's choosing. That position doesn't need to be divulged to the page context, since divulging it would be a vector for some privacy risk.

It also affords the opportunity to apply a true scale to extracted content, given that the bounds of a page/panel may also be the subject of selective scaling, as is the case in visionOS.

Dec 10 '24 07:12 zachernuk

If the model is subordinate to the page context then it could follow the same 3D transformation pattern that has been proposed for "Detached CSS".

Like: transform: rotate3d(1, 1, 1, 30deg) matrix3d(1, 0, 0, 0, 0, 1, 6, 0, 0, 0, 1, 0, 50, 100, 0, 1.1);

The same system that could place a <model> could also place a webpage <img>, <video>, and eventually any 3D <html> element in WebXR!

Dec 10 '24 18:12 KooIaIa

Do we need to discuss extraction itself, or can we consider that up to the UA, just as the img spec doesn't need to specify what a UA can do? (i.e. a UA can let the user 'extract' an image via a context menu -> 'Save Image as..')

In terms of the transform of the extracted model, I think there is an advantage to potentially not letting the page update the transform: in my example above, the user would want the model position to not change. Perhaps changing the model transform could be conditional.

Dec 10 '24 18:12 m-blix

That's right - the extracted model would remain subordinate to the page context but be able to exist at a position of the user's choosing. That position doesn't need to be divulged to the page context, since divulging it would be a vector for some privacy risk.

I'm unsure if that would be a good idea. Once extracted, I would expect that the model is no longer owned by the page.

Dec 23 '24 22:12 cabanier

I think the term 'extraction' can be misleading about the function of such a mode here. Here's another try:

I think it's desirable to have an object that can be both associated with a page and exist at full scale. That way the model can be updated in response to page actions, e.g. to play or pause between different presentations of a reclining sofa, or to switch between different colors (for example, stored in different frame ranges of a model).

There may be a difference from standard drag-and-drop that we'd want to address, but I'm very interested in how people would like to meet this general need on the spatial web.

Jan 06 '25 18:01 zachernuk

Can you think of more use-cases than a reclining sofa? A murphy bed? It stinks that all the awesome uses for <model> are not being considered because of some privacy-risk. It doesn't feel like this would address general needs beyond these highly specific e-commerce ideas that WebXR is already used to solve. iOS only just recently got WebXR support in beta temporarily, but if it had been added sooner it would already be the de-facto way to address this.

Jan 06 '25 22:01 KooIaIa

(We didn't get to this last year, I'd like to discuss it next week) /agenda

I think this is a valuable capability to pursue anytime spatial content has a strong reason to be situated in space, and at a true (human-scale) size. Even with no other capabilities, that includes commerce, the heritage sector, data visualization and engineering applications.

While it's true that WebXR has the ability to meet these needs, it's through a totally separate track of work that cannot reuse other web capabilities directly, and the privacy risks you mention are significant enough that it's not necessarily desirable for it to be the de-facto way to address them without a really compelling reason.

Finally, <model> as it is proposed today isn't the end goal. As always, the challenge with spatial content is how to bring progressively more spatial capability into the existing taxonomy of web without falling off a cliff of irreducible complexity.

Jan 06 '25 23:01 zachernuk

Supporting content that is no longer tied to the page seems odd. What would happen with the area where the model is located? I would expect at least a permission prompt and a clear indication to the user where this detached content is coming from. Would it still display if the browser or the browser tab is hidden?

Jan 06 '25 23:01 cabanier

While it's not the exact same situation, the Picture-in-picture (PiP) API covers similar functionality:

A PiP element is normally a video, but it can also be some hierarchy of DOM content, such that authors can provide custom controls or other information.
The user gets to position a PiP element in their OS, but the invoking context doesn't know where it is.
The PiP element is destroyed if the parent context is lost, e.g. from closing the tab or navigating away. (The element is allowed to persist if the page/tab is simply backgrounded.)
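
For comparison, a minimal sketch of the existing video Picture-in-Picture lifecycle as a page sees it. It uses `requestPictureInPicture()` and the `enterpictureinpicture` / `leavepictureinpicture` events on the video element; `watchPictureInPicture` and `onChange` are illustrative names. A hypothetical model extraction feature might follow a similar shape, though that is an assumption rather than anything specified.

```javascript
// Enters Picture-in-Picture for a video and reports lifecycle changes.
// The page learns that PiP started or ended and how big the window is,
// but never where the user placed it.
function watchPictureInPicture(video, onChange) {
  video.addEventListener('enterpictureinpicture', (e) => {
    // The PiP window exposes width and height only, no position.
    const { width, height } = e.pictureInPictureWindow;
    onChange({ active: true, width, height });
  });

  video.addEventListener('leavepictureinpicture', () => {
    onChange({ active: false });
  });

  // In a browser this must be called from a user gesture, e.g. a click.
  // It returns a promise that resolves to the PiP window.
  return video.requestPictureInPicture();
}
```

In a page this would typically be called from a button's click handler with a `<video>` element as the first argument.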

Jan 06 '25 23:01 zachernuk

I think this is a valuable capability to pursue anytime spatial content has a strong reason to be situated in space

So the idea is anytime someone wants to view a spatial visualization they need to drag a model out of the page or click a PiP-like button that will randomly place it? And then there is no web styling, instead it's all USD animations instead of CSS? USD will essentially become an invaluable billion dollar format for engineering, e-commerce, data visualization, and heritage usable across all browsers?

While it's true that WebXR has the ability to meet these needs, it's through a totally separate track of work that cannot reuse other web capabilities directly

Re-usable web capability in WebXR should be a priority but year after year isn't. HTML / Web Capabilities in WebXR would address this.

spatial capability into the existing taxonomy of web without falling off a cliff of irreducible complexity.

Wanting USD support in every browser is a massive cliff of irreducible complexity. It's an incredibly complex format your company is deeply invested in and you haven't even been open to other <sources>. Enhancing the capabilities of WebXR for web visualizations avoids this cliff completely.

Jan 07 '25 00:01 KooIaIa

[PiP-like activity] will randomly place [web content]?

It's not random - PiP is a capability that allows the user to place content on all supported platforms today, and it would be reasonable to aim to do the same in the future.

all USD animations instead of CSS?

The explainer indicates that accessing a scene graph and/or the ability to compose DOM content along with <model> is a desirable goal - the critical question for any development roadmap is what we need to build first.

It's my belief that <model> is the best first step for all of this web-friendly composition, and allows web authors to push on a different set of boundaries than WebXR development.

Format discussion

I believe the issue at hand can be discussed independently of questions of format.

WebXR avoiding cliffs

Unfortunately, integrating web content (or allowing WebXR content to remain active alongside other experiences) does hit very challenging security boundaries that pose similarly existential risks for the technology.

Jan 07 '25 00:01 zachernuk

While it's not the exact same situation, the Picture-in-picture (PiP) API covers similar functionality

Interesting! What happens when you use this feature on AVP? FWIW Quest browser doesn't support it.

Jan 07 '25 04:01 cabanier