
Content in an immersive session should not need to be searched for

Open idrisshah opened this issue 4 years ago • 36 comments

When a user enters an immersive session, the general expectation is that the immersive content is shown in the direction the user is looking. That's not what happens now. Currently, the origin of the local reference space points in the direction the device was facing at boot. If I have a few objects in my immersive session, I have to turn around to find the content. This does not feel right. The behavior is the same on Magic Leap, Oculus and Daydream, as confirmed by @cabanier and Ravi.

idrisshah avatar Apr 01 '20 21:04 idrisshah

This seems to be an application/toolkit issue.

The apps and toolkits can tell what the initial direction is (both in their non-WebXR and initial WebXR views), and they know where their content is. WebXR (the API) has no idea where the user was looking in the non-WebXR view before the session was entered, and doesn't know which direction the content is in.

Seems like you should be filing an issue on the toolkit (threejs, aframe, babylon, etc)?

blairmacintyre avatar Apr 01 '20 21:04 blairmacintyre

The issue is reproducible with the WebXR samples, which use no framework, for example: https://immersive-web.github.io/webxr-samples/input-selection.html. Once the immersive session is launched, the spinning cubes are not in front of the user. They always show up where the device was pointing at boot time.

idrisshah avatar Apr 01 '20 22:04 idrisshah

Right, he's saying that the application should be fixing this by offsetting to the initial viewer pose (and the framework can do this for you if possible).

That said, defining the local space as "roughly the same position and orientation as the user's head at session startup" as a UA-flexible requirement would be nice IMO. That's how I've been looking at it anyway.

Manishearth avatar Apr 01 '20 22:04 Manishearth

That said, defining the local space as "roughly the same position and orientation as the user's head at session startup" as a UA-flexible requirement would be nice IMO. That's how I've been looking at it anyway.

A drawback of that approach is that this will shift the origin each time you leave and re-enter the immersive session

cabanier avatar Apr 01 '20 23:04 cabanier

A drawback of that approach is that this will shift the origin each time you leave and re-enter the immersive session

I do not see it as a drawback, at least for local space. I think that is the more desirable option.

idrisshah avatar Apr 01 '20 23:04 idrisshah

I do not see it as a drawback, at least for local space. I think that is the more desirable option.

Let's say an end user goes immersive and decorates their room with furniture. Then they exit the session to look something up on another browser instance. If they then enter the session again, all the furniture will be shifted.

cabanier avatar Apr 01 '20 23:04 cabanier

I think that would require more information than just a reference space. Maybe anchors? Or maybe we should give the content developer an option to choose the origin.

The above example will be invalidated even when the device is rebooted with a different orientation.

idrisshah avatar Apr 01 '20 23:04 idrisshah

I think that would require more information than just a reference space. Maybe anchors? Or maybe we should give the content developer an option to choose the origin.

The above example will be invalidated even when the device is rebooted with a different orientation.

Correct. I was talking about staying on the same web page/session but going back and forth into immersive.

cabanier avatar Apr 02 '20 01:04 cabanier

Let's say an end user goes immersive and decorates their room with furniture.

local spaces shouldn't be used for this tbh. But this is a super valid concern, yeah, because people will use it anyway.

Perhaps we can suggest "it's roughly where the head is for the first time the session is started for the current page"

Manishearth avatar Apr 02 '20 02:04 Manishearth

@cabanier I don't think we can assume that the space won't change when you stop and start a session. When a session stops, all the resources associated with it are likely destroyed, and restarting the session is equivalent to reinitializing everything. If I'm on the page, enter immersive mode, leave immersive mode (end the session), walk across the room with the page still showing, and then re-enter immersive mode (start a new session), there should be no expectation that the space will be the same.

More importantly, as @Manishearth says, the local coordinates are completely undefined when the session starts. And local coordinates can change at any time as the system refines its understanding of the world. ARKit and Hololens, for example, make it VERY clear that the local coordinates you get from them are arbitrary and can change. WebXR is similar.

AR apps will not work properly until we define anchors and have ways of attaching content to places that the underlying system works to keep aligned. The scope of current and mostly agreed-on AR features (AR mode + hit testing) only supports the most trivial, ephemeral sort of AR. Certainly not the example you describe of putting furniture around a room for more than just a short time frame. Walking around a space like a room without anchors will result in the coordinate system changing.

However, I will note that the current proposals for anchors most likely break in the situation you describe anyway: anchors are most likely associated with sessions, and if you stop the session, they go away. Certainly the obvious implementation of WebXR with ARKit has ARKit shutting down and restarting when sessions stop and start.

blairmacintyre avatar Apr 02 '20 02:04 blairmacintyre

Perhaps we can suggest "it's roughly where the head is for the first time the session is started for the current page"

For the oculus case, the local space's origin is where the user localized, either by setting up guardian or holding the oculus button. Since we're VR and we don't support large space, this space ends up being close to the viewer. I don't think we want to change it to the current location of the headset since that will often end up giving an incorrect result. For instance, if the user looked left and up and clicked, then the origin of the scene will be on the left and up.

The original suggestion is the correct one: if the page wants to show something in front of the user, they have all the information to do so and make the transformation themselves. UAs shouldn't try to play tricks.
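
A minimal sketch of the page-side transformation described above, assuming the page already has the viewer's position and orientation in the local space (for example from `XRFrame.getViewerPose()` on the first frame). The helper name `placeInFront` and the `Vec3`/`Quat` types are hypothetical and not part of WebXR; the math assumes WebXR's convention of -Z forward and +Y up.

```typescript
// Hypothetical helper types; in a page these values would come from
// viewerPose.transform.position and viewerPose.transform.orientation.
interface Vec3 { x: number; y: number; z: number; }
interface Quat { x: number; y: number; z: number; w: number; }

// Returns a point `distance` metres ahead of the viewer, at the viewer's
// height, along the horizontal part of their forward direction.
function placeInFront(position: Vec3, orientation: Quat, distance: number): Vec3 {
  const { x, y, z, w } = orientation;
  // Rotate the forward vector (0, 0, -1) by the unit quaternion,
  // keeping only the horizontal (x, z) components.
  let fx = -2 * (x * z + w * y);
  let fz = -(1 - 2 * (x * x + y * y));
  // Renormalise so looking slightly up or down does not shorten the distance.
  const len = Math.sqrt(fx * fx + fz * fz);
  if (len < 1e-6) {
    // Looking straight up or down: fall back to the space's own forward.
    fx = 0;
    fz = -1;
  } else {
    fx /= len;
    fz /= len;
  }
  return {
    x: position.x + fx * distance,
    y: position.y,
    z: position.z + fz * distance,
  };
}
```

The page would then position its content at the returned point instead of at the reference space origin. Dropping the vertical component is a design choice here, so that a user who happens to be looking at the floor when the session starts does not get content underfoot.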

cabanier avatar Apr 02 '20 05:04 cabanier

@cabanier I don't think we can assume that the space won't change when you stop and start a session.

I was mostly pointing out to @idrisshah that his proposal might have unintended consequences.

When a session stops, all the resources associated with it are likely destroyed, and restarting the session is equivalent to reinitializing everything. If I'm on the page, enter immersive mode, leave immersive mode (end the session), walk across the room with the page still showing, and then re-enter immersive mode (start a new session), there should be no expectation that the space will be the same.

Maybe not, but this is how Daydream, Magic Leap and Oculus currently work.

Also, exiting a session doesn't have to destroy anything. GL and JS resources are still valid and the system's localization system keeps running.

... AR apps will not work properly until we define anchors and have ways of attaching content to places that the underlying system works to keep aligned. The scope of current and mostly agreed-on AR features (AR mode + hit testing) only supports the most trivial, ephemeral sort of AR. Certainly not the example you describe of putting furniture around a room for more than just a short time frame. Walking around a space like a room without anchors will result in the coordinate system changing.

I don't think that is true for the more sophisticated devices. The current anchors proposal is just for mobile phones where this is a problem.

However, I will note that the current proposals for anchors most likely break in the situation you describe anyway: anchors are most likely associated with sessions, and if you stop the session, they go away. Certainly the obvious implementation of WebXR with ARKit has ARKit shutting down and restarting when sessions stop and start.

Yes, we need persistent anchors to have true persistence.

cabanier avatar Apr 02 '20 05:04 cabanier

I don't think that is true for the more sophisticated devices. The current anchors proposal is just for mobile phones where this is a problem.

It's true for Hololens, which I consider the most sophisticated AR device/platform. And it's true for ARKit, which I also consider a fairly modern realistic platform.

I'm not talking about persistence, I'm talking about moderate run-lengths in realistic environments. The docs for both HL and ARKit are pretty clear why this is the case. I haven't used an ML enough to see how it deals with the obvious problems of needing to adjust the coordinate frame as you move over a very large space (e.g., I've walked over the 3 floors of my house with an ARKit app and an HL app running WebVR and WebXR, and displayed the resulting geometry, you can see it on an old blog post of mine). It's pretty clear the underlying viewer coordinate system changed and the locations of the meshes and anchors relative to it updated appropriately, over those journeys, and that's just a small single-family home. I've also walked around my neighborhood with the same ARKit demo running; same thing. I would be surprised if any sophisticated platform (e.g., ML) managed to support large scale journeys like this while keeping the viewer coordinate system locked to the original starting coordinate system (and I'd be curious how they would do this while supporting even longer runtimes over even greater areas).

VR devices all are designed to operate in a fairly small area (marked out at the start) and the content in that area is essentially expressed relative to that area. If the system makes small adjustments while it's running, things are fine; the whole coordinate frame is just adjusted. So, yes, all current VR devices could give you the same repeatable coordinate system.

blairmacintyre avatar Apr 02 '20 11:04 blairmacintyre

I don't think that is true for the more sophisticated devices. The current anchors proposal is just for mobile phones where this is a problem.

It's true for Hololens, which I consider the most sophisticated AR device/platform. And it's true for ARKit, which I also consider a fairly modern realistic platform.

For the case of placing furniture in a room, the coordinate system certainly won't change on HL either. If you start venturing out and the system relocalizes, yes you will reset to a new origin (which is also the case with Oculus) but that is not the issue that @idrisshah wants to resolve.

The question is: when you create the local space, should its origin be where the viewer currently is, or should it be at the device's native origin? From experiments, it seems UAs implemented the latter and IMO we should keep it that way.

cabanier avatar Apr 02 '20 17:04 cabanier

The question is: when you create the local space, should its origin be where the viewer currently is, or should it be at the device's native origin? From experiments, it seems UAs implemented the latter and IMO we should keep it that way.

I think it should be left up to the platform, and the frameworks should stop making bad assumptions based on what initial implementations do.

I agree 100% that if I am in a 2D web page view and the 3D scene in it is showing me something, then when I switch to immersive mode, the apps should be smart enough to set the view appropriately. Assuming the platform will manage this is unrealistic, and neither of the choices for where the origin of the coordinate system is will do the right thing to solve @idrisshah's issue.

blairmacintyre avatar Apr 02 '20 17:04 blairmacintyre

Lots of excellent points made above! I wanted to chime in with some additional observations:

First off, it's highly likely that some browser/hardware combos can and should be doing a better job at delivering more natural origins. We should all collectively look at improving that, but given that this is a spec repo I want to focus on what the spec should or should not universally enforce.

I can see a good argument for enforcing local spaces (and only local spaces) to set the origin to the current position/direction when initialized, but that still wouldn't give universally desirable behavior. In the case of an in-VR browser you'd usually get sensible results, but it also means that if you were looking to the side or squatting down/standing up when the reference space initializes (which may be a different point in time than when you click the "Enter XR" button) then your session will be stuck in an odd spot for the duration, or with the "forward" direction stuck off to the side due to an errant head motion at init time. It'd generally be better if the platform has some sense of where a natural origin is (i.e., the origin from the last orientation reset) to just use it instead, as the user has probably already specifically calibrated that to be at a comfortable place. Similarly, such systems will want to allow the local origin to follow the system origin when the user does an orientation reset (long press Oculus button, for example), and we don't want to do anything to prevent that.

On the flip side, if the hardware is a tethered headset then there's a high likelihood that the headset will be placed on a desk, wall hook, shelf, or other similarly awkward spot when the reference space initializes. That's definitely not what the user wants. Unfortunately for a local space the calibrated "room center" may not be the right thing either. (For example: If I'm sitting at my desk at home SteamVR's calibrated room origin is behind me and rotated 180 degrees. Great for a bounded-floor standing experience, not so much for a local cockpit/floating experience.) I don't have a good answer for this, other than that some systems do allow separate calibration of a sitting/standing origin and we should maybe try to use those more consistently? In any case, forcing local to be a newly initialized value each time in the spec would be very problematic for this case.

In the meantime apps that want better control over their origins can always offer an in-app calibration screen (these are common in native apps) that instructs the user to, for example, find a natural position and press a button to start the experience. Internally the app can call getOffsetReferenceSpace() on the local reference space with the current pose (or the inverse of the current pose? I forget which) to retrieve a new reference space with that user-calibrated origin. This would actually be a really nice drop-in utility for the major frameworks to have.
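
A minimal sketch of such a calibration utility, assuming the app samples the viewer pose in the local space at the moment the user presses the button. The WebXR explainer describes the argument to `getOffsetReferenceSpace()` as the position and orientation of the new origin relative to the base space, which suggests passing the pose itself rather than its inverse; treat that as an assumption to verify. The helper `yawOnlyOrigin` and its types are hypothetical, and keeping only the yaw is a design choice so the calibrated space stays level even if the head was tilted.

```typescript
// Hypothetical helper types mirroring the fields of an XRRigidTransform.
interface Vec3 { x: number; y: number; z: number; }
interface Quat { x: number; y: number; z: number; w: number; }
interface Pose { position: Vec3; orientation: Quat; }

// Builds the pose of a user-calibrated origin: the viewer's position,
// with the orientation reduced to its heading about the +Y (up) axis.
function yawOnlyOrigin(position: Vec3, orientation: Quat): Pose {
  const { x, y, z, w } = orientation;
  // Heading of the viewer's forward vector (WebXR: -Z forward, +Y up).
  const yaw = Math.atan2(2 * (x * z + w * y), 1 - 2 * (x * x + y * y));
  return {
    position: { x: position.x, y: position.y, z: position.z },
    orientation: { x: 0, y: Math.sin(yaw / 2), z: 0, w: Math.cos(yaw / 2) },
  };
}

// In a page (not run here), the result would be used roughly like this:
//   const t = viewerPose.transform;
//   const o = yawOnlyOrigin(t.position, t.orientation);
//   calibratedSpace = localSpace.getOffsetReferenceSpace(
//       new XRRigidTransform(o.position, o.orientation));
```

Subsequent frames would then query poses against the calibrated space, so content authored around the origin appears in front of wherever the user was facing at calibration time.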

toji avatar Apr 02 '20 19:04 toji

Thanks for all the comments. It really helped to hear the different points of view.

I still feel that having an option for the content developer to define the origin in some sort of reference space (maybe only the local reference space) can help provide a better user experience. Imagine a user sitting on a sofa and trying to play a video in an immersive session. It would be annoying if the video were always shown behind them just because they booted the device facing that direction. Providing an extra optional parameter when requesting a reference space can go a long way toward mitigating the issue.

idrisshah avatar Apr 03 '20 00:04 idrisshah

I can see a good argument for enforcing local spaces (and only local spaces) to set the origin to the current position/direction when initialized, but that still wouldn't give universally desirable behavior. In the case of an in-VR browser you'd usually get sensible results, but it also means that if you were looking to the side or squatting down/standing up when the reference space initializes (which may be a different point in time than when you click the "Enter XR" button) then your session will be stuck in an odd spot for the duration, or with the "forward" direction stuck off to the side due to an errant head motion at init time. It'd generally be better if the platform has some sense of where a natural origin is (i.e., the origin from the last orientation reset) to just use it instead, as the user has probably already specifically calibrated that to be at a comfortable place. Similarly, such systems will want to allow the local origin to follow the system origin when the user does an orientation reset (long press Oculus button, for example), and we don't want to do anything to prevent that.

If you get a session origin in the wrong place, you can go back and restart the session in a proper position. In the other case, though, you have no option but to restart the device.

idrisshah avatar Apr 03 '20 00:04 idrisshah

I still feel that having an option for the content developer to define the origin in some sort of reference space (maybe only the local reference space) can help provide a better user experience.

This is what happens on Oculus. The user hits a button and it will reset the device's coordinate space to where the user is currently looking. This is also the coordinate space given to WebXR. (I'm just stating this as an example. I'm not advocating that this should be put in the spec)

cabanier avatar Apr 03 '20 02:04 cabanier

If you get a session origin in the wrong place, you can go back and restart the session in a proper position. In the other case, though, you have no option but to restart the device.

@idrisshah please see what @toji said later:

In the meantime apps that want better control over their origins can always offer an in-app calibration screen (these are common in native apps) that instructs the user to, for example, find a natural position and press a button to start the experience. Internally the app can call getOffsetReferenceSpace() on the local reference space with the current pose (or the inverse of the current pose? I forget which) to retrieve a new reference space with that user-calibrated origin. This would actually be a really nice drop-in utility for the major frameworks to have.

cabanier avatar Apr 03 '20 02:04 cabanier

This is what happens on Oculus. The user hits a button and it will reset the device's coordinate space to where the user is currently looking.

That's very nice behavior. I am wondering what happens to the existing windows after the button press.

idrisshah avatar Apr 03 '20 11:04 idrisshah

Everyone, is this worth a discussion in the working group meetings? Should we put it on the agenda? There are some excellent points here. IMHO it is, since there are no guarantees for local coordinates across sessions and I believe there is no "standard" way of resetting the device origin (Oculus has a button, so does Daydream). My personal experience with "local" has always been as both @idrisshah and @toji mentioned: the device boots on my desk facing the opposite direction, and the content is always behind me. It is a bit uncomfortable.

raviramachandra avatar Apr 03 '20 14:04 raviramachandra

/agenda to discuss and clarify behavior

raviramachandra avatar Apr 03 '20 16:04 raviramachandra

@cabanier:

For the case of placing furniture in a room, the coordinate system certainly won't change on HL either. If you start venturing out and the system relocalizes, yes you will reset to a new origin (which is also the case with Oculus) but that is not the issue that @idrisshah wants to resolve.

On HoloLens, there is no persistent origin associated with any given room you walk into. If you want stuff to stay somewhere, you make an anchor for it and render holograms at that anchor. The world-scale experience section of our Coordinate systems article explains why that's the case. While our desktop VR headsets do support a user-defined room-scale "bounded" space, that does not apply to HoloLens where users do not predefine their space before using the device.

My assumption around the "local" reference space in both WebXR and OpenXR has been that UAs/runtimes would mostly define it in one of the following ways:

  • The device's initial head pose at app session startup (not device boot). This is what HoloLens does.
  • A user-chosen seated zero position that can be recalibrated with some system gesture, if that is the convention on the current platform for where to start seated/standing experiences. This is what Oculus Quest does.

Specifically, it seems incorrect to ever use device boot position/orientation as the definition of "local" space. That is not even possible for an app running on HoloLens - we don't expose it, as it is not meaningful to apps. In a WebXR context, it could even introduce privacy concerns, as it would tell you how far the user has walked since booting up the device this morning, something that is irrelevant to some furniture placing app that I'm using. It also violates the spirit of "local" space, which is intended for experiences where you don't even walk around (if you walk around on an AR device, you should be using "unbounded" space) - it's not very local to use the position I was in hours ago, as that could be miles away.

I would be in favor of spec language that encourages both of the definitions above, with UAs choosing between them based on the conventions of the device they're running on, and explicitly discourage inheriting any platform behavior around the user's device boot position or orientation, both for functional and privacy reasons.

thetuvix avatar Apr 07 '20 16:04 thetuvix

Adding to the above, current behavior for AR in Chrome for Android is to create local space roughly where the user's device was at WebXR session startup. Whatever we end up doing, I do not think that applications should be able to assume anything about local space - reasoning being that if there are multiple possible standard behaviors w/ no way to know which one was used, then there's no standard behavior so do not assume anything.

With that said, it seems to me that if the app requires any calibration, it would have to be handled by the application itself. Otherwise, the UAs will be forced to provide a way to calibrate the space for a given session, and, at least for immersive-ar in Chrome for Android (where there's nothing guaranteed about the environment), I do not immediately see a good way to achieve that.

bialpio avatar Apr 08 '20 00:04 bialpio

So far we have:

Platform | Local space origin | Origin reset
Magic Leap | Device boot | Not yet supported: Reset origin via some gesture or button
Daydream | Device boot | Reset origin via button
Hololens | App session | Not sure
AR on Chrome | App session (roughly) | Not sure
Oculus | Device boot | Reset origin via button

We are already doing different things on different devices. It looks like apps that want consistent behavior need their own calibration. Our suggestion is that the spec "recommend" (non-normatively) that UAs/platforms place the origin at session start, at least for "local" reference spaces. But we caution that implementors may not guarantee that.

It might help app developers manage the inconsistency that already exists across devices if we provide a recommendation in the specification.

raviramachandra avatar Apr 08 '20 17:04 raviramachandra

Servo on Hololens uses OpenXR, and I believe the OpenXR runtime sets LOCAL spaces to the head position at session start.

Manishearth avatar Apr 08 '20 20:04 Manishearth

We are already doing different things on different devices. It looks like apps that want consistent behavior need their own calibration. Our suggestion is that the spec "recommend" (non-normatively) that UAs/platforms place the origin at session start, at least for "local" reference spaces. But we caution that implementors may not guarantee that.

It will be very surprising to authors and users that the origin resets itself each time an immersive WebXR session is created. I suspect some existing experiences will break if that change is made.

Instead, the origin could be set at browser startup time (like @thetuvix says Hololens does) or the current browser session.

cabanier avatar Apr 14 '20 15:04 cabanier

It will be very surprising to authors and users that the origin resets itself each time an immersive WebXR session is created. I suspect some existing experiences will break if that change is made.

Instead, the origin could be set at browser startup time (like @thetuvix says Hololens does) or the current browser session.

I agree. We do not need to reset the origin of the device every time we go immersive. But the origin for the current immersive session should account for the user's direction in some way.

idrisshah avatar Apr 14 '20 16:04 idrisshah

@cabanier

It will be very surprising to authors and users that the origin resets itself each time an immersive WebXR session is created. I suspect some existing experiences will break if that change is made.

Instead, the origin could be set at browser startup time (like @thetuvix says Hololens does) or the current browser session.

Actually he said "at app session startup." Hololens 1, for example, completely resets that coordinate system presented to users each time a WebVR session is started.

I thought we had these issues hashed out a long time ago, TBH, when we started talking about the bounded and unbounded reference frames, and similar. There is literally no robust way to ensure that a device that can be used over unbounded distances will maintain its origin between multiple app sessions:

  • I put on my AR HMD. I start a web browser and go to a webxr page.
  • I enter XR (start a session) and do some stuff
  • I exit the session, get in my car and drive to the grocery store, wearing my device
  • I open the web browser again, and on the same page, enter XR (start a session)

(yes, I understand current AR HMDs will not work over such a large range, and have primitive shells that will leave the web browser pages coupled to physical locations. Both of these limitations will change).

By definition, the coordinate system presented by WebXR in an unbounded device MAY change between sessions; thus, even if the above devices CHOSE to reuse the coordinate frames under certain circumstances, they can't guarantee it. So apps shouldn't assume it.

@idrisshah

but the origin for the current immersive session should account for the user's direction in some way.

The devil is in the details, though. When I start an immersive AR session on my phone or tablet, I'm often looking down, or have the thing sitting on the desk. At the very least, it's very unlikely I'm looking in the direction that I'm "thinking" the experience should be in.

blairmacintyre avatar Apr 14 '20 16:04 blairmacintyre