XRHandModelFactory: Fix Inverted Hands on the Apple Vision Pro
On the Apple Vision Pro, hand/controller indices are assigned by the order they appear, and unassigned when they disappear.
This can cause hands that initially appeared as one handedness to later appear as another handedness.
This causes the Mesh Hands to appear inverted, as the SkinnedMesh is contorted to the mirror bone structure.
This PR detects handedness flips and re-initializes those meshes when it occurs.
I'm including the same logic for the Primitive Hands for conceptual purity, though they are currently unaffected by this issue due to their simplistic appearance. I'd like to add slightly more flair to the humble primitive hands someday, but not in this PR...
Here is a video of the old behavior:
https://github.com/mrdoob/three.js/assets/174475/67771b82-4d67-48c1-8797-87f9928fa3ce
And here's a video of the fixed behavior:
https://github.com/mrdoob/three.js/assets/174475/af7991ae-87b4-4a6f-b957-085dc4e49ce7
I've also submitted a bug to Apple, but usually they're pretty stubborn about ensuring their browser stays broken.
If they do ever fix this behavior, then this PR becomes a no-op.
(Also the WebGPU Reflections E2E Test has a Stats window that stochastically appears in the corner, causing the test to succeed or fail randomly. Kind of annoying...)
/fyi @AdaRoseCannon
I saw the issue and I am tracking it internally. For this particular scenario Safari's behaviour here is spec compliant, this is a library bug. For the specifics of the resolution of this bug, it would be more efficient to keep the hand models around if the site requests hand-tracking and then assign them to the correct hand as needed, rather than just detecting this scenario and disposing of both models.
For the specifics of the resolution of this bug, it would be more efficient to keep the hand models around if the site requests hand-tracking and then assign them to the correct hand as needed, rather than just detecting this scenario and disposing of both models.
Agreed.
@zalo Could you give it a go?
@AdaRoseCannon @mrdoob I have a few options here...
- Option 1) Both hands can load both models (4 total), and switch as needed. This is the least extra code, but it creates 4 hands, which mean that children attachments of hand mesh hierarchies need to be duplicated between Hand1 and Hand2's hidden Left and Right Hand representations 😅
- Option 2) I can refactor the system to create a centralized pool of hand meshes that can be shared between all hands, or to let hands know about each other to steal each-other's hand meshes. We did this cleanly at Leap Motion/Ultraleap, but it would break the hierarchy of the current system and get conceptually messy.
- Option 3) I can refactor the system to further abstract the varying WebXR implementations so that references to an XRHand refer to the same physical hand across initializations.
I am tempted to do Option 3, since it would:
- Be more intuitive for developers to have references to hands remain constant
- Unify Controller and Hand behavior
- Allow existing apps (designed for Leap, Quest 2, and 3) relying on this existing behavior to work with the Apple Vision Pro with minimal adjustment
- Add the least additional API overhead to developers for keeping track of hands, since there is already an abstraction
For 3) are you suggesting requesting XRHand by the handedness "left" and "right" instead of 0 and 1?
If so that sounds sensible, I cannot forsee there ever being a situation with more than 2 "tracked-pointer" inputSources with hands.
Hmm, a Left/Right getter method is an interesting compromise, though it alone doesn’t help the existing pattern where apps get references to Hand0 and Hand1 once on startup (since left/right can’t be determined at startup due to etc.)
My intention for 3) is to have three.js’s WebXRManager swap the underlying motionController in each XRHand when the handedness/ID changes, since that is the level of abstraction that can do information sharing between different hands/controllers without conceptual hierarchy breaking.
Since three.js’s XRHand seems to have been designed for a static reference to the physical hand, this should solve the hand inversion issue and all future downstream related issues and it should let existing apps work transparently without new getters (though, I’d probably also add a Left/Right getter that works on startup for convenience, since it was the 90% use-case accessor we used at Leap Motion).
So we’d keep the list of hands, but each XRHand will always be a reference to the same handedness hand that it was when it was first fetched. 👍
Right now the when you request a controller you request it via an index which matches the input's position in the WebXR Input Sources list. I would be hesitant to break this relationship since it would introduce some confusion as to what you are requesting when you request a hand by it's index.
My current thinking is that the index will always be valid at the time the XRHand is requested, but the index will change transparently within the XRHand to preserve handedness (rather than the handedness changing transparently within the XRHand, as it is right now).
Choosing between the index or the handedness changing within an XRHand, I'd rather the index change (since it only corresponds to "the-order-in-which-the-hands-have-appeared-most-recently", which is less meaningful than "which physical hand it is")
Of course, the current index of the hand will be accessible as a field much like the handedness is now.
Letting developers pick by handedness would allow for persistent handedness whilst maintaining the index. I'm worried that if we try to force the appearance of the index staying the same it will cause complex issues in the future. For example if a device was to support one hand and one controller. In general when any property on an input source changes it is added and removed from inputSources list so there is no guarantees for the order of the list so I think there are two sensible options:
- Continue to preserve the index and update it to represent whatever it currently is hand/controller/nothing
- Change to a model where you request for a particular space from an input picked by features such as it's handedness, it's targetRayMode and the presence of ".hand" with support for multiple matching. Which would be a complex interface but could be quite robust.
Hmm, the second one is interesting… like the request is made with a dictionary of fields to keep matched? If multiple controllers match a condition, I guess it would just pick the first one?
What are the benefits of holding constant properties other than handedness and controller index?
Mainly controller index isn't that useful as it may change in unreliable ways.
Getting the nth of each handedness is better, i.e. 0th left or 1st none.
But there are some cases where handedness may be the same but they need to be handled differently:
- "screen" type inputs differently from tracked-pointers with
handedness:"none" - "transient-pointer" inputs from "tracked-pointer" inputs
- tracked-pointer inputs with and without hands
Sorry, I've lost easy access to an Apple Vision Pro and I don't foresee having the time to work on this PR anymore :-(