Feature request: Granular control to separate or disable audio influence within Quest Pro's visual face tracking (Natural Facial Expressions)
Is this a BUG REPORT or FEATURE REQUEST? Potentially both: Bug if existing mechanisms should prevent this, Feature Request if new logic is needed.
App version, VRCFaceTracking Module
- VRCFaceTracking Version: 5.2.3.0
- QuestProOpenXRTrackingModule Version: Latest
The Issue:
When using the Meta Quest Pro with "Natural Facial Expressions" (which provides data via XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB according to Meta's OpenXR documentation), there is noticeable audio interference. Even when the user is silent, background noise or breathing can cause the avatar's mouth to move. These audio-driven movements are seemingly blended into the visual data stream by Meta's API and are passed through VRCFaceTracking to VRChat as regular facial expressions (e.g., jawOpen, mouthPucker).
This makes it difficult to achieve truly silent expressions or purely camera-driven lip sync, as the application cannot easily distinguish these audio-induced movements from intentional, camera-tracked facial movements.
Meta's API Context:
Meta's documentation for XrFaceTrackingDataSource2FB states that XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB "may also use audio to further improve the quality of the tracking." While this can be beneficial, its current implementation appears to introduce unwanted artifacts during user silence. The API also offers XR_FACE_TRACKING_DATA_SOURCE2_AUDIO_FB for purely audio-driven expressions.
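For context, here is a minimal, untested sketch of how an OpenXR application requests the visual data source when creating a face tracker under XR_FB_face_tracking2 (the session handle and the extension function pointer, which must be loaded via xrGetInstanceProcAddr, are assumed). Notably, the create-info struct offers no field to opt out of the audio component of the visual source:

```c
#include <openxr/openxr.h>

/* Sketch: create a face tracker that requests the VISUAL data source.
 * "createFn" is the xrCreateFaceTracker2FB pointer previously loaded
 * with xrGetInstanceProcAddr; error handling is minimal. */
XrResult create_visual_face_tracker(XrSession session,
                                    PFN_xrCreateFaceTracker2FB createFn,
                                    XrFaceTracker2FB *outTracker)
{
    XrFaceTrackingDataSource2FB requested[] = {
        XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB, /* cameras + optional audio */
    };
    XrFaceTrackerCreateInfo2FB createInfo = {
        .type = XR_TYPE_FACE_TRACKER_CREATE_INFO2_FB,
        .faceExpressionSet = XR_FACE_EXPRESSION_SET2_DEFAULT_FB,
        .requestedDataSourceCount = 1,
        .requestedDataSources = requested,
    };
    /* There is no flag in this struct to disable the audio blending that
     * the spec allows for the VISUAL source. */
    return createFn(session, &createInfo, outTracker);
}
```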
Feature Request/Desired Behavior: Could the VRCFaceTracking Quest Pro module investigate the following:
- Current Handling of XrFaceTrackingDataSource2FB: Is the module aware of this flag? If the data source is XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB, is there any current logic to address the "optional audio" component, especially during periods of user silence?
- Filtering/Mitigation Option: Would it be possible to introduce an option within VRCFaceTracking (perhaps a sensitivity threshold or a toggle) to aggressively filter or reduce the impact of minor lip movements when overall facial movement and voice activity are below a certain threshold, specifically when using the Quest Pro's visual tracking? This could help suppress the phantom audio movements.
- Exposing Data Source Information: Could VRCFaceTracking potentially expose or log the reported XrFaceTrackingDataSource2FB state? This might help users and developers diagnose issues. (A rough sketch of this and the filtering idea follows this list.)
- Advocacy for Upstream API Improvement: If the Meta OpenXR API itself doesn't provide sufficient means to separate or disable this audio component within the VISUAL_FB stream, could the VRCFaceTracking developers consider providing this feedback to Meta? More granular control from Meta's side would be the ideal solution.
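To make the filtering and logging points concrete, here is a rough, untested sketch of what per-frame handling could look like: read the weights, log which data source the runtime reports, and gate a mouth weight below a silence threshold. The threshold value and the use of the jaw-drop index as the example are illustrative assumptions, not existing VRCFaceTracking behavior:

```c
#include <openxr/openxr.h>
#include <stdio.h>

#define MOUTH_SILENCE_THRESHOLD 0.05f /* hypothetical tuning value */

/* Sketch: per-frame sampling that (a) logs which data source the runtime
 * reports and (b) zeroes a barely-moving mouth weight while the user is
 * silent. "getWeights" is the xrGetFaceExpressionWeights2FB pointer
 * loaded via xrGetInstanceProcAddr. */
void sample_face_expressions(XrFaceTracker2FB tracker, XrTime now,
                             PFN_xrGetFaceExpressionWeights2FB getWeights)
{
    float weights[XR_FACE_EXPRESSION2_COUNT_FB];
    float confidences[XR_FACE_CONFIDENCE2_COUNT_FB];

    XrFaceExpressionInfo2FB info = {
        .type = XR_TYPE_FACE_EXPRESSION_INFO2_FB,
        .time = now,
    };
    XrFaceExpressionWeights2FB out = {
        .type = XR_TYPE_FACE_EXPRESSION_WEIGHTS2_FB,
        .weightCount = XR_FACE_EXPRESSION2_COUNT_FB,
        .weights = weights,
        .confidenceCount = XR_FACE_CONFIDENCE2_COUNT_FB,
        .confidences = confidences,
    };

    if (XR_FAILED(getWeights(tracker, &info, &out)) || !out.isValid)
        return;

    /* Diagnostics: surface which data source the runtime actually chose. */
    printf("face data source: %s\n",
           out.dataSource == XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB
               ? "visual (camera, may blend audio)"
               : "audio");

    /* Crude silence gate, using jaw drop as one example mouth weight. */
    if (out.dataSource == XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB &&
        weights[XR_FACE_EXPRESSION2_JAW_DROP_FB] < MOUTH_SILENCE_THRESHOLD)
        weights[XR_FACE_EXPRESSION2_JAW_DROP_FB] = 0.0f;
}
```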
Actual Behavior: When using Quest Pro with "Natural Facial Expressions" enabled via VRCFaceTracking, the avatar's mouth often moves in response to ambient sounds or breathing, even when the user is intentionally silent and making no facial expression. This is passed to VRChat as valid expression data.
Steps to Reproduce (User-Provided Example):
- Enable "Natural Facial Expressions" on Quest Pro.
- Use VRCFaceTracking.
- Enter VRChat with a compatible avatar.
- Remain silent in a quiet environment, then introduce subtle background noise or vary breathing, without speaking or intentionally moving the mouth.
- Observe the avatar's mouth reacting to these sounds.
Environment:
- Hardware: Meta Quest Pro
- PCVR Connection Method: Virtual Desktop
- Operating System: Windows 11
If you are using Virtual Desktop you should be using the Virtual Desktop module, and this issue would normally belong on that tracker. However, this issue is unfortunately not easily addressable by any VRCFaceTracking module: the face tracking data provided by the Quest Pro will have the audio blending applied regardless, as it happens on the system/API side.
Issue:
The audio-only data source exists as a fallback; unfortunately, mixing audio into the face tracking data is an intended feature of the visual face tracking system on Quest Pro:
- XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB: This value indicates that the face tracking data source supports using inward-facing camera data to estimate facial expression. The system may also use audio to further improve the quality of the tracking.
- XR_FACE_TRACKING_DATA_SOURCE2_AUDIO_FB: This value indicates that the face tracking data source supports using audio data to estimate facial expression. The runtime must not use camera data for this data source.
To clarify what these two options mean: think of the two 'data sources' as options you can pick, either 'Audio + Visual' face tracking or purely 'Audio'-driven face tracking. The API does not do any fancy blending based on the properties you set for the face tracking data source; for all intents and purposes, the 'Visual (+ Audio)' data source will always be chosen if 'Expression' tracking is enabled on Quest Pro, and otherwise it falls back to the 'Audio' data source. This means that regardless of which options you pick from the API, you will have audio blending enabled.
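In other words, the selection can be summarized like this (an illustration of the behavior described above, not actual runtime code):

```c
#include <stdbool.h>
#include <openxr/openxr.h>

/* Illustration only: how the runtime effectively chooses the data source
 * on Quest Pro, per the behavior described above. The caller's requested
 * data sources do not remove the audio component from VISUAL. */
XrFaceTrackingDataSource2FB effective_source(bool expressionTrackingEnabled)
{
    if (expressionTrackingEnabled)
        return XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB; /* audio still blended */
    return XR_FACE_TRACKING_DATA_SOURCE2_AUDIO_FB;      /* fallback */
}
```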
Since this data comes directly from a single API, it would be up to Meta to change the behavior of visual face tracking and allow users to disable audio-blended expressions. Unfortunately, at this time no such option is available.
You can work around this by disabling Microphone permissions for the Virtual Desktop app, but this would also disable all microphone usage for Virtual Desktop, including microphone passthrough for apps like VRChat. If you are using the Steam version of VRCFaceTracking, you can also opt into the beta branch to tweak the face tracking parameters to your liking under the new 'Tracking Settings' tab.
Reproduction
The steps to reproduce could be more specific about avatar choice. Are you using an avatar that has visemes as a toggleable option or pre-baked into the avatar (or are you muted in VRChat so the face tracking data is used directly)? These can add unexpected expressions to avatars that may not completely match what you are doing with your face.
End
I hope this answers your issue and questions!
Hi @regzo2
Thanks for the clear explanation. It's helpful to understand the API limitations; truly a typical Meta "my way or no way" move.
For reproduction, to clarify: I am muted in VRChat during testing. The observations are based purely on the direct face tracking data, isolating the issue from VRChat's visemes and confirming that the unwanted mouth movement originates from the supplied tracking data.
Since this is a Meta API behavior directly impacting users who rely on nuanced, silent expressions, and VRCFaceTracking serves many Quest Pro users in VRChat, would your team consider advocating to Meta for an API option to disable this audio blending in the visual stream? A true "visual only" mode, or more granular control over the audio component, would significantly improve tracking precision and user experience.
I appreciate the information regarding the beta branch's tracking parameter adjustments and will explore it as a potential mitigation for some of the (forced) effects. 😄