OpenXR: Change timing of xrWaitFrame and fix XR multithreading issues
This PR changes the timing at which xrWaitFrame is called.
This also makes a start at ensuring the OpenXR code is thread safe if the renderer is run in a separate thread. It is important to note that much of the OpenXR data such as the instance and session are initialised before rendering commences and access should be safe. However there is also much data that can be overwritten during frame processing while the previous frame is rendering and I've only just started making that safe.
The change to xrWaitFrame should improve timing issues regardless of whether rendering is run in a separate thread or not.
For testing OpenXR with rendering happening in a separate thread, set this project settings:
Todos:
- [x] Fix xrWaitFrame timing
- [x] Fix thread issues with OpenXR rendering
- [ ] Fix thread issues with OpenXRAPI cleanup (possibly a non-issue due to application life cycle)
- [x] Fix thread issues with XRServer.world_origin
- [x] Fix thread issues with XRServer.world_scale
Note: This has only been tested on the Vulkan renderer, running multi-threaded in OpenGL may have issues unrelated to the work we're doing here.
- Fixes https://github.com/godotengine/godot/issues/83471
One of the issues I've run into is that call_on_render_thread requires a proper Godot Object to be used and OpenXRAPI is not such an object.
I may decide to turn OpenXRAPI into a proper Godot object. This may eventually make OpenXRAPIExtension obsolete.
Also there is a potential minor breakage in get_next_frame_time now actually returning the predicted display time of the next frame, not the current frame, now that we're properly handling xrWaitFrame.
It will print out a warning on a developer version if this is the case. A new vendors plugin will likely be needed once this is released. The fix is simply calling the new get_predicted_display_time function.
Set this to ready for review, I think I've got all issues resolved in core except for the cleanup part. Right now I don't think this is a problem as we'll stop the rendering thread before we hit this code, but potentially there needs to be some more done there.
I also tested this on my Quest 3 and it seems to be working fine.
Small update here:
- Further testing has shown this resolves a number of issues including crashing of XR applications when multithreading is enabled
- The artifact issue we've discovered on Quest 3 is not resolved, even is more clear when multithreading is used. This issue does not happen on PCVR, Quest Pro nor HTC Elite XR headsets and might even be unique to Quest 3. My conclusion at this point is that the cause of this lies elsewhere and should not stop us from merging this work which is important.
- We should probably merge #89880 first and then spend some time refactoring some of the swapchain logic as this clashes with changes here and in #87466
Waiting on #87466 to be approved before rebasing and fixing this as to not double up on work.
Ok, with composition layers and fixes to target size multiplier all merged, rebased this all and made some more fixes.
This is pretty much in good shape, could use some feedback on cleanup.
Thanks!
I skimmed the latest version of the code, and it seems good, but I haven't had a chance to really look at it deeply. I'll try to find some time to do that later.
I tested this on the Meta Quest 3 using the 'godot_openxr_vendors' demo, and the very simplest OpenXR demo that I have (just cube hands and a cube a couple meters in front of you, no materials).
With the OpenGL Compatibility renderer: Using a "Thread Model" of "Single-Safe", everything seems to work great! However, I'm still experiencing problems when using a "Thread Model" of "Multi-Threaded".
On the 'godot_openxr_vendors' demo, I'm getting a segfault (dereferencing null pointer) at startup. (I still haven't figured out why I'm not getting debug symbols in the backtrace, unfortunately.)
So, I decided to try the simplest possible demo, which doesn't crash, but I end up with just a black display. In the log, I see:
04-10 09:58:18.316 26574 26609 I godot : Godot Engine v4.3.dev.custom_build.2c887b92e (2024-04-10 06:46:18 UTC) - https://godotengine.org
04-10 09:58:18.607 26574 26609 I godot : OpenXR: Running on OpenXR runtime: Oculus 63.399.0
04-10 09:58:18.623 26574 26609 E godot : USER WARNING: The Multi-Threaded rendering thread model is experimental, and has known issues which can lead to project crashes. Use the Single-Safe option in the project settings instead.
04-10 09:58:18.623 26574 26609 E godot : at: setup2 (main/main.cpp:2720)
04-10 09:58:18.661 26574 26609 E godot : USER WARNING: Rendering thread not supported by this display server.
04-10 09:58:18.661 26574 26609 E godot : at: release_rendering_thread (servers/display_server.cpp:697)
04-10 09:58:18.661 26574 26631 E godot : USER WARNING: Rendering thread not supported by this display server.
04-10 09:58:18.661 26574 26631 E godot : at: make_rendering_thread (servers/display_server.cpp:701)
04-10 09:58:18.661 26574 26631 I godot : OpenGL API - Compatibility - Using Device: -
04-10 09:58:18.692 26574 26609 I godot :
04-10 09:58:18.806 26574 26609 I godot : OpenXR: Requested OpenGL version exceeds the maximum version this runtime has been tested on and is known to support.
04-10 09:58:18.806 26574 26609 I godot : - desired_version 3.3.0
04-10 09:58:18.806 26574 26609 I godot : - minApiVersionSupported 3.0.0
04-10 09:58:18.806 26574 26609 I godot : - maxApiVersionSupported 3.2.0
04-10 09:58:19.281 26574 26631 E godot : USER ERROR: Condition "!texture_allocs_cache.has(p_id)" is true.
04-10 09:58:19.281 26574 26631 E godot : at: texture_free_data (./drivers/gles3/storage/utilities.h:144)
04-10 09:58:19.282 26574 26631 E godot : USER WARNING: Could not create render target, status: 0
04-10 09:58:19.282 26574 26631 E godot : at: _update_render_target (drivers/gles3/storage/texture_storage.cpp:2104)
04-10 09:58:19.303 26574 26609 E godot : USER ERROR: Condition "!rendering_server->is_on_render_thread()" is true. Returning: nullptr
04-10 09:58:19.303 26574 26609 E godot : at: get_color_swapchain (modules/openxr/openxr_api.cpp:2180)
04-10 09:58:19.303 26574 26609 I godot : OpenXR: Unable to update the swapchain [ XR_ERROR_HANDLE_INVALID ]
04-10 09:58:19.313 26574 26631 E godot : USER WARNING: Could not create texture atlas, status: 0
04-10 09:58:19.313 26574 26631 E godot : at: update_texture_atlas (drivers/gles3/storage/texture_storage.cpp:1903)
04-10 09:58:19.316 26574 26631 E godot : USER ERROR: Condition "!texture_allocs_cache.has(p_id)" is true.
04-10 09:58:19.316 26574 26631 E godot : at: texture_free_data (./drivers/gles3/storage/utilities.h:144)
04-10 09:58:19.316 26574 26631 E godot : USER WARNING: Could not create render target, status: 0
04-10 09:58:19.316 26574 26631 E godot : at: _update_render_target (drivers/gles3/storage/texture_storage.cpp:2104)
04-10 09:58:19.316 26574 26631 I godot : OpenXR: No swapchain could be acquired to render to!
[PRINT OUT OF THE scene.glsl SHADER, DUE TO IT FAILING VERTEX SHADER COMPILATION]
04-10 09:58:19.332 26574 26631 E godot : USER ERROR: SceneShaderGLES3: Vertex shader compilation failed:
04-10 09:58:19.332 26574 26631 E godot :
04-10 09:58:19.332 26574 26631 E godot : at: _display_error_with_code (drivers/gles3/shader_gles3.cpp:253)
04-10 09:58:19.332 26574 26631 E godot : USER ERROR: Method/function failed.
04-10 09:58:19.332 26574 26631 E godot : at: _compile_specialization (drivers/gles3/shader_gles3.cpp:347)
04-10 09:58:19.332 26574 26631 E godot : USER WARNING: shader failed to compile, unable to bind shader.
04-10 09:58:19.332 26574 26631 E godot : at: _version_bind_shader (./drivers/gles3/shader_gles3.h:222)
04-10 09:58:19.344 26574 26631 I godot : OpenXR: No swapchain could be acquired to render to!
04-10 09:58:19.596 26574 26631 I godot : OpenXR: No swapchain could be acquired to render to!
04-10 09:58:20.137 26574 26631 I godot : OpenXR: No swapchain could be acquired to render to!
04-10 09:58:20.345 26574 26631 I godot : OpenXR: No swapchain could be acquired to render to!
04-10 09:58:21.013 26574 26631 I godot : OpenXR: No swapchain could be acquired to render to!
04-10 09:58:21.569 26574 26631 I godot : OpenXR: No swapchain could be acquired to render to!
04-10 09:58:21.862 26574 26631 I godot : OpenXR: No swapchain could be acquired to render to!
This error seems to come from the OpenXRFBFoveationExtension so it probably needs some updates for multithreading:
04-10 09:58:19.303 26574 26609 E godot : USER ERROR: Condition "!rendering_server->is_on_render_thread()" is true. Returning: nullptr
04-10 09:58:19.303 26574 26609 E godot : at: get_color_swapchain (modules/openxr/openxr_api.cpp:2180)
I'm not sure about the rest, but it seems like there may just be issues with using the OpenGL Compatibility renderer multi-threaded in general? Should this even be working, or are these failures expected?
I have better luck when using the Vulkan Mobile renderer.
Both my test projects "work", but with LOTS of flickering (seems similar to the Quest 3 issue we've been having for a while, but much, much worse, it's like every 3rd frame is black), and this message is spammed to the console:
OpenXR: No swapchain could be acquired to render to!
It seems like the swapchain isn't being acquired on some frames for some reason?
When I try the same demo, still with the Vulkan Mobile renderer, but with the "Thread Model" changed back to "Single-Safe", then I don't get any flickering at all, nor do I see that message in the logs at all.
So, it does seem like the flickering and issues with acquiring the swapchain are related to using multi-threaded rendering in this case, and isn't just the same Quest 3 problem that we've been encountering occasionally.
Anyway, please let me know how much of this is intended to work, and how much is still expected to fail! This PR seems to work fine with "Single-Safe" (so no noticeable regressions), and we didn't work with "Multi-Threaded" at all before this PR anyway (in my previous experience, it would just crash) so any improvement there is good :-)
I've done some further investigation on the flickering issue with Vulkan Mobile and a "Thread Model" of "Multi-Threaded".
First of all, the issue is NOT limited to Quest 3!
I've now tested on Quest 2, Quest Pro and Quest 3, and they all exhibit the same flickering.
Through adding a bunch of print_line()'s to the code, I've been able to determine that the swapchain isn't being acquired because OpenXRAPI::can_render() returns false, and that's because frame_state.shouldRender is false, which is set by xrWaitFrame().
So, for some reason, like every 2nd/3rd frame, xrWaitFrame() is setting shouldRender to false.
I wonder if there is some sort of synchronization issue between the main thread (which calls xrWaitFrame()) and the render thread (which calls xrBeginFrame() and xrEndFrame())? Perhaps we're calling xrWaitFrame() too soon relative to what the render thread is doing, causing the OpenXR runtime to try and tell us to slow down?
@dsnopek For the record, see #90268, multi-threaded rendering being broken is somewhat expected for now, and that PR aims to improve it. Might be worth testing both PRs together.
Something potentially interesting from adb logcat which I get when testing on both Quest 2 and Quest 3 (I didn't feel like retesting Quest Pro):
04-10 11:26:08.669 7908 7992 I JniUtils-inl: Creating temporary JNIEnv. This is a heavy operation and should be infrequent. To optimize, use JNI AttachCurrentThread on calling thread
This is spammed to the log once for each time that I see OpenXR: No swapchain could be acquired to render to! in the log.
I can tell by the PID and thread ID (7908 and 7992) that this is coming from Godot's render thread, because I see the same values on my print_line()'s that I know are on the render thread.
This could be a red herring, but maybe there's something we need to do to allow Android to efficiently use a separate render thread?
@m4gr3d Does this log message mean anything to you?
@akien-mga Thanks for the suggestion!
I tried patching with the changes from PR https://github.com/godotengine/godot/pull/90268 (applied cleanly with some small offsets), and unfortunately, it doesn't fix the flickering. :-(
Thanks for all the testing David, I've only done limited tests on stand alone headsets so this is good feedback. I've also been mostly focusing on Vulkan so indeed, possibly we need more testing on OpenGL. So let me find some time to look more deeply into OpenGL, I might have missed some code that needs to be made safe.
Also in that same line of thought, we've not looked at thread safety in the vendor plugin, it is possible that there too we're doing things on the main thread that should be executed on the rendering thread, or accessing things on the rendering thread that is being modified by the main thread. In my testing I was using none of the vendor plugin functionality, but using the vendor plugin demo would immediately cause issues.
xrWaitFrame returning shouldRender as false happens when the XR runtime determines it can't use that frame and we should skip rendering that. In what scenarios this is the case is still very vague and since we're not designed as an XR only render engine, we just skip rendering the HMD output and start on the next frame. These are the sort of scenarios we really need feedback from hardware vendors to tell us what's causing this, whether we're doing something wrong here, or whether its an issue on their end.
Theorising here, seeing you're using the vendor plugin demo which is really really simple and does almost nothing on the main thread, it might be that we're processing frame after frame after frame, while we're still rendering the first frame. It will eventually hit a point where all images in the swapchain have been written to and we can no longer acquire the next.
Now xrWaitFrame is supposed to throttle this, it's supposed to pause the main thread so that handling that frame (process + render) is done as close as possible to presenting that frame. But it needs timing info for that which it doesn't have at the beginning, and things might snowball from there.
That would explain why I didn't run into this because I'm using my https://github.com/BastiaanOlij/godot_openxr_demo project which doesn't do a whole lot more, but does do more including physics processing. It may slow things down enough for it not to run into this issue. Or maybe it is, and that is explaining some of the issues I've had on Quest 3 but which I also have in safe mode.
All in all, this is pure speculation. I agree with Akien we should include any fixes in threading that we've already found in the rendering engine itself, and then see what the combination does. And if it still exhibits issues, we need to start talking to hardware vendors like Meta, Pico and HTC and see if they are willing to put resources into this and see why the XR Runtime is behaving the way it is.
I think it's pretty clear that this one is not going to be ready for 4.3, but we definitely need to see if we can get this over the finish line for 4.4
I think it's pretty clear that this one is not going to be ready for 4.3, but we definitely need to see if we can get this over the finish line for 4.4
Well, as far as we can tell this doesn't cause any regressions when using "Single-Safe". And it works somewhat better with "Multi-Threaded": we get flickering rather than an outright crash. So, I could see an argument for still going forward with this, and continuing to fix "Multi-Threaded" in future PRs?
It would be amazing to get to the bottom of this, though - my brain has gotten a little obsessed with the issue :-)
For reference, https://github.com/godotengine/godot/pull/90268 is a step to getting multithreaded rendering working properly. But there is still a lot more work needed. https://github.com/godotengine/godot/pull/87590 also helps, but I would consider these two PRs prerequisites to work on fixing the existing issues rather than solutions themselves.
@BastiaanOlij @dsnopek Regarding shouldRender, the simplest way to handle that is to ignore it. As the example here, shouldRender should not impact the OpenXR frameloop. App should still call the xrWait/Begin/EndFrame() as usually. The app can also continue to use xrAcquire/Wait/ReleaseSwapchainImage() to lock the swapchain images.
The spec says
shouldRender is XR_TRUE if the application should render its layers as normal and submit them to xrEndFrame. When this value is XR_FALSE, the application should avoid heavy GPU work where possible, for example by skipping layer rendering and then omitting those layers when calling xrEndFrame.
If the app ignore the shouldRender flag, the side effect would be unnecessarily rendering the contents even the app is invislble, and waste performance on that. However, if it's suspicious there might be a mishandling of shouldRender, ignoring it would be the easily way to be sure.
@BastiaanOlij @dsnopek Regarding
shouldRender, the simplest way to handle that is to ignore it. As the example here,shouldRendershould not impact the OpenXR frameloop. App should still call thexrWait/Begin/EndFrame()as usually. The app can also continue to usexrAcquire/Wait/ReleaseSwapchainImage()to lock the swapchain images.The spec says
shouldRender is XR_TRUE if the application should render its layers as normal and submit them to xrEndFrame. When this value is XR_FALSE, the application should avoid heavy GPU work where possible, for example by skipping layer rendering and then omitting those layers when calling xrEndFrame.
If the app ignore the
shouldRenderflag, the side effect would be unnecessarily rendering the contents even the app is invislble, and waste performance on that. However, if it's suspicious there might be a mishandling ofshouldRender, ignoring it would be the easily way to be sure.
Should all work in that way, if shouldRender is false we simply don't submit any render result with xrEndFrame. Also Godot attempts to skip rendering the frame all together if it can. But we do call xrWaitFrame/xrBeginFrame and xrEndFrame regardless of the outcome of shouldRender (we didn't in the past but we fixed that a long time ago).
That said, what may be a problem, if shouldRender is true but we can't obtain the next image in the swap chain, we also skip rendering that frame and call xrEndFrame as if shouldRender was false.
That said, what may be a problem, if
shouldRenderis true but we can't obtain the next image in the swap chain, we also skip rendering that frame and callxrEndFrameas ifshouldRenderwas false.
Does "we can't obtain the next image in the swap chain" mean a failure from calling xrAcquireSwapchainImage? Other than XR_ERROR_CALL_ORDER_INVALID which implies an app issue, most of the error codes there would indicate a serious runtime issue, which honestly shouldn't happen and can't be safely handled by the app.
But there seems a serious glitch how Godot handles xrWaitSwapchainImage:
https://github.com/godotengine/godot/blob/7529c0bec597d70bc61975a82063bb5112ac8879/modules/openxr/openxr_api.cpp#L193-L211
The 17ms timeout value is too short and could trigger XR_TIMEOUT_EXPIRED quite often.
Note that in OpenXR, XR_TIMEOUT_EXPIRED is a success code, but will not pass XR_UNQUALIFIED_SUCCESS(). If that happens, the rendering will be skipped and xrAcquireSwapchainImage will be skipped in the next frame too. That could trigger a general synchronization issue. I am not sure if it could be a cause of the tearing, but it's definitely worth to be fixed.
Instead of passing XR_INFINITE_DURATION to xrWaitSwapchainImage which can potentially trigger a deadlock, using a loop with high timeout value would be preferred. Here is an example code:
bool WaitSwapchainImageWithRetry(XrSwapchain swapchain) {
const int waitTimeIncrementInNanoseconds = 1000000000; // 1 second timeout in nanoseconds
const int maxWaitRounds = 10;
int waitRounds = 0;
for (; waitRounds < maxWaitRounds; ++waitRounds) {
XrSwapchainImageWaitInfo waitInfo = {XR_TYPE_SWAPCHAIN_IMAGE_WAIT_INFO};
waitInfo.timeout = waitTimeIncrementInNanoseconds;
XrResult result = openxr_api->xrWaitSwapchainImage(swapchain, &waitInfo);
if (XR_FAILED(result)) {
// print an error
return false;
}
if (result != XR_TIMEOUT_EXPIRED) {
return true;
} else {
// print an warning
}
}
// print an error
return false;
}
@xwovr:
Thanks for the suggestions!
If the app ignore the
shouldRenderflag, the side effect would be unnecessarily rendering the contents even the app is invislble, and waste performance on that. However, if it's suspicious there might be a mishandling ofshouldRender, ignoring it would be the easily way to be sure.
I tried ignoring shouldRender with Quest 3 and multi-threaded Vulkan Mobile, and that fixes the flickering that I was seeing!
And even with that change, I don't notice low frame rate or other issues, even when moving my hands and head around very quickly.
I think this may have led me to figure out the actual issue with excessive flickering and multi-threaded rendering! (This is different from the more intermittent flickering we have been seeing only on Quest 3 with single threaded Vulkan, which I think is a separate issue.)
@BastiaanOlij I think this is the fix (or something like it):
diff --git a/modules/openxr/openxr_api.cpp b/modules/openxr/openxr_api.cpp
index 4ced4196d1..429ed8e320 100644
--- a/modules/openxr/openxr_api.cpp
+++ b/modules/openxr/openxr_api.cpp
@@ -2013,19 +2013,10 @@ bool OpenXRAPI::process() {
// retrieving tracking data and using that tracking data to render.
// OpenXR thus works best if rendering is performed on a separate thread.
XrFrameWaitInfo frame_wait_info = { XR_TYPE_FRAME_WAIT_INFO, nullptr };
- frame_state.predictedDisplayTime = 0;
- frame_state.predictedDisplayPeriod = 0;
- frame_state.shouldRender = false;
-
XrResult result = xrWaitFrame(session, &frame_wait_info, &frame_state);
if (XR_FAILED(result)) {
print_line("OpenXR: xrWaitFrame() was not successful [", get_error_string(result), "]");
- // reset just in case
- frame_state.predictedDisplayTime = 0;
- frame_state.predictedDisplayPeriod = 0;
- frame_state.shouldRender = false;
-
set_render_display_info(0, false);
return false;
We were resetting frame_state.shouldRender to false, before calling xrWaitFrame(), which seems to only be setting it to true after it's finished waiting. With the game loop and rendering happening on different threads, the render thread is seeing frame_state.shouldRender as false because we've reset it to false before xrWaitFrame() has had a chance to set it back to true. It's another one of those "looking at data from the next frame while trying to render the current frame" type issues.
Anyway, with the patch I posted above, the flicking goes away in multi-threaded rendering, but without having to go so far as completely ignoring shouldRender.
@xwovr I haven't had a chance to try your second suggestion related to xrWaitSwapchainImage() yet, but I'll experiment with that next :-)
@xwovr:
Instead of passing
XR_INFINITE_DURATIONtoxrWaitSwapchainImagewhich can potentially trigger a deadlock, using a loop with high timeout value would be preferred. Here is an example code...
I attempted to adapt your example code - this is the exact patch I used (against this PR):
diff --git a/modules/openxr/openxr_api.cpp b/modules/openxr/openxr_api.cpp
index 4ced4196d1..487d95cfa0 100644
--- a/modules/openxr/openxr_api.cpp
+++ b/modules/openxr/openxr_api.cpp
@@ -193,10 +193,17 @@ bool OpenXRAPI::OpenXRSwapChainInfo::acquire(bool &p_should_render) {
XrSwapchainImageWaitInfo swapchain_image_wait_info = {
XR_TYPE_SWAPCHAIN_IMAGE_WAIT_INFO, // type
nullptr, // next
- 17000000 // timeout in nanoseconds
+ 1000000000, // timeout = 1 second in nanoseconds
};
- result = openxr_api->xrWaitSwapchainImage(swapchain, &swapchain_image_wait_info);
+ for (int retry = 0; retry < 10; retry++) {
+ result = openxr_api->xrWaitSwapchainImage(swapchain, &swapchain_image_wait_info);
+ if (result != XR_TIMEOUT_EXPIRED) {
+ break;
+ }
+ WARN_PRINT("OpenXR: timed out waiting for swapchain image.");
+ }
+
if (!XR_UNQUALIFIED_SUCCESS(result)) {
// Make sure end_frame knows we need to submit an empty frame
p_should_render = false;
@@ -206,6 +213,7 @@ bool OpenXRAPI::OpenXRSwapChainInfo::acquire(bool &p_should_render) {
print_line("OpenXR: failed to wait for swapchain image [", openxr_api->get_error_string(result), "]");
return false;
} else {
+ WARN_PRINT("OpenXR: couldn't to wait for swapchain but not a complete error [" + openxr_api->get_error_string(result) + "]");
// Make sure to skip trying to acquire the swapchain image in the next frame
skip_acquire_swapchain = true;
return false;
(Note: I was only trying to integrate the suggestion with as few changes as possible - let me know if I missed an important part of your suggestion in my adaption.)
Unsurprisingly, these changes on their own didn't fix the "flickering with multi-threaded rendering" issue, since we already knew that was due to shouldRender being set to false when it shouldn't be - but I tried it anyway.
I also tried these changes with this demo project from Bastiaan, because for me it very reliably reproduces the intermittent flicker we've been seeing on Quest 3 with single-threaded Vulkan Mobile. It didn't have any effect on that either - I still experienced that issue.
So, while your suggestion may be a good idea in general, it doesn't seem to solve any of the problems we've been discussing here.
Unless @BastiaanOlij objects and wants to integrate it into this PR, I think we should probably make a new PR for your xrWaitSwapchainImage() suggestion, since it seems to be unrelated to these changes.
@dsnopek I am glad that you located the root cause of the flickering issue 💯 And yes, I believe the xrWaitSwapchainImage fix worths its own PR since it would fix other issues which would have different triggers.
@dsnopek no, that's not the correct fix. This is the whole reason this PR introduces the render_state struct. The rendering thread doesn't read from frame_state.shouldRender, it reads from render_state.should_render which is a copy of that value made before that frame gets rendered.
We do this for a lot of state where the main thread may change the value for the current frame being prepared, while the rendering thread should still be using the previous value while rendering the frame it's rendering (or even the value from before, the way this works is that the changes are added to a queue that is executed right before we render a frame).
So it should be completely safe to set frame_state.shouldRender to false. The code running in the render thread should never see that as we're currently rendering the previous frame. We don't start rendering this frame with the new values until we're finished with the current frame.
So this suggest we're either still using frame_state.shouldRender in a method executed on the rendering thread, or there is something wrong in the way the rendering thread determines whether it should start rendering a new frame.
I was going to have a meetup with @reduz to discuss more about the internal workings of the rendering thread setup, so I get a better understanding of that last bit.
@dsnopek so just to clarify, set_render_display_info(frame_state.predictedDisplayTime, frame_state.shouldRender); is the moment we copy the resulting values from the main thread to the render thread, and that call gets queued and won't actually be executed until the render thread is ready to render this frame.
So the code running on the render thread should remain blissfully unaware we set frame_state.shouldRender momentarily to false.
@BastiaanOlij:
So this suggest we're either still using
frame_state.shouldRenderin a method executed on the rendering thread, or there is something wrong in the way the rendering thread determines whether it should start rendering a new frame.
It is checking frame_state.shouldRender on the render thread.
In OpenXRAPI::pre_draw_viewport() (which runs on the render thread) there is:
if (!can_render()) {
return false;
}
... and the definition of OpenXRAPI::can_render() is:
_FORCE_INLINE_ bool can_render() { return instance != XR_NULL_HANDLE && session != XR_NULL_HANDLE && running && frame_state.shouldRender; }
What should we be doing here instead?
@BastiaanOlij What about this?
diff --git a/modules/openxr/openxr_api.cpp b/modules/openxr/openxr_api.cpp
index 4ced4196d1..150d7eb726 100644
--- a/modules/openxr/openxr_api.cpp
+++ b/modules/openxr/openxr_api.cpp
@@ -2156,7 +2156,7 @@ bool OpenXRAPI::pre_draw_viewport(RID p_render_target) {
// We found an XR viewport!
render_state.has_xr_viewport = true;
- if (!can_render()) {
+ if (instance == XR_NULL_HANDLE || session == XR_NULL_HANDLE || !running || !render_state.should_render) {
return false;
}
diff --git a/modules/openxr/openxr_api.h b/modules/openxr/openxr_api.h
index 5e811879be..dc2e7feaf7 100644
--- a/modules/openxr/openxr_api.h
+++ b/modules/openxr/openxr_api.h
@@ -445,7 +445,6 @@ public:
_FORCE_INLINE_ XrSpace get_play_space() const { return play_space; }
_FORCE_INLINE_ XrTime get_predicted_display_time() { return frame_state.predictedDisplayTime; }
_FORCE_INLINE_ XrTime get_next_frame_time() { return frame_state.predictedDisplayTime + frame_state.predictedDisplayPeriod; }
- _FORCE_INLINE_ bool can_render() { return instance != XR_NULL_HANDLE && session != XR_NULL_HANDLE && running && frame_state.shouldRender; }
XrHandTrackerEXT get_hand_tracker(int p_hand_index);
I haven't had a chance to test it, but it uses render_state.should_render instead of frame_state.shouldRender.
@xwovr , @dsnopek as to the xrWaitSwapchainImage fix, I think we should do that regardless. Note the use of XR_UNQUALIFIED_SUCCESS and then only fully erroring out on XR_FAILED and if so we skipped rendering the frame and just went to the next one. It makes far more sense to wait longer.
This rarely happens, the situation we very often ran into this causing issues was when you take a screenshot or start a recording, that is often where the Quest hangs onto the current image it's processing in the swapchain a little longer and we get a reason to have to wait before we can render to that image again.
But the 17000000 came from my misunderstanding many years ago about how this should work. Even just setting it to 1000000000 and then falling back to our current skip rendering logic would be enough. If we're waiting for more than a second for the next swapchain image to become available, we already have a big issue on our hands. But waiting for just 17ms is indeed waaaay too short.
Note also that this code is called from the rendering thread (if (!render_state.main_swapchains[i].acquire(render_state.should_render)) {) and is thus using render_state directly.
I haven't had a chance to test it, but it uses
render_state.should_renderinstead offrame_state.shouldRender
Good catch, I had a problem around there where I lost a bunch of work during a rebase, I had removed all calls to can_render and instead put the check in there so its absolutely clear what is being checked., I just left can_render for backwards compatibility for any code outside of the module that may still call it (mainly GDExtension).
Obviously that change got lost and this is indeed wrong. pre_draw_viewport is called from the rendering thread and should never access frame_state.shouldRender
(same for running btw, which equally has a companion in render_state)
Ok, rebased and applied both the swapchain fix and fixed up pre_draw_viewport. Also put a render thread guard onto can_render.
Haven't tested yet (still compiling) but just so @dsnopek can check this again as well.
@BastiaanOlij:
Ok, rebased and applied both the swapchain fix and fixed up
pre_draw_viewport. Also put a render thread guard ontocan_render.
Thanks! The latest is working great in my testing :-)
It looks like you still haven't addressed Clay's review above, but otherwise, this is a definite improvement, and I think worth moving forward with.
It looks like you still haven't addressed Clay's review above, but otherwise, this is a definite improvement, and I think worth moving forward with.
Seeing it looked like there was still too much going on for this to be merged into 4.3, I was giving some other things priority. For clay I think the only outstanding thing is the physics thing, I need to find a moment to have a closer look at that.
There are also still some questions needing answering on cleanup, we can accept how things work right now and I think it will be safe but still. Though we can also treat that as a future thing to improve if it actually turns out to be an issue.
Also I have had no chance at all to look into any issues with the compatibility renderer.
Also I have had no chance at all to look into any issues with the compatibility renderer.
Personally, I don't think we necessarily need to fix multi-threaded rendering with the compatibility renderer in this PR: it's broken before this PR, and stays broken after. So, I think it'd be OK to continue working on that in follow-up PRs.
However, single-threaded rendering with the compatibility renderer still works with this PR, and I think there's improvements here even for single-threaded rendering.