godot icon indicating copy to clipboard operation
godot copied to clipboard

Finish splitting functionality of the `RenderingDevice` backends into `RenderingDeviceDriver`.

Open DarioSamo opened this issue 1 year ago • 27 comments

This is a follow up PR to @RandomShaper's PR https://github.com/godotengine/godot/pull/83452 with the intention of cleaning up the loose ends that weren't put behind an abstraction yet. This also includes the changes identified by @darksylinc's PR https://github.com/godotengine/godot/pull/80566 as well as the new project settings that allow configuration of the swap chain image count and the buffered frame count of RenderingDevice.

The main reason behind pushing for this change is that the additional flexibility provided by the PR is essential to providing new asynchronous transfer queues, which will lead to massive improvements in loading times (not included in this PR). This will also enable to provide additional multi-queue functionality in the future for the Acyclic Command Graph that was recently merged. However, the current design makes it impossible to implement this without making awkward changes to all backends and maintaining all of the backends at once.

Another important reason for this change is that this PR will finally enable the possibility of the RenderingDevice's local device working independently of the renderer that was chosen, which means the GPU lightmapper can finally work in Compatibility mode! @clayjohn has proven this to be possible in a separate experiment (not part of this PR but based on top of it) where he added the required UV2 rendering to the compatibility renderer.

The overall net result means a lot of the logic is now shared across backends and the only thing the backend needs to worry about is exposing the following functionality:

  • Command Queues
  • CPU/GPU Fences
  • GPU/GPU Semaphores
  • Swap Chains

The resulting code is also a lot easier to follow and it should be much simpler to fix some of the strange bugs that have been plaguing the backends for a while. However, I do not expect my first attempt at this to have been perfect, since swap chain logic can be particularly hard to handle correctly, so we'll need some extensive testing to confirm that this doesn't introduce any regressions.

NOTE: No performance improvements or regressions are expected from this PR, it is purely an organizational change and functionality wise it should be identical in the default case.

Confirm the changes work as intended in all the affected platforms.

  • [x] Windows
  • [x] Linux (X11 Systems)
  • [x] Linux (Wayland)
  • [x] Android
  • [x] macOS
  • [x] iOS
  • [x] OpenXR (Desktop)
  • [x] OpenXR (Quest)

TODO:

  • [x] CmdBeginDebugUtilsLabelEXT and related functions should be fetched like the older versions.
  • [x] Rendering Device should print the API name instead of "API".
  • [x] There should be no need to print the information about the picked device for RDs that aren't the Singleton.
  • [x] Fix all CI issues so it builds successfully in all platforms.
  • [x] Provide compatibility methods if necessary for RenderingDevice on the functions that have been changed.
  • [x] Finish documentation changes.

  • Production edit: This closes https://github.com/godotengine/godot/issues/87532.

DarioSamo avatar Jan 18 '24 14:01 DarioSamo

I've been reviewing it today, the Vulkan & generic parts look alright (I did not review D3D12).

As I already told Dario personally, the only thing that makes me nervous is this:

RenderingDeviceGraph::frame
RenderingDevice::frame

They're both cloned versions and there's no guarantee(*) they're in sync. But on the other hand, if we'd unify them into just RenderingDevice, we still have no guarantee RenderingDeviceGraph will notice when RenderingDevice::frame advances.

And what's missing would be to aggressively check:

DEV_ASSERT( RenderingDevice::frame == RenderingDeviceGraph::frame );

Nonetheless, as Dario explained to me and I've confirmed, RenderingDeviceGraph::frame isn't currently in use because that's only used for secondary command buffers, which had to be disabled because they were broken on NVIDIA (either because of NV errors, or because of Godot's faults).

(*) By "no guarantee" I mean if someone in the future touches this code and inadvertently introduces subtle bugs. Even if the current code is 100% bug free; checks need to be done to ensure it doesn't get broken tomorrow.

darksylinc avatar Jan 21 '24 23:01 darksylinc

Haven't had time to dive deep into this but making sure OpenXR still works I can confirm that:

  • PCVR works which suggests our OpenXR overrules are still working
  • Running on Quest this currently hard crashes but I haven't been able to capture good crash info. Needs to be tested whether this is an Android issue or specific to Android XR devices.

BastiaanOlij avatar Jan 22 '24 05:01 BastiaanOlij

It was indeed crashing in Android due to an implementation error on my part. Now I've checked and it seems to be working. Weirdly enough, it also seems to fix a bug where using the controls in the editor would stretch the contents of the viewports to be twice its scale.

DarioSamo avatar Jan 22 '24 15:01 DarioSamo

Weirdly enough, it also seems to fix a bug where using the controls in the editor would stretch the contents of the viewports to be twice its scale.

I've been seeing this issue, but I see it in the OpenGL backend as well. So there might be something deeper going on in addition.

clayjohn avatar Jan 22 '24 19:01 clayjohn

However, while this resolves #87532, every time I bake lightmaps, I now see this message in the Output panel:

API 1.3.260 - Forward+ - Using Device #0: NVIDIA - NVIDIA GeForce RTX 4090

It's missing "Vulkan" before "API", and I think there should be a way to silence it as it may be excessive for certain local RenderingDevices.

Noted, I'll implement these suggestions.

DarioSamo avatar Jan 24 '24 23:01 DarioSamo

Fantastic work on this!!

reduz avatar Jan 25 '24 12:01 reduz

Opening this up for review, I'll need some help in testing out the other platforms remaining in the TODO list as I don't own the hardware to verify this.

Some important points that should be tested:

  • Window resizing
  • Window minimization
  • Minimizing app
  • Changing used display

DarioSamo avatar Jan 25 '24 16:01 DarioSamo

@Mickeon Those were taken from @darksylinc's other PR with his permission so I'll have to refer to him to see if he agrees with the changes, as I did not write these originally.

DarioSamo avatar Jan 26 '24 11:01 DarioSamo

As currently written, this seems to partially fix #85547. The shadow flickering issue now only happens when:

  • Resizing the window
  • Holding Alt for the Alt-Tab widget

Maximizing and minimizing on W11 with the AMD driver work great. Resizing works fine with aformentioned issue.

HybridEidolon avatar Jan 26 '24 21:01 HybridEidolon

Rebased to integrate the latest changes to D3D12.

Still looking for confirmation for Apple devices from anyone who owns them!

DarioSamo avatar Jan 29 '24 14:01 DarioSamo

@DarioSamo I built it on my M1 Macbook Air (only for x86_64, for some reason arm64 isn't able to find mvk symbols) and it works fine in a basic scene with SDFGI enabled. Resizing, moving, fullscreen, all of that works great. Just shader compilation slowness.

Edit: Spoke too soon; minimizing to tray does appear to cause issues with re-renders, but I'm not sure if that's new to this branch or something that's always existed on macOS.

HybridEidolon avatar Jan 29 '24 16:01 HybridEidolon

Edit: Spoke too soon; minimizing to tray does appear to cause issues with re-renders, but I'm not sure if that's new to this branch or something that's always existed on macOS.

This is likely https://github.com/godotengine/godot/issues/87309.

Calinou avatar Jan 29 '24 17:01 Calinou

Edit: Spoke too soon; minimizing to tray does appear to cause issues with re-renders, but I'm not sure if that's new to this branch or something that's always existed on macOS.

If you can confirm it's not a regression from master then that's probably fine.

DarioSamo avatar Jan 29 '24 17:01 DarioSamo

It is exactly as the issue Calinou linked describes, so it's definitely unrelated to these changes.

HybridEidolon avatar Jan 29 '24 18:01 HybridEidolon

Still having problems running this on Quest: image

It looks like the injection of OpenXR code to initialise Vulkan is not happening correctly, strangely enough, only on Android, works correctly on PC

Text version:

23:13:16.075
RuntimeIPCServerMgr [RUNTIMEIPC]
HandleClientConnect SUCCESS: Client: org.godotengine.testxr:org.godotengine.testxr:13465:, Server: com.oculus.systemdriver:com.oculus.vrruntimeservice (HapticsServer)
23:13:16.075
RuntimeIPCClient [RUNTIMEIPC]
TryToGetIPCResourcesFromServer: RPC_CONNECTOR_CONNECT_TO_SERVER SUCCESS: Client: org.godotengine.testxr:org.godotengine.testxr:13465, Server: com.oculus.systemdriver:com.oculus.vrruntimeservice (HapticsServer)
23:13:16.075
RuntimeIPCClient [RUNTIMEIPC]
EnsureServerConnection: SUCCESS. Client: org.godotengine.testxr:org.godotengine.testxr:13465, Server: com.oculus.systemdriver:com.oculus.vrruntimeservice (HapticsServer)
23:13:16.075
RuntimeIPCClient [RUNTIMEIPC]
ConnectToServer: SUCCESS: Client: org.godotengine.testxr:org.godotengine.testxr:13465, Server: com.oculus.systemdriver:com.oculus.vrruntimeservice (HapticsServer)
23:13:16.086
godot
OpenXR: Running on OpenXR runtime: Oculus 60.214.0
23:13:16.099
godot
OpenXR: XrGraphicsRequirementsVulkan2KHR:
23:13:16.099
godot
- minApiVersionSupported: 1.0.0
23:13:16.099
godot
- maxApiVersionSupported: 1.2.0
23:13:16.106
BufferQueueProducer
[SurfaceView[org.godotengine.testxr/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:349900000001,api:0,p:-749755580,c:13465) connect: BufferQueue has been abandoned
23:13:16.106
godot
USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: SurfaceID()
23:13:16.106
godot
at: surface_create (platform\android\rendering_context_driver_vulkan_android.cpp:54)
23:13:16.106
godot
USER ERROR: Failed to create vulkan window.
23:13:16.106
godot
at: DisplayServerAndroid (platform\android\display_server_android.cpp:609)
23:13:16.106
godot
USER ERROR: Unable to create DisplayServer, all display drivers failed.
23:13:16.106
godot
Use "--headless" command line argument to run the engine in headless mode if this is desired (e.g. for continuous integration).
23:13:16.106
godot
at: setup2 (main\main.cpp:2492)
23:13:18.432
[SUI] ActivityManagerUtilsHelper
onForegroundActivitiesChanged: Instance: com.oculus.common.activitymanager.ActivityManagerUtilsHelper@ca4c182, Pid: 13465, Tid: 10161, foregroundActivities: true, foreground app: org.godotengine.testxr, previous app: com.oculus.os.clearactivity
23:13:18.432
[SUI] AnytimeUIPanelApp
Received ActivityStateChanges - org.godotengine.testxr
23:13:18.433
[SUI] ActivityManagerUtilsHelper
onForegroundActivitiesChanged: Instance: com.oculus.common.activitymanager.ActivityManagerUtilsHelper@d2d12d5, Pid: 13465, Tid: 10161, foregroundActivities: true, foreground app: org.godotengine.testxr, previous app: com.oculus.os.clearactivity
23:13:18.433
[SUI] AnytimeUIPanelApp
Received ActivityStateChanges - org.godotengine.testxr
23:13:18.436
FocusedClientsListener
onTopActivityChanged: Updated TopActivityChangeEvent - topActivityName: org.godotengine.testxr, immersiveAppPackageName: nullopt, isImmersiveAppTopActivity: false
23:13:18.438
ClientMgr
OnForegroundVrAppClient: uid -1 pid -1 hasImmersiveApp 0 isTopActivity 0 --> org.godotengine.testxr, show splash = 1
23:13:18.438
ClientMgr
StartAppTransition: app=org.godotengine.testxr
23:13:18.443
GuardianFocusListener
[GuardianFocusListener::onTopActivityChanged] updated TopActivityChangeEvent - topActivityName: org.godotengine.testxr, immersiveAppPackageName: nullopt, isImmersiveAppTopActivity: false
23:13:18.445
MRSVC
MIXEDREALITY: MrServiceController: VrFocusManager::Top Activity Changed, topActivity now is: 'org.godotengine.testxr', immersiveApp now is: ''.
23:13:18.447
ClientMgr
StartAppTransition: Loading dots over 'black' background. [app=org.godotengine.testxr]
23:13:18.455
OVRLibrary
null cursor received for query content://com.oculus.ocms.library/apps/org.godotengine.testxr
23:13:18.460
OVRLibrary
null cursor received for query content://com.oculus.ocms.library/apps/org.godotengine.testxr
23:13:18.471
[SUI] ActivityManagerUtilsHelper
onFocusedWindowChanged: Instance: com.oculus.common.activitymanager.ActivityManagerUtilsHelper@ca4c182, topWindowUid: 10161, focusedApp: org.godotengine.testxr, previousApp: com.oculus.vrshell
23:13:18.471
[SUI] AnytimeUIPanelApp
Received ActivityStateChanges - org.godotengine.testxr
23:13:18.471
[SUI] ActivityManagerUtilsHelper
onFocusedWindowChanged: Instance: com.oculus.common.activitymanager.ActivityManagerUtilsHelper@d2d12d5, topWindowUid: 10161, focusedApp: org.godotengine.testxr, previousApp: com.oculus.vrshell
23:13:18.472
[SUI] AnytimeUIPanelApp
Received ActivityStateChanges - org.godotengine.testxr
23:13:18.473
MRSVC
MIXEDREALITY: MrServiceController: VrFocusManager::Top Activity Changed, topActivity now is: 'org.godotengine.testxr', immersiveApp now is: ''.
23:13:18.483
OVRLibrary
null cursor received for query content://com.oculus.ocms.library/apps/org.godotengine.testxr
23:13:18.501
godot
USER WARNING: _start_success was false
23:13:18.501
godot
at: start (main\main.cpp:2956)

BastiaanOlij avatar Jan 30 '24 11:01 BastiaanOlij

Thanks to @BastiaanOlij we tracked down that it is not necessary to request the transfer bit for the main queue, as apparently it is optional to do so and the Quest does not define it despite requesting the graphics and compute bits. This confirms that OpenXR does work in Quest and so all of the platforms we were looking to verify are working.

DarioSamo avatar Jan 30 '24 14:01 DarioSamo

Thanks to @BastiaanOlij we tracked down that it is not necessary to request the transfer bit for the main queue, as apparently it is optional to do so and the Quest does not define it despite requesting the graphics and compute bits. This confirms that OpenXR does work in Quest and so all of the platforms we were looking to verify are working.

Such a weird problem, but glad we figured it out. This all works great imho

BastiaanOlij avatar Jan 31 '24 03:01 BastiaanOlij

Seems like I'll have to re-do my changes for Wayland, testing would be appreciated when that is done as well.

DarioSamo avatar Jan 31 '24 12:01 DarioSamo

@DarioSamo count me in!

Riteo avatar Jan 31 '24 12:01 Riteo

@Riteo I'm waiting for CI to confirm if what I wrote builds but I feel this should be enough for testing it out.

DarioSamo avatar Jan 31 '24 12:01 DarioSamo

Oof some recent merges just conflicted this PR. I'm still testing this, but this already needs a rebase xD

Riteo avatar Jan 31 '24 13:01 Riteo

Oof some recent merges just conflicted this PR. I'm still testing this, but this already needs a rebase xD

You mean like in the middle of the hour I was just working on this? 😱

Let me re-do it.

DarioSamo avatar Jan 31 '24 13:01 DarioSamo

@Riteo Alright, conflicts should be fixed now.

DarioSamo avatar Jan 31 '24 13:01 DarioSamo

I tried this branch briefly with the Wayland backend on my crappy laptop and I couldn't notice any particular artifacts on the Garage 42 demo (I'll test properly on my desktop later).

There's a segfault on exit though:

Thread 1 "godot.linuxbsd." received signal SIGILL, Illegal instruction.
0x000055555f098bae in LocalVector<RenderingDevice::Frame, unsigned int, false, true>::operator[] (this=0x7fffef3aac68, p_index=1) at ./core/templates/local_vector.h:161
161			CRASH_BAD_UNSIGNED_INDEX(p_index, count);
(gdb) bt
#0  0x000055555f098bae in LocalVector<RenderingDevice::Frame, unsigned int, false, true>::operator[](unsigned int) (this=0x7fffef3aac68, p_index=1) at ./core/templates/local_vector.h:161
#1  0x000055555efb3265 in RenderingDevice::_end_frame() (this=0x7fffef3aa020) at ./servers/rendering/rendering_device.cpp:4887
#2  0x000055555efb3a6c in RenderingDevice::_flush_and_stall_for_all_frames() (this=0x7fffef3aa020) at ./servers/rendering/rendering_device.cpp:4919
#3  0x000055555efb70e7 in RenderingDevice::finalize() (this=0x7fffef3aa020) at ./servers/rendering/rendering_device.cpp:5323
#4  0x000055555efd5746 in RenderingDevice::~RenderingDevice() (this=0x7fffef3aa020, __in_chrg=<optimized out>) at ./servers/rendering/rendering_device.cpp:6123
#5  0x000055555afc4b34 in memdelete<RenderingDevice>(RenderingDevice*) (p_class=0x7fffef3aa020) at ./core/os/memory.h:109
#6  0x000055555aff72ee in DisplayServerWayland::~DisplayServerWayland() (this=0x7ffff716b9a0, __in_chrg=<optimized out>) at platform/linuxbsd/wayland/display_server_wayland.cpp:1386
#7  0x000055555aff923c in memdelete<DisplayServer>(DisplayServer*) (p_class=0x7ffff716b9a0) at ./core/os/memory.h:109
#8  0x000055555b0257bc in finalize_display() () at main/main.cpp:325
#9  0x000055555b04232f in Main::cleanup(bool) (p_force=false) at main/main.cpp:4056
#10 0x000055555af802da in main(int, char**) (argc=3, argv=0x7fffffffe2e8) at platform/linuxbsd/godot_linuxbsd.cpp:76

Edit: sorry, I accidentally pressed CTRL+Enter :sweat_smile:

Riteo avatar Feb 01 '24 17:02 Riteo

I tried this branch briefly with the Wayland backend on my crappy laptop and I couldn't notice any particular artifacts on the Garage 42 demo (I'll test properly on my desktop later).

There's a segfault on exit though:

I think that's a good enough confirmation for me TBH, the segfault sounds unrelated to the platform and something I might not just be cleaning up properly, so I'll review it. Thanks for the test!

DarioSamo avatar Feb 02 '24 12:02 DarioSamo

I've finally run this on my juicy desktop with the Wayland backend and properly built with LTO.

Garage 42 still looks great:

a car rendering properly

(Ignore the FPS, this is related to a known issue, it's actually rendering at 60 FPS, see #87963)

I also tried the abandoned spaceship demo, and I can't see any sort of artifact either: a spaceship in a cave

Hope this helps! :D

Riteo avatar Feb 05 '24 02:02 Riteo

There's a segfault on exit though:

Thread 1 "godot.linuxbsd." received signal SIGILL, Illegal instruction.
0x000055555f098bae in LocalVector<RenderingDevice::Frame, unsigned int, false, true>::operator[] (this=0x7fffef3aac68, p_index=1) at ./core/templates/local_vector.h:161
161			CRASH_BAD_UNSIGNED_INDEX(p_index, count);
(gdb) bt
#0  0x000055555f098bae in LocalVector<RenderingDevice::Frame, unsigned int, false, true>::operator[](unsigned int) (this=0x7fffef3aac68, p_index=1) at ./core/templates/local_vector.h:161
#1  0x000055555efb3265 in RenderingDevice::_end_frame() (this=0x7fffef3aa020) at ./servers/rendering/rendering_device.cpp:4887
#2  0x000055555efb3a6c in RenderingDevice::_flush_and_stall_for_all_frames() (this=0x7fffef3aa020) at ./servers/rendering/rendering_device.cpp:4919
#3  0x000055555efb70e7 in RenderingDevice::finalize() (this=0x7fffef3aa020) at ./servers/rendering/rendering_device.cpp:5323
#4  0x000055555efd5746 in RenderingDevice::~RenderingDevice() (this=0x7fffef3aa020, __in_chrg=<optimized out>) at ./servers/rendering/rendering_device.cpp:6123
#5  0x000055555afc4b34 in memdelete<RenderingDevice>(RenderingDevice*) (p_class=0x7fffef3aa020) at ./core/os/memory.h:109
#6  0x000055555aff72ee in DisplayServerWayland::~DisplayServerWayland() (this=0x7ffff716b9a0, __in_chrg=<optimized out>) at platform/linuxbsd/wayland/display_server_wayland.cpp:1386
#7  0x000055555aff923c in memdelete<DisplayServer>(DisplayServer*) (p_class=0x7ffff716b9a0) at ./core/os/memory.h:109
#8  0x000055555b0257bc in finalize_display() () at main/main.cpp:325
#9  0x000055555b04232f in Main::cleanup(bool) (p_force=false) at main/main.cpp:4056
#10 0x000055555af802da in main(int, char**) (argc=3, argv=0x7fffffffe2e8) at platform/linuxbsd/godot_linuxbsd.cpp:76

Turns out this is actually related to the platform, but only because I added a call to "finalize()" by mistake twice to RenderingDevice when no other platform was doing it. My bad. Either way I've fixed RD to be resilient against finalizing twice and also made it so it's not called twice on Wayland.

DarioSamo avatar Feb 05 '24 12:02 DarioSamo

Rebased to fix conflicts.

DarioSamo avatar Feb 12 '24 13:02 DarioSamo

Thanks! Amazing work :tada:

akien-mga avatar Feb 12 '24 22:02 akien-mga

Seems like some change in this PR broke multi-threaded renderer (both compatibility and Vulkan) on macOS (see https://github.com/godotengine/godot/pull/90268#discussion_r1554267653), but I'm not what's exactly wrong.

bruvzg avatar Apr 12 '24 19:04 bruvzg