osu-framework
Implement a "deferred" renderer
Prereqs:
- [ ] https://github.com/ppy/osu-framework/pull/6189
- [ ] https://github.com/ppy/osu-framework/pull/6187
This is something I've wanted to do for a while now, so I'm glad to finally get this out of the way and hopefully in a release in a somewhat experimental state for the time being. It's 100% isolated, so it should be an easy revert/delete if it doesn't go as planned.
Introduction
At its core, the idea is to record everything we need to do within a frame - all state changes, all vertices, all UBO data - and run through all of those "events" in a single pass at the end of the frame.
By doing so we can eliminate some hackery that's employed by VeldridRenderer such as different implementations of staging buffers for different scenarios (sometimes causing breakage), and haphazardly using multiple command buffers to split out buffer/texture updates from the main command buffer.
Furthermore, there's a hope that this can lead to further optimisations down the track, but there's significant work to be done before that's possible.
Expectations
Before going further, I want to set some expectations:
- You should treat this as doing things correctly first, and improving performance second.
- Expect slightly lower performance across the board, though it may be better on D3D11.
- Vulkan is stable on platforms that support it.
- OpenGL doesn't allocate nearly as much as Veldrid-OpenGL.
OpenGL (Legacy) will continue to be preferred.
Deep dive
This effectively defines two passes:
- The "draw" pass starts at Renderer.BeginFrame() and ends just before Renderer.FinishFrame(). It covers the DrawNode.Draw() recursion.
- The "paint" pass starts and ends inside Renderer.FinishFrame().
The draw and paint passes are functionally air-gapped via subtypes of IRenderEvent.
ResourceAllocator is a sort of in-process heap that facilitates serialisation between the two passes. You give it data, it gives you references that can be used later. This is because we can't pass managed objects between the two passes.
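To make the "you give it data, it gives you references" idea concrete, here is a minimal sketch. SketchAllocator and ResourceReference are hypothetical names made up for illustration, not the framework's actual types, and the real ResourceAllocator may differ in layout and lifetime handling. The point is only that events can carry a plain integer handle instead of a managed object.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical handle type: an index into the allocator, not a managed pointer.
public readonly struct ResourceReference
{
    public readonly int Id;
    public ResourceReference(int id) => Id = id;
}

// Hypothetical stand-in for ResourceAllocator, for illustration only.
public sealed class SketchAllocator
{
    private readonly List<object> objects = new List<object>();
    private readonly List<byte> bytes = new List<byte>();

    // Draw pass: keep a managed object alive for the frame and hand back a reference.
    public ResourceReference Reference(object obj)
    {
        objects.Add(obj);
        return new ResourceReference(objects.Count - 1);
    }

    // Paint pass: resolve a reference back into the original object.
    public T Dereference<T>(ResourceReference reference) => (T)objects[reference.Id];

    // Draw pass: copy unmanaged data into the per-frame store, returning its byte offset and length.
    public (int Offset, int Length) AllocateData<T>(ReadOnlySpan<T> data) where T : unmanaged
    {
        ReadOnlySpan<byte> raw = MemoryMarshal.AsBytes(data);
        int offset = bytes.Count;
        bytes.AddRange(raw.ToArray());
        return (offset, raw.Length);
    }

    // Everything is dropped when a new frame begins.
    public void NewFrame()
    {
        objects.Clear();
        bytes.Clear();
    }
}
```

An event struct would then store a ResourceReference (or an offset/length pair) rather than the object itself, which keeps the event unmanaged and trivially copyable.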
EventList holds all the draw events emitted by the deferred classes (DeferredRenderer, DeferredShader, et al.).
EventProcessor is the "paint" pass - it runs through all events in the EventList and does the necessary graphics operations.
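The record-then-replay split can be sketched roughly as follows. All type names here (SketchEventList, SketchEventType, SetScissorEvent, DrawEvent) are hypothetical and only loosely modelled on the description above. The real EventList and EventProcessor handle far more event types and issue actual graphics calls rather than building a log.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical event kinds; the real renderer has many more.
public enum SketchEventType : byte { SetScissor, Draw }

public struct SetScissorEvent { public int X, Y, Width, Height; }
public struct DrawEvent { public int VertexCount; }

// Hypothetical stand-in for EventList: events packed back to back in a byte buffer.
public sealed class SketchEventList
{
    private byte[] buffer = new byte[256];
    private int length;

    // "Draw" pass: append a one-byte type tag followed by the raw event bytes.
    public void Enqueue<T>(SketchEventType type, in T ev) where T : unmanaged
    {
        int size = 1 + Unsafe.SizeOf<T>();
        if (length + size > buffer.Length)
            Array.Resize(ref buffer, Math.Max(buffer.Length * 2, length + size));

        buffer[length] = (byte)type;
        // The payload lands at an arbitrary offset, so it is written unaligned.
        Unsafe.WriteUnaligned(ref buffer[length + 1], ev);
        length += size;
    }

    // "Paint" pass: walk the buffer once, in recording order.
    public List<string> Replay()
    {
        var log = new List<string>();
        int offset = 0;

        while (offset < length)
        {
            var type = (SketchEventType)buffer[offset++];

            switch (type)
            {
                case SketchEventType.SetScissor:
                    var s = Unsafe.ReadUnaligned<SetScissorEvent>(ref buffer[offset]);
                    log.Add($"scissor {s.X},{s.Y} {s.Width}x{s.Height}");
                    offset += Unsafe.SizeOf<SetScissorEvent>();
                    break;

                case SketchEventType.Draw:
                    var d = Unsafe.ReadUnaligned<DrawEvent>(ref buffer[offset]);
                    log.Add($"draw {d.VertexCount}");
                    offset += Unsafe.SizeOf<DrawEvent>();
                    break;
            }
        }

        return log;
    }

    // Called at the start of each frame to reuse the buffer.
    public void Reset() => length = 0;
}
```

Because the events are plain unmanaged structs, the draw pass never touches the graphics device. Everything device-related happens inside the single replay loop.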
UniformBufferManager and VertexManager hold all UBO and vertex data as a contiguous array. There are some specifics as to how each of these works:
- Alignment is of utmost importance for UBOs.
- UBOs have strict size requirements, so they're usually exposed as 64KiB "chunks". If bound incorrectly, D3D11 will silently ignore the bind request (not error or anything - ignore).
- Vertices are a little bit more lenient, but we're using the new vertexIndexOffset parameter of GraphicsPipeline.DrawVertices() which requires them to at least be aligned to their own struct size.
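The alignment rules above boil down to rounding byte offsets up before writing. The helper below is a minimal sketch with hypothetical names (AlignmentSketch, AlignUp, VertexIndexFor). The alignment values passed in are assumptions for illustration, since the actual UBO offset alignment depends on the device and backend.

```csharp
using System;

public static class AlignmentSketch
{
    // Round an offset up to the next multiple of the given alignment (any positive value).
    public static int AlignUp(int offset, int alignment)
    {
        if (alignment <= 0)
            throw new ArgumentOutOfRangeException(nameof(alignment));

        int remainder = offset % alignment;
        return remainder == 0 ? offset : offset + (alignment - remainder);
    }

    // Vertices: the byte offset must be a multiple of the vertex struct size,
    // so that it can be expressed as a whole vertex index.
    public static int VertexIndexFor(int byteOffset, int vertexSize)
        => AlignUp(byteOffset, vertexSize) / vertexSize;
}
```

For example, with an assumed 256-byte UBO offset alignment, data that would start at byte 100 is placed at byte 256 instead. With a 20-byte vertex struct, a write position of byte 50 is bumped to byte 60, which corresponds to vertex index 3.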
Testing
Can be tested in-game by changing the renderer.
Can be tested via OSU_GRAPHICS_RENDERER=deferred;OSU_GRAPHICS_SURFACE=x (you must provide both, otherwise it'll fall back to deferred+opengl).
- GT 710: D3D11 corruption fixed. Vulkan still dies in gameplay (driver issue).
- GTX 1650: Works.
- ARC A380: System dies on more than just lazer (driver issue).
- 9800 GT: Need to fiddle around with drivers - uses software rendering atm.
- Apple M1: Works.
- RX 5700XT: Works.
- RTX 3070Ti: Works.
- iOS: Works.
Future work
- Improve alignment.
  - The performance of this all is dependent upon the serialisation & structures therein. I've implemented everything (e.g. EventList) without considering alignment (hence using Unsafe.CopyUnaligned()).
- Reimplement the masking SSBO. This is one thing that couldn't be done efficiently before due to in-line buffer updates.
- Avoid processing of DrawNodes that haven't changed during the "draw" pass.