
Implement a "deferred" renderer

Open · smoogipoo opened this issue 1 year ago · 0 comments

Prereqs:

  • [ ] https://github.com/ppy/osu-framework/pull/6189
  • [ ] https://github.com/ppy/osu-framework/pull/6187

This is something I've wanted to do for a while now, so I'm glad to finally get it out of the way and, hopefully, into a release in a somewhat experimental state for the time being. It's 100% isolated, so it should be easy to revert or delete if it doesn't go as planned.

Introduction

At its core, the idea is to record everything we need to do within a frame - all state changes, all vertices, all UBO data - and run through all of those "events" in a single pass at the end of the frame.

By doing so we can eliminate some hackery that's employed by VeldridRenderer such as different implementations of staging buffers for different scenarios (sometimes causing breakage), and haphazardly using multiple command buffers to split out buffer/texture updates from the main command buffer.
Furthermore, there's a hope that this can lead to further optimisations down the track, but there's significant work to be done before that's possible.

Expectations

Before going further, I want to set some expectations:

  • You should treat this as doing things correctly first, and improving performance second.
  • Performance will generally be slightly lower across the board, though it may be better on D3D11.
  • Vulkan is stable on platforms that support it.
  • OpenGL doesn't allocate nearly as much as Veldrid-OpenGL.
    • OpenGL (Legacy) will continue to be preferred.

Deep dive

This effectively defines two passes:

  • The "draw" pass starts at Renderer.BeginFrame() and ends just before Renderer.FinishFrame(). It covers the DrawNode.Draw() recursion.
  • The "paint" pass starts and ends inside Renderer.FinishFrame().

The draw and paint passes are functionally air-gapped via subtypes of IRenderEvent.
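A minimal Python sketch of this split, for illustration only. Every name here (`DeferredRendererSketch`, `SetBlendEvent`, `DrawVerticesEvent`, the method names) is hypothetical and not osu-framework's API. It assumes events are small immutable records carrying only plain values, so the paint pass needs nothing from the draw pass except the recorded list.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for IRenderEvent subtypes: immutable records that
# carry only plain values, never references to live draw-pass objects.
@dataclass(frozen=True)
class SetBlendEvent:
    blend_mode: int


@dataclass(frozen=True)
class DrawVerticesEvent:
    vertex_offset: int
    vertex_count: int


RenderEvent = Union[SetBlendEvent, DrawVerticesEvent]


class DeferredRendererSketch:
    """Hypothetical renderer: the draw pass records, the paint pass replays."""

    def __init__(self) -> None:
        self.events: List[RenderEvent] = []
        self.executed: List[str] = []  # stands in for real graphics commands

    def begin_frame(self) -> None:
        # Start of the "draw" pass: drop the previous frame's events.
        self.events.clear()
        self.executed.clear()

    # These are called during the draw pass and only record an event.
    def set_blend(self, mode: int) -> None:
        self.events.append(SetBlendEvent(mode))

    def draw_vertices(self, offset: int, count: int) -> None:
        self.events.append(DrawVerticesEvent(offset, count))

    def finish_frame(self) -> None:
        # The entire "paint" pass: replay every recorded event in order.
        for event in self.events:
            if isinstance(event, SetBlendEvent):
                self.executed.append(f"blend {event.blend_mode}")
            elif isinstance(event, DrawVerticesEvent):
                self.executed.append(f"draw {event.vertex_offset}+{event.vertex_count}")


renderer = DeferredRendererSketch()
renderer.begin_frame()
renderer.set_blend(1)
renderer.draw_vertices(0, 6)
renderer.draw_vertices(6, 3)
executed_before_paint = list(renderer.executed)  # empty: nothing has run yet
renderer.finish_frame()
```

In this sketch `finish_frame()` is the only place anything executes, so the order of work is decided entirely by the recorded list rather than by when the draw code happened to run.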

ResourceAllocator is a sort of in-process heap that facilitates serialisation between the two passes. You give it data, it gives you references that can be used later. This is because we can't pass managed objects between the two passes.
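A sketch of the handle idea, assuming a simple per-frame design: objects go into a list and are referred to by index, while raw data is appended to one contiguous byte heap and referred to by (offset, length). `ResourceAllocatorSketch` and its methods are hypothetical; the real `ResourceAllocator` may work differently.

```python
from typing import Any, List, Tuple


class ResourceAllocatorSketch:
    """Hypothetical per-frame heap: values go in, plain-integer references come out."""

    def __init__(self) -> None:
        self._objects: List[Any] = []  # objects, referenced by index
        self._heap = bytearray()       # raw data, referenced by (offset, length)

    def reference(self, obj: Any) -> int:
        # Events store this int instead of the object itself.
        self._objects.append(obj)
        return len(self._objects) - 1

    def dereference(self, handle: int) -> Any:
        # Used by the paint pass to get the object back.
        return self._objects[handle]

    def allocate_bytes(self, data: bytes) -> Tuple[int, int]:
        # Copy the data into the contiguous heap and return where it landed.
        offset = len(self._heap)
        self._heap.extend(data)
        return offset, len(data)

    def read_bytes(self, offset: int, length: int) -> bytes:
        return bytes(self._heap[offset:offset + length])

    def new_frame(self) -> None:
        # References are only meaningful within the frame that created them.
        self._objects.clear()
        self._heap.clear()


alloc = ResourceAllocatorSketch()
texture_ref = alloc.reference("texture")
first = alloc.allocate_bytes(b"\x01\x02\x03\x04")
second = alloc.allocate_bytes(b"\xff")
```

Because references are plain integers, they can be embedded directly in serialised events without dragging a managed object across the gap between the passes.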

EventList holds all the draw events emitted by the deferred classes (DeferredRenderer, DeferredShader, et al.).

EventProcessor is the "paint" pass - it runs through all events in the EventList and does the necessary graphics operations.
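The EventList/EventProcessor pair might look roughly like this, assuming each event is written as a one-byte type tag followed by a packed payload. The tags, payload layouts, `EventListSketch` and `process_events()` are all hypothetical. The `<` format in Python's `struct` module packs fields with no padding, which plays roughly the role that unaligned copies play in the C# code.

```python
import struct
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical event type tags.
EVT_BIND_TEXTURE = 1
EVT_DRAW = 2

# '<' = little-endian, standard sizes, no padding: fields are packed back to
# back without any alignment.
TAG = struct.Struct("<B")
PAYLOADS = {
    EVT_BIND_TEXTURE: struct.Struct("<i"),  # texture reference
    EVT_DRAW: struct.Struct("<ii"),         # vertex offset, vertex count
}


class EventListSketch:
    """Hypothetical event list: one flat byte buffer of (tag, payload) records."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def add(self, tag: int, *fields: int) -> None:
        self.buffer += TAG.pack(tag)
        self.buffer += PAYLOADS[tag].pack(*fields)

    def __iter__(self) -> Iterator[Tuple[int, tuple]]:
        pos = 0
        while pos < len(self.buffer):
            (tag,) = TAG.unpack_from(self.buffer, pos)
            pos += TAG.size
            payload = PAYLOADS[tag]
            yield tag, payload.unpack_from(self.buffer, pos)
            pos += payload.size


def process_events(events: EventListSketch, commands: List[str]) -> None:
    """Hypothetical paint pass: walk every event once and issue its command."""
    handlers: Dict[int, Callable[..., None]] = {
        EVT_BIND_TEXTURE: lambda tex: commands.append(f"bind texture {tex}"),
        EVT_DRAW: lambda offset, count: commands.append(f"draw {count} from {offset}"),
    }
    for tag, fields in events:
        handlers[tag](*fields)


events = EventListSketch()
events.add(EVT_BIND_TEXTURE, 7)
events.add(EVT_DRAW, 0, 6)
commands: List[str] = []
process_events(events, commands)
```

Packing without padding keeps the buffer small (14 bytes for these two events) at the cost of unaligned reads, which is the trade-off the alignment item under Future work refers to.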

UniformBufferManager and VertexManager hold all UBO and vertex data as a contiguous array. There are some specifics as to how each of these work:

  • Alignment is of utmost importance for UBOs.
  • UBOs have strict size requirements, so they're usually exposed as 64KiB (65,536-byte) "chunks". If a UBO is bound incorrectly, D3D11 will silently ignore the bind request (no error or anything, it's simply ignored).
  • Vertices are a little bit more lenient, but we're using the new vertexIndexOffset parameter of GraphicsPipeline.DrawVertices(), which requires each range to start at a multiple of its own vertex struct size.
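A sketch of both alignment rules, under stated assumptions: 64KiB UBO chunks, a 256-byte granularity for UBO offsets and sizes (the D3D11.1 constant-buffer range-binding rule; Vulkan's minimum uniform-buffer offset alignment varies by device), and vertex ranges padded so each starts at a multiple of its own stride. `align_up`, `UniformChunkAllocator` and `VertexRangeAllocator` are hypothetical names, not osu-framework's UniformBufferManager or VertexManager.

```python
from typing import Tuple

# Assumptions: 64KiB chunks (D3D11's constant-buffer size limit) and a 256-byte
# granularity for UBO offsets and sizes (the D3D11.1 range-binding rule).
UBO_CHUNK_SIZE = 64 * 1024
UBO_ALIGNMENT = 256


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (any positive integer)."""
    return (value + alignment - 1) // alignment * alignment


class UniformChunkAllocator:
    """Hypothetical sub-allocator packing UBO blocks into fixed-size chunks."""

    def __init__(self) -> None:
        self.chunk = 0
        self.offset = 0

    def allocate(self, size: int) -> Tuple[int, int, int]:
        bind_size = align_up(size, UBO_ALIGNMENT)
        if bind_size > UBO_CHUNK_SIZE:
            raise ValueError("uniform block does not fit in a single chunk")
        start = align_up(self.offset, UBO_ALIGNMENT)
        if start + bind_size > UBO_CHUNK_SIZE:
            # Doesn't fit in what's left of this chunk: start a new one.
            self.chunk += 1
            start = 0
        self.offset = start + bind_size
        return self.chunk, start, bind_size  # (chunk index, byte offset, bound size)


class VertexRangeAllocator:
    """Hypothetical allocator for one contiguous vertex array."""

    def __init__(self) -> None:
        self.cursor = 0  # bytes used so far

    def allocate(self, stride: int, count: int) -> int:
        # Pad to a multiple of the vertex size so the byte offset converts
        # exactly into a vertex index (the value passed as vertexIndexOffset).
        start = align_up(self.cursor, stride)
        self.cursor = start + stride * count
        return start // stride


ubo = UniformChunkAllocator()
a = ubo.allocate(80)
b = ubo.allocate(80)
c = ubo.allocate(65100)

vertices = VertexRangeAllocator()
first_index = vertices.allocate(stride=20, count=3)
second_index = vertices.allocate(stride=24, count=2)
```

With these numbers the third UBO request spills into a new chunk, and the 24-byte vertices start at byte 72 (vertex index 3) instead of byte 60, because 60 is not a multiple of 24.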

Testing

This can be tested in-game by changing the renderer, or via OSU_GRAPHICS_RENDERER=deferred;OSU_GRAPHICS_SURFACE=x (you must provide both, otherwise it'll fall back to deferred+OpenGL).

  • GT 710: D3D11 corruption fixed. Vulkan still dies in gameplay (driver issue).
  • GTX 1650: Works.
  • ARC A380: System dies on more than just lazer (driver issue).
  • 9800 GT: Needs some fiddling with drivers; currently uses software rendering.
  • Apple M1: Works.
  • RX 5700XT: Works.
  • RTX 3070Ti: Works.
  • iOS: Works.

Future work

  • Improve alignment.
    • The performance of this all is dependent upon the serialisation & structures therein. I've implemented everything (e.g. EventList) without considering alignment (hence using Unsafe.CopyUnaligned()).
  • Reimplement the masking SSBO. This is one thing that couldn't be done efficiently before due to in-line buffer updates.
  • Avoid processing of DrawNodes that haven't changed during the "draw" pass.

smoogipoo, Feb 19 '24 08:02