osu-framework
osu-framework copied to clipboard
Remove local vertex storage from VertexBuffer
This is an RFC. I will rebase and clean up this PR so please don't attempt to go through by-commit
@peppy @bdach @frenzibyte This PR is in a "final" state for me, but I want your opinions and approvals on the structure of things and my proposed ideas before I go ahead with rebasing and cleaning it up. I've attempted to explain things (minus making a breaking changes section) so let me know if something doesn't make sense.
Some things like the condition added in PathDrawNode can/will be split out into separate PRs.
Closes https://github.com/ppy/osu-framework/issues/4228
Abstract
VertexBuffer currently houses CPU-side storage for all added vertices in order to de-dupe vertex uploads. This is inefficient in multiple regards:
- Permanently increases memory footprint.
- Possibly causes stutters as a result of the ImageSharp array pool allocating finalisable objects.
- Increases processing time for vertices. Every vertex addition, new or not, compares equality.
But we have a mechanism to avoid this - Drawable invalidation. We know exactly when Drawables change and when DrawNodes are expected to draw new vertices as a result, which can be used to relax our requirement for vertex equality and remove the CPU-side storage.
Batching
Batching requires constant maintenance to improve, and our current batching mechanisms are starting to show their age. There is a lot that can be done to improve batching, such as having an allocator with more knowledge over the structure of the scene graph to decide which vertices to batch together.
DrawNode should eventually not be able to create their own batches. Any methods of creating custom batches should be seen as temporary pending further implementation of such an allocator, at which point they'll be replaced by a lower-level solution.
IRenderer
To cover the above changes to batching and future work in eventually removing GLWrapper entirety, the new IRenderer becomes the single new parameter of DrawNode.Draw():
+ public interface IRenderer
+ {
+ // Will eventually replace GLWrapper.
+ }
+
+ public class OpenGLRenderer : IRenderer
+ {
+ }
public class DrawNode
{
- public virtual void Draw(Action<TexturedVertex2D> vertexAction)
+ public virtual void Draw(IRenderer renderer)
{
// Actual drawing commands.
}
}
For the time being, IRenderer.BeginQuads() is exposed to interact with the parenting CompositeDrawable's/global quad batch.
VertexGroup<T>/VertexGroupUsage<T>
VertexGroup<T> is the new object by which a DrawNode "batches" vertices together. It stores metadata about vertices such as invalidation state, but does not store the vertices themselves.
VertexGroupUsage<T> denotes a usage of a VertexGroup<T>, allowing the user to "add" vertices to the group. This is a stack-only object and thus cannot be stored. This object internally optimises for the case where new vertices don't need to be uploaded.
Drawing
Combining all the above, a DrawNode undergoes the following process to draw to the screen:
public class CustomDrawNode : DrawNode
{
// Create one or more VertexGroups to draw.
private readonly VertexGroup<TexturedVertex2D> vertices = new VertexGroup<TexturedVertex2D>();
public override void Draw(IRenderer renderer)
{
// Set up blending.
base.Draw(renderer);
// Setup shaders, textures, masking, etc.
shader.Bind();
texture.Bind();
// Start a new usage of the VertexGroup.
using (VertexGroupUsage<TexturedVertex2D> usage = renderer.BeginQuads(this, vertices))
{
// When it is known how many vertices are to be drawn ahead of time,
// preventing construction of vertices or other superfluous operations
// will yield small performance gains.
//
// This can be applied on a case-by-case basis where it is easy to do so,
// otherwise Add() internally optimises for individual vertex additions.
if (!usage.TrySkip(4))
return;
usage.Add(...);
usage.Add(...);
usage.Add(...);
usage.Add(...);
}
}
}
Important rules
- A
DrawNodemust not change its vertices between frames unless theDrawablehas been invalidated.- The new vertex values will not be rendered until the
Drawableis invalidated. - The framework will not throw an exception if this contract is violated except if compiled in
DEBUG.
- The new vertex values will not be rendered until the
- A
VertexGroup<T>can not be used multiple times in a single frame.- The framework will throw an exception if this contract is violated.
VertexGroupUsage<T>s can not be nested.- The framework will throw an exception if this contract is violated.
Custom batches
The IRenderer only provides a method to draw quads. For the time being, DrawNodes can create their own vertex batches if they need to and use them much like IRenderer.BeginQuads():
public class CustomDrawNode : DrawNode
{
private readonly LinearBatch<TexturedVertex2D> batch = new LinearBatch<TexturedVertex2D>(..., PrimitiveType.Triangles);
private readonly VertexGroup<TexturedVertex2D> vertices = new VertexGroup<TexturedVertex2D>();
public override void Draw(IRenderer renderer)
{
base.Draw(renderer);
shader.Bind();
texture.Bind();
using (var usage = batch.BeginUsage(this, vertices))
{
// As above.
}
}
}
Eventually, vertex batches will be removed from DrawNodes altogether and instead go via IRenderer.
Vertex redirection
TextureGL and the helper methods of DrawNode such as DrawNode.DrawQuad() will only output TexturedVertex2D-typed vertices. For custom shaders which only accept a subset of properties, VertexGroup provides a redirection mechanism:
public class CustomDrawNode : DrawNode
{
// A VertexGroup which converts from TexturedVertex2D to CustomVertex.
private readonly VertexGroup<TexturedVertex2D, CustomVertex> vertices = new VertexGroup<TexturedVertex2D, CustomVertex>(v => new CustomVertex { ... });
public override Draw(IRenderer renderer)
{
// ...
// As per normal:
using (var usage = renderer.BeginQuads(this, vertices))
{
// ...
}
}
}
Testing
When compiled in DEBUG, osu!framework stores all vertices and throws exceptions upon incorrect usages of VertexGroup. These include:
- Vertices changing between frames without an invalidation. Fatal (should be avoided at all cost).
- Vertex equality comparer being broken. Non-fatal (only affects the results of the above check).
- I don’t expect users to compile their own osu!framework, but I don’t expect these asserts to be hit unless the user violates the single drawing contract which osu!framework can't verify in RELEASE mode. Regardless, future work in removing vertex batches from
DrawNodewill allow us to define a different allocator at runtime instead.
Performance
- Consistently seen a >25% reduction in frametime in semi-complex scenes (
SongSelect,SongSelect+BeatmapListing,SongSelect+BeatmapListing+BeatmapOverlay). Extremely complex scenes (TestSceneMultiSpectatorwith 16 players) become fully update-bound. - I've seen 70MB reduction in total allocs during a play session, as well a reduction in steady-state RAM usage. A VRAM usage counter has been added to the native section in global statistics.
- The only allocation remaining is a static 1024-vertex array used to buffer vertex uploads.
Will it still be possible to cache data in a VBO to then be manipulated with a projection matrix instead of regenerating it when the ScreenSpaceQuad changes? For this to be possible I usually used a VertexBatch (which was kept alive and therefore its memory not reclaimed) but it sounds like they are going to go away at some point...
A VertexGroupUsage.TryReuse(int count) method could be exposed which forgoes uploads if only the invalidation state has changed but not if anything else has changed like the target batch/insertion index/etc. Semantics would be the same as the existing TrySkip(int count).
Would that cover your use case?
A
VertexGroupUsage.TryReuse(int count)method could be exposed which forgoes uploads if only the invalidation state has changed but not if anything else has changed like the target batch/insertion index/etc. Semantics would be the same as the existingTrySkip(int count).Would that cover your use case?
I'm not sure. The exact use case is more or less this:
// tiles is a collection of thousands of tiles which are to be merged into a single batch, 1 quad per
if ( tilesInvalidated ) {
// this function will create and/or reuse quad vertex buffers to effectively group tiles into batches
// that use the same texture (or other uniforms) and generate quads for them
// these buffers are also shared between all draw nodes (shared data, to avoid duplication because of the triple buffer)
regenerateBuffers();
tilesInvalidated = false;
}
// this is a quirk with o!f that makes me need to call UpdateRange(0,0) on all vertex buffers
// because otherwise IVertexBuffer.Free is called and deletes both the cpu and gpu buffer
// this is because it expects for me to write to the buffer each frame, but I do it only once
// and then only draw from it
keepBuffersAlive();
shader.Bind();
// the matrix is generated off of the drawable Matrix3
shader.GetUniform<Matrix4>( "projMatrix" ).UpdateValue( ref matrix );
// this loops through all the texture -> vertex buffer entries and draws their whole content with VertexBuffer.Draw
drawBatches();
shader.Unbind();
It's not important for the vertices to be stored on the cpu, but the references to GL buffers and their lengths need to, which also means no other drawable can override these GL buffers. It is also important to be able to use several buffers, as switching between uniforms several thousand times versus just a few times is quite costly. Additionally I absolutely need to have control over when/if these buffers are cleared, as clearing them too often can lead to framerate change from 1.2k (never cleared) to about 10 (cleared every frame).
Before I dig too deep, I was looking at the performance changes from this PR and couldn't find a test scene which shows any perceivable frame time difference. Is there a specific visual tests framework-side which can be used (a worst case example I guess?) or would this be feasible to add if not currently present?
Also opening Ctrl+F3 seems to crash:
[runtime] 2022-04-19 02:28:36 [error]: An unhandled error has occurred.
[runtime] 2022-04-19 02:28:36 [error]: NUnit.Framework.AssertionException
[runtime] 2022-04-19 02:28:36 [error]: at osu.Framework.Logging.ThrowingTraceListener.Fail(String message) in /Users/dean/Projects/osu-framework/osu.Framework/Logging/ThrowingTraceListener.cs:line 23
[runtime] 2022-04-19 02:28:36 [error]: at System.Diagnostics.TraceInternal.Fail(String message)
[runtime] 2022-04-19 02:28:36 [error]: at System.Diagnostics.Trace.Assert(Boolean condition)
[runtime] 2022-04-19 02:28:36 [error]: at osu.Framework.Graphics.OpenGL.GLWrapper.Reset(Vector2 size, Int32 treeIndex) in /Users/dean/Projects/osu-framework/osu.Framework/Graphics/OpenGL/GLWrapper.cs:line 154
[runtime] 2022-04-19 02:28:36 [error]: at osu.Framework.Platform.GameHost.DrawFrame() in /Users/dean/Projects/osu-framework/osu.Framework/Platform/GameHost.cs:line 499
[runtime] 2022-04-19 02:28:36 [error]: at osu.Framework.Threading.GameThread.processFrame() in /Users/dean/Projects/osu-framework/osu.Framework/Threading/GameThread.cs:line 450
System.InvalidOperationException: Vertex addition was skipped, but the contained vertex differs.
at osu.Framework.Graphics.Batches.VertexBatch`1.osu.Framework.Graphics.Batches.Internal.IVertexBatch.EnsureCurrentVertex[TVertex](IVertexGroup vertices, TVertex vertex, String failureMessage) in /Users/dean/Projects/osu-framework/osu.Framework/Graphics/Batches/VertexBatch.cs:line 110
at osu.Framework.Graphics.Batches.VertexGroupUsage`1.Add(TInput vertex) in /Users/dean/Projects/osu-framework/osu.Framework/Graphics/Batches/VertexGroupUsage.cs:line 49
at osu.Framework.Graphics.OpenGL.Textures.TextureGLSingle.DrawQuad(VertexGroupUsage`1& vertices, Quad vertexQuad, ColourInfo drawColour, Nullable`1 textureRect, Nullable`1 inflationPercentage, Nullable`1 blendRangeOverride, Nullable`1 textureCoords) in /Users/dean/Projects/osu-framework/osu.Framework/Graphics/OpenGL/Textures/TextureGLSingle.cs:line 322
at osu.Framework.Graphics.OpenGL.Textures.TextureGLSub.DrawQuad(VertexGroupUsage`1& vertices, Quad vertexQuad, ColourInfo drawColour, Nullable`1 textureRect, Nullable`1 inflationPercentage, Nullable`1 blendRangeOverride, Nullable`1 textureCoords) in /Users/dean/Projects/osu-framework/osu.Framework/Graphics/OpenGL/Textures/TextureGLSub.cs:line 84
at osu.Framework.Graphics.Textures.Texture.DrawQuad(VertexGroupUsage`1& vertices, Quad vertexQuad, ColourInfo drawColour, Nullable`1 textureRect, Nullable`1 inflationPercentage, Nullable`1 blendRangeOverride, Nullable`1 textureCoords) in /Users/dean/Projects/osu-framework/osu.Framework/Graphics/Textures/Texture.cs:line 183
at osu.Framework.Graphics.Visualisation.TextureVisualiser.UsageBackground.UsageBackgroundDrawNode.Blit(VertexGroupUsage`1& usage) in /Users/dean/Projects/osu-framework/osu.Framework/Graphics/Visualisation/TextureVisualiser.cs:line 238
at osu.Framework.Graphics.Sprites.SpriteDrawNode.Draw(IRenderer renderer) in /Users/dean/Projects/osu-framework/osu.Framework/Graphics/Sprites/SpriteDrawNode.cs:line 92
Is there a specific visual tests framework-side which can be used
I suppose you could use TestSceneFrontToBack, for me it goes from 350 -> 450 fps.
Also opening Ctrl+F3 seems to crash
Fixed. That DrawNode's pretty dodgy.
As an aside, I've gone through the visual test browser, and noticed
TestSceneEasingCurves/TestSceneKeyBindingsGridrunning quite slow compared tomaster:
Can you profile this one and paste the hottest paths? I can't repro - that test scene goes from 800 -> 1100 fps for me.
From a 10 seconds record as I was entering the regressed test scene:
Sticking to TestSceneEasingCurves for now, how many vertices does it upload per frame? GlobalStatistics shows me around 500 vertices per frame:

Draw thread for me (upload count looks similar):
this pr:

master:

(profiling time was not consistent, so only compare %s)
VerticesDraw stays at a constant ~4500, while VerticesUpl spikes up from 500 to ~1100 at most (not sure if considerable).
https://user-images.githubusercontent.com/22781491/164150608-c2dde3d0-3cd4-44b4-aa7e-240ca5447dcb.mp4
spikes up from 500 to ~1100 at most (not sure if considerable).
There's something more to this that I'll have to investigate. 1000 vertices is nothing at all since the game regularly uploads 10x more than that.
From an initial investigation it seems macOS, and especially Apple Silicon, does really not like buffer orphaning after draw calls. It's something I didn't consider but should've - the standard case of you shouldn't modify a VBO that's already in flight.
I'll need to think more about the structure to figure out how to fix this.
FWIW, I've managed to get Veldrid running on this branch https://github.com/ppy/osu-framework/compare/master...frenzibyte:veldrid-plus-vbo-invalidation (requires a local veldrid checkout at this branch), in case you are interested.
@frenzibyte Regarding https://github.com/ppy/osu-framework/pull/5109#pullrequestreview-946404024, can you check again and see if my most recent commit fixes it?
It's not perfect, lacking tests, and a huge edge case isn't handled correctly, but should be enough to check performance for the time being.
Need some more time to dwell on this one. The problem right now is changing of states that cause a flush, happening during a group's usage (textures, shaders, etc) when most of this made with the slight intention that that shouldn't be allowed without explicitly disallowing it.
Unfortunately, fixing this would mean inverting things like texture.DrawQuad() to not automatically bind internally, and require DrawNodes to bind manually.
@frenzibyte Regarding #5109 (review), can you check again and see if my most recent commit fixes it?
It's not perfect, lacking tests, and a huge edge case isn't handled correctly, but should be enough to check performance for the time being.
It has brought performance back on par with master for TestSceneEasingCurves, but is still degraded a bit in TestSceneKeyBindingsGrid (PR shows 150fps while it runs at 250fps on master), for whatever that's worth. Great to see nonetheless.
I want to do another review pass on this PR, but first confirming that it's in a final state @smoogipoo ?
Do you have anything to add regarding the performance regression in TestSceneKeyBindingsGrid mentioned above? For me on release 234 (master) vs 280 (this branch), on debug 130 (master) vs 104 (this branch). Which seems fine. Was @frenzibyte's above comment in debug mode?
Was @frenzibyte's above comment in debug mode?
Apparently wasn't according to https://discord.com/channels/188630481301012481/589331078574112768/987972289716965427, however the takeaway is that behaviour is different on M1 (I believe) vs desktop. M1 really doesn't like VBOs being reused in one draw frame whereas AMD/NV seems to not mind it as much (probably to varying degrees, it's still incorrect to do).
Do you have anything to add regarding the performance regression in TestSceneKeyBindingsGrid mentioned above?
The above being said, I don't know why the difference in performance still remains to some degree.
but first confirming that it's in a final state
It's close enough, but I still have to refactor and re-PR because this one's kinda messy. The general idea is there, though, so you could review it from that perspective.
but I still have to refactor and re-PR because this one's kinda messy.
Best to block pending that so no one gets too deep in reviewing here then?
This PR is blocked and there's a message in the PR description about it, is that enough?
Ah, missed that. Is fine.