comfy
compress vertex color data to [u8;4]
Hi, I saw your chat on reddit when you first released comfy, talking about bunnymark performance, and now that I've seen your official benchmark I thought I would try to contribute an improvement.
This changes vertices to store color as a [u8;4], with each RGBA channel ranging from 0-255, rather than a [f32;4]. The GPU automatically expands it back into a vec4 in the shader, so no shader changes are needed. The main benefit is less bandwidth spent transferring vertex data from system memory to GPU memory. Since color data has to be repeated for each vertex, it adds a lot of bulk to the vertex buffer.
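To make the size difference concrete, here is a minimal sketch rather than the actual code in this PR. The `VertexF32` and `VertexU8` layouts (position, texture coordinates, color), the helper names, and the figure of 4 vertices per sprite are all assumptions for illustration. In wgpu, the normalized `VertexFormat::Unorm8x4` attribute format is what lets the shader keep reading the color as a `vec4<f32>` in the 0.0-1.0 range.

```rust
/// Hypothetical vertex layout before the change: color as four f32s.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct VertexF32 {
    pub position: [f32; 3],
    pub tex_coords: [f32; 2],
    pub color: [f32; 4], // 16 bytes per vertex
}

/// Hypothetical vertex layout after the change: color as four u8s.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct VertexU8 {
    pub position: [f32; 3],
    pub tex_coords: [f32; 2],
    pub color: [u8; 4], // 4 bytes per vertex
}

/// Converts a 0.0-1.0 float color into 0-255 bytes.
/// Out-of-range channels are clamped before scaling.
pub fn pack_color(color: [f32; 4]) -> [u8; 4] {
    color.map(|c| (c.clamp(0.0, 1.0) * 255.0).round() as u8)
}

/// Returns (size of VertexF32, size of VertexU8) in bytes.
pub fn vertex_sizes() -> (usize, usize) {
    (
        std::mem::size_of::<VertexF32>(),
        std::mem::size_of::<VertexU8>(),
    )
}

/// Bytes uploaded per frame, assuming 4 unique vertices per sprite
/// (an assumption; the real count depends on how quads are batched).
pub fn upload_bytes(sprites: usize, vertex_size: usize) -> usize {
    sprites * 4 * vertex_size
}
```

Under these assumptions a vertex shrinks from 36 to 24 bytes, so the per-frame upload drops by a third. For example, `upload_bytes(65_000, 36)` is 9,360,000 bytes, while `upload_bytes(65_000, 24)` is 6,240,000 bytes.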
For 2D tests like a bunnymark, buffer size is the biggest bottleneck for most frameworks (even before game/engine logic), so cutting it down pays off the most. Most bunnymarks out there are CPU-bound because uploading all that data is a blocking operation. Based on your comfymark target of 50 fps, on my system (Linux, R9 5900X, RTX 2070), this change took comfymark from being capable of ~45,000 up to ~65,000.
I am not super familiar with the inner workings of wgpu (I'm most familiar with pure OpenGL), or with anything specific you do in comfy. It does seem like your framework's entity/world concept creates some overhead, which is to be expected and will probably limit a benchmark like this in the end. That said, there are some other changes to the pipeline that could reduce the 'bandwidth' issue further, though they'd require more complex/invasive changes than this PR.
If you'd like I'd be happy to talk more specifics about optimizing the sprite rendering - I've put a good amount of time into figuring out specifically the bunnymark, so it'd be nice to be able to share some of that ahaha