Performance issues on macOS
There is a dramatic graphics performance difference between Windows/Linux/WASM and macOS. Various MacBooks can't handle rg3d properly: there can be an almost 10x performance difference, and FPS almost never reaches 60 (except maybe in 2D). I don't have any Apple devices ~~and it is near impossible to install macOS on a virtual machine~~, so I need the community's help with this issue.
Possible causes:
- Index format - @toyboot4e profiled other games and found that all of them use the `GL_UNSIGNED_SHORT` index format, but rg3d uses `GL_UNSIGNED_INT`. This might be a cause of the low performance (sketched below).
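Note that switching the index format involves more than changing the type passed to the draw call: the index buffer contents themselves must be stored as 16-bit values, otherwise the driver will misinterpret the buffer. A minimal sketch of such a conversion (a hypothetical helper, not rg3d code):

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert 32-bit indices to 16-bit ones so a mesh
/// can be drawn with GL_UNSIGNED_SHORT. Returns None if any index exceeds
/// u16::MAX, i.e. the mesh addresses more than 65536 vertices.
fn to_u16_indices(indices: &[u32]) -> Option<Vec<u16>> {
    indices.iter().map(|&i| u16::try_from(i).ok()).collect()
}
```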
Sidenote on the Windows/macOS performance difference
Someone on the rg3d Discord server reported that a Windows machine with a GTX 760 reached 60 FPS in StationIapetus. That GPU's performance is almost the same as a Radeon R9 370X (my Mac's GPU), but mine ran at roughly 0.4 FPS.
Here's a comparison of the GTX 760 and Radeon R9 370X: https://hwbench.com/vgas/radeon-r9-370-vs-geforce-gtx-760
A quick note to let you know I tried changing:
```diff
diff --git a/src/renderer/framework/geometry_buffer.rs b/src/renderer/framework/geometry_buffer.rs
index 4f3af37..5ca1c1c 100644
--- a/src/renderer/framework/geometry_buffer.rs
+++ b/src/renderer/framework/geometry_buffer.rs
@@ -265,7 +265,7 @@ impl<'a> GeometryBufferBinding<'a> {
         self.state.gl.draw_elements(
             self.mode(),
             index_count as i32,
-            glow::UNSIGNED_INT,
+            glow::UNSIGNED_SHORT,
             indices,
         );
     }
@@ -279,7 +279,7 @@ impl<'a> GeometryBufferBinding<'a> {
         self.state.gl.draw_elements_instanced(
             self.mode(),
             index_count as i32,
-            glow::UNSIGNED_INT,
+            glow::UNSIGNED_SHORT,
             0,
             count as i32,
         )
```
but there was no difference in performance. I still get only about 1 frame per second on my M1 mini for some examples (and the other examples are all similarly slow, just as before this patch).
I ran `cargo instruments` to profile with Instruments. It seems most of the time is spent in `geometry_buffer.rs:draw_internal()` for several of the slower-running examples (like `3rd_person` and `scene`), so I had high hopes for this possible solution, but alas, no real change. Just thought I would let you know.
I suspect the current renderer implementation is slow on macOS because of the vertex attribute layout. Currently, some attributes might not be used in a draw call, and I've read on Stack Overflow about dramatic performance loss when some vertex attributes are disabled during a draw call: for some reason this forces the GPU driver to fall back to software vertex processing. If this is the case, it could be fixed relatively easily.
I just need help with testing, since I do not have any macOS device.
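To make the theory testable, here is a minimal sketch of the kind of change it implies (a hypothetical helper using glow's attribute wrappers; not actual rg3d code): every attribute location either gets an enabled array or a constant default, so nothing the shader reads is left as a disabled array.

```rust
use glow::HasContext;

/// Hypothetical helper: for every attribute location up to `max_attribs`,
/// either enable its vertex array (when the draw call supplies data for it)
/// or disable the array and provide a constant default value instead.
unsafe fn prepare_attributes<G: HasContext>(gl: &G, used: &[u32], max_attribs: u32) {
    for location in 0..max_attribs {
        if used.contains(&location) {
            gl.enable_vertex_attrib_array(location);
        } else {
            gl.disable_vertex_attrib_array(location);
            // Constant fallback value for attributes with no backing array.
            gl.vertex_attrib_4_f32(location, 0.0, 0.0, 0.0, 1.0);
        }
    }
}
```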
@mrDIMAS That sounds reasonable... I suspected that things were falling back to software, since that would result in this kind of framerate drop. I can test things on this machine if you like. Can you push some tests to a branch so that I can try them? (I would try things myself, but I am completely unfamiliar with the code base and not a 3D pipeline expert, so it would probably take me a lot of effort to drill into this.)
@wilsonk, @toyboot4e did some testing using the LearnOpenGL Rust port - https://discord.com/channels/756573453561102427/756573453561102430/876888897794211932 - and it seems that gaps in vertex attributes are fine and do not affect performance.
The next thing we can test is to replace `glfw` with `glutin` in https://github.com/toyboot4e/learn-opengl-rs/blob/master/src/_6_pbr/_1_2_lighting_textured.rs to see if it makes a difference. There could be an issue in `glutin` (which rg3d uses for OpenGL context creation).
Also, could somebody please check if changing this line https://github.com/rg3dengine/rg3d/blob/master/src/engine/mod.rs#L96 to `.with_gl(glutin::GlRequest::Latest)` makes a difference?
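For anyone testing, a standalone sketch of that change against glutin's pre-0.30 builder API (the window setup here is illustrative, not rg3d's actual code):

```rust
use glutin::{event_loop::EventLoop, window::WindowBuilder, ContextBuilder, GlRequest};

fn main() {
    let event_loop = EventLoop::new();
    let window_builder = WindowBuilder::new().with_title("GL context test");
    let windowed_context = ContextBuilder::new()
        // Instead of pinning a version with something like
        // GlRequest::Specific(glutin::Api::OpenGl, (3, 3)), ask the
        // driver for the newest context it can provide.
        .with_gl(GlRequest::Latest)
        .build_windowed(window_builder, &event_loop)
        .expect("failed to build GL context");
    let _context = unsafe { windowed_context.make_current().unwrap() };
}
```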
@mrDIMAS No difference with the mod.rs:96 change to `Latest`. It also appears as though this may not be an issue with dropping to a software renderer, as my CPU doesn't max out after the drop to 1 FPS (not even on one core, much less all 8) with the 3rd_person demo. Strange.
If @toyboot4e has the time and inclination, I may wait on changing the OpenGL example over to glutin. It would probably take me longer, as I am, again, unfamiliar with that code base. Let me know if you want me to give it a go, however, and I can try to get around to it :)
Thanks for checking. OK, so if it is not a software fallback, what could it be then?
Here's the trace that @toyboot4e made some time ago, and the most confusing part here is that rendering a quad takes 10 microseconds, which is also enough to render the 2430 triangles of the debug text on screen (last draw call in the trace). Filling the G-Buffer is insanely slow, e.g. 58158.43 µs for `glDrawElements(GL_TRIANGLES, 33813, GL_UNSIGNED_INT, 0x00000000);` - this is just 33813 indices (about 11k triangles), but it takes 58 ms to render.
The trace's times are confusing; I suppose these are CPU times, because the GPU works asynchronously from the CPU. If so, it means the bottleneck is in the driver - it does some weird stuff internally that takes heaps of time. But what could it be?
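One way to separate CPU-side driver time from actual GPU time would be to force synchronization around the draw call. A rough diagnostic sketch with glow (illustrative, not engine code; `finish()` stalls the pipeline, so this is for measurement only):

```rust
use glow::HasContext;
use std::time::Instant;

/// Diagnostic sketch: wrap a draw call in finish() so the measured time
/// includes GPU execution rather than just command submission.
unsafe fn timed_draw<G: HasContext>(gl: &G, index_count: i32) {
    gl.finish(); // drain previously queued work first
    let start = Instant::now();
    gl.draw_elements(glow::TRIANGLES, index_count, glow::UNSIGNED_INT, 0);
    gl.finish(); // wait for this draw to complete on the GPU
    println!("draw took {:?}", start.elapsed());
}
```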
I'd try replacing `glfw` with `glutin`!
Btw, I should get another trace. Recently the FPS dropped to around 15 with SDL. It was solved after turning off IME, for unknown reasons, so the same change may affect the FPS with winit. https://github.com/toyboot4e/fps-test
Whoops, it looks like I didn't wait long enough on the async/scene examples to test out the 'falling back to software rendering' theory. If I wait about 3-4 seconds after the window opens and the scene is drawn, then each one of those examples pins one of my cores at 95-98%!
So I guess that rg3d is probably falling back to software. I tested toyboot's lighting_textured example and there was only about 10-15% CPU usage, even if I spin the mouse around (and thus the scene) a bunch... there wasn't any spiking.
OK, I just tested toyboot's example and watched Activity Monitor's GPU history on the M1. That example uses about 60% of the GPU all the time, whereas async/scene only use about 10-15%! Though that 10-15% may also be the Activity Monitor itself, and/or possibly some other programs I have open? Anyway, it is definitely falling back to software rendering!
I found this on the glium GitHub page: https://github.com/glium/glium/issues/1908 Not sure if it is relevant, but it may give some ideas. Thoughts?
Again, sorry about jumping the gun there.
@wilsonk Does it print something in the console like in the issue you've mentioned?
Another update. I upgraded to Big Sur 11.5.2 (released last week) from 11.4, and unfortunately there was no difference in performance. :(
I also noticed that the 3rd_person example uses almost no GPU whatsoever! The async/scene examples seem to at least use some GPU (and 95-98% of one core... sometimes only 75%, plus maybe 30% of another 2 cores), but the 3rd_person example looks to use only 1-3% of the GPU (it is a graph, so a little hard to be exact) and 97-100% of one core, sometimes maybe 40% of a couple of other cores, then back up to 100%.
Anyway, maybe the differences between the examples can lead to a guess about the problem?
@mrDIMAS No, there is no printout, even in debug mode... so I am not sure where that glium user is getting that info? Maybe a log file I am unaware of on Mac? Or I suppose they may have some sort of logging in glium that prints it?
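For what it's worth, the context's nominal device strings can be printed directly. A sketch with glow (note: macOS does not report a mid-frame software-vertex fallback through any GL query, so this only shows which device the context was created on):

```rust
use glow::HasContext;

/// Diagnostic sketch: print the context's reported renderer/vendor/version.
unsafe fn print_gl_info<G: HasContext>(gl: &G) {
    println!("GL_RENDERER: {}", gl.get_parameter_string(glow::RENDERER));
    println!("GL_VENDOR:   {}", gl.get_parameter_string(glow::VENDOR));
    println!("GL_VERSION:  {}", gl.get_parameter_string(glow::VERSION));
}
```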
One more thing to test - https://rg3d.rs/assets/webexample/index.html - what FPS do you guys get here? It is a WebAssembly demo.
Nice, I get 144 FPS on my 144 Hz monitor! The GPU is completely maxed out at all times. CPUs are down around 5% (with some spikes up to 30%, probably due to other programs). This is in the Brave browser, if that matters. Safari wouldn't load it.
Nice smooth movement for the character, with nice shadows, etc. So glfw?
Ok, this is very confusing. WebAssembly uses WebGL 2.0 (which is essentially OpenGL ES 3.0) instead of OpenGL, and now I wonder what the issue with OpenGL is 🤔.
One more thing: the WebAssembly version of the engine does not use `glutin`; it creates the WebGL context manually.
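Roughly, that manual path looks like this on wasm32 (a sketch assuming glow's `from_webgl2_context` constructor, web-sys with the relevant features enabled, and an illustrative canvas id):

```rust
use wasm_bindgen::JsCast;

/// Sketch: create a WebGL 2.0 context straight from a canvas element and
/// hand it to glow, bypassing glutin entirely.
fn create_webgl2_context() -> glow::Context {
    let canvas = web_sys::window()
        .unwrap()
        .document()
        .unwrap()
        .get_element_by_id("canvas") // illustrative element id
        .unwrap()
        .dyn_into::<web_sys::HtmlCanvasElement>()
        .unwrap();
    let webgl2 = canvas
        .get_context("webgl2")
        .unwrap()
        .unwrap()
        .dyn_into::<web_sys::WebGl2RenderingContext>()
        .unwrap();
    glow::Context::from_webgl2_context(webgl2)
}
```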
OK, noted on the manually created WebGL context. I was also just futzing around with Bevy on the M1, and their examples run fine. They use winit, and you can supposedly get right down to low-level OpenGL access... maybe there is some inspiration in their code base?
Afaik they use wgpu for rendering, not OpenGL.
Apparently Bevy does use wgpu... I swear that I read one could 'drill down to the opengl layer'. Maybe it was old information (or possibly a different project, as I have been looking at several). Apologies.
So, differences between OpenGL ES 3.0 and OpenGL are a problem, I guess.
I was finally able to install macOS Big Sur in a virtual machine, and I hope I can investigate the issue myself.
Same issue here, demo running at 1-2 FPS. See also this Apple developer forum thread: https://developer.apple.com/forums/thread/650427 OpenGL is deprecated, but it is still available on Apple silicon.
Bad news: the max OpenGL version supported in a virtual machine is 2.1, but rg3d requires 3.3. So I can't run the engine on macOS in a virtual machine 😞
Other game engines have struggled with macOS-specific performance issues too. This is worth a read: https://github.com/godotengine/godot/pull/47864
It seems that I've found the issue: in one of the latest commits I removed the part responsible for instancing (https://github.com/rg3dengine/rg3d/commit/79b3f13701c74b5d970c6b44d94c01925b68044f#diff-a9ac9f3b084486c12bbfe10b4d3f8c6d8a89b5aace4208ce0a903ef5cbb79227L69). That piece of code created an additional vertex buffer for instance data and attached it to the VAO. Apparently, when such a buffer is used, the Apple GPU driver falls back to software mode and the vertex shader runs on the CPU instead of the GPU.
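For readers following along, this is roughly the pattern the removed code set up (a sketch with glow; the names and data layout are illustrative, not rg3d's actual code):

```rust
use glow::HasContext;

/// Sketch of the instancing setup in question: a second VBO attached to the
/// same VAO, with a divisor of 1 so the attribute advances once per
/// instance instead of once per vertex.
unsafe fn attach_instance_buffer<G: HasContext>(
    gl: &G,
    vao: G::VertexArray,
    location: u32,
) -> G::Buffer {
    gl.bind_vertex_array(Some(vao));

    let instance_vbo = gl.create_buffer().expect("failed to create buffer");
    gl.bind_buffer(glow::ARRAY_BUFFER, Some(instance_vbo));

    // One vec4 of per-instance data (e.g. a matrix row or a color).
    gl.enable_vertex_attrib_array(location);
    gl.vertex_attrib_pointer_f32(location, 4, glow::FLOAT, false, 16, 0);
    gl.vertex_attrib_divisor(location, 1); // advance once per instance

    gl.bind_vertex_array(None);
    instance_vbo
}
```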
@toyboot4e could you please also confirm that the issue is fixed for you?
Sadly it didn't work on my Intel Mac! wilsonk confirmed 60 FPS on an M1 Mac.
Not sure if this helps, but I ran the 3rd_person example on Macs with Intel CPUs and both Intel and AMD GPUs. The MacBook Pro is a 2015 13in with an Intel GPU, and the iMac is a 2020 27in with a Radeon Pro 5700XT. It looks like at least part of the performance issue might be related to AMD GPUs, since the iMac is actually slower than the laptop.
The quality number corresponds to the example's selectable quality presets, in case that is useful to know. The FPS values are what the example was typically reporting while moving the camera/model around a bit.
| Quality | MacBook FPS | iMac FPS |
|---|---|---|
| 1 | 5 | 2 |
| 2 | 7 | 2 |
| 3 | 11 | 3 |
| 4 | 15 | 2 |
I tried booting NixOS from an external SSD on my MBP (mid 2015, with a dedicated GPU) and ran the `3rd_person` example:

| Quality | macOS | NixOS |
|---|---|---|
| 1 | 0.1 FPS | 20 FPS |
| 4 | 3 FPS | 138 FPS |
Apparently macOS's OpenGL is just bad and NixOS achieved much better FPS, but unfortunately the GPU itself might not be powerful enough to run mid-to-high-quality rg3d games.
Anyway, I can play around with `rg3d` (on NixOS) when I have the time ;)
Hey, I'm a Mac/iOS/Web dev with about 15 years of experience, a lot of it with graphics. I don't use rg3d yet, but I'm hoping to start using it (and contributing) if it's a good fit.
It's my understanding that OpenGL on Mac is a dead end; it has had minimal support and development for years, while lower-level APIs like Metal get all the development and performance effort. I wouldn't recommend spending a huge amount of time fixing OpenGL; instead, I think it'd be great to spend that effort on something like Amethyst's Rendy (think Vulkan). Have you considered something similar, or Rendy itself?
I think that would be much more future-proof (e.g. Rendy uses Metal on macOS and SPIR-V for portable shaders), it would allow you to fine-tune performance better than the old OpenGL version you're forced to use on Mac, and it still works with targets like WebAssembly. It might also outsource some of the cross-platform issues that you're struggling to solve without the right dev hardware.
Edit: re-reading, I see you've mentioned wgpu, which is essentially what I was wondering about. Searching for wgpu in your Discord, it seems you've considered it and don't think it's worth using instead. Consider my question answered, thanks :)
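For reference, the portable-API route discussed above boils down to something like this with wgpu (a sketch; the API shown is roughly wgpu 0.10-era and may differ between versions). On macOS the adapter is provided by the Metal backend rather than OpenGL:

```rust
/// Sketch: request a GPU adapter through wgpu. On macOS this goes through
/// Metal, on Windows/Linux through Vulkan/DX12, in the browser through WebGPU.
async fn pick_adapter() -> wgpu::Adapter {
    let instance = wgpu::Instance::new(wgpu::Backends::all());
    instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
        .expect("no suitable GPU adapter found")
}
```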
Using MetalANGLE is also an option: https://github.com/godotengine/godot/pull/50253
Yeah, wgpu is probably a good replacement for OpenGL; however, I don't know when I'll have time to create a renderer based on it.
Should we take it that, time aside, if someone worked on it, it would be a welcome contribution?
Yeah, indeed that would be a very welcome contribution!
I wonder whether the performance issue is fully resolved on `rg3d` 0.24. FPS in the `3rd_person` example on my Intel MBP:

| Quality | macOS |
|---|---|
| 1 | 15 FPS |
| 4 | 45 FPS |
NOTE: The FPS is largely influenced by the window size.
@toyboot4e Cool! Now I wonder what the issue was 🤔