Performance issues on macOS
There is a dramatic graphics performance difference between Windows/Linux/WASM and macOS. Various MacBooks can't handle rg3d properly: there can be an almost 10x performance difference, and FPS almost never reaches 60 (except maybe in 2D). I don't have any Apple devices ~~and it is near impossible to install macOS on a virtual machine~~, so I need the community's help with this issue.
Possible causes:
- Index format - @toyboot4e profiled other games and found that all of them use the `GL_UNSIGNED_SHORT` index format, but rg3d uses `GL_UNSIGNED_INT`. This might be a cause of the low performance (sketched below).
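Note that switching the index format involves more than changing the type passed to the draw call: the index buffer contents themselves must be stored as 16-bit values, otherwise the driver will misinterpret the buffer. A minimal sketch of such a conversion (a hypothetical helper, not rg3d code):

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert 32-bit indices to 16-bit ones so a mesh
/// can be drawn with GL_UNSIGNED_SHORT. Returns None if any index exceeds
/// u16::MAX, i.e. the mesh addresses more than 65536 vertices.
fn to_u16_indices(indices: &[u32]) -> Option<Vec<u16>> {
    indices.iter().map(|&i| u16::try_from(i).ok()).collect()
}
```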
Sidenote on the Windows/macOS performance difference
Someone on the rg3d Discord server reported that a Windows machine with a GTX 760 reached 60 FPS in StationIapetus. That GPU's performance is almost the same as a Radeon R9 370X (my Mac's GPU), but mine ran at roughly 0.4 FPS.
Here's a comparison of the GTX 760 and Radeon R9 370X: https://hwbench.com/vgas/radeon-r9-370-vs-geforce-gtx-760
A quick note to let you know I tried changing:
```diff
diff --git a/src/renderer/framework/geometry_buffer.rs b/src/renderer/framework/geometry_buffer.rs
index 4f3af37..5ca1c1c 100644
--- a/src/renderer/framework/geometry_buffer.rs
+++ b/src/renderer/framework/geometry_buffer.rs
@@ -265,7 +265,7 @@ impl<'a> GeometryBufferBinding<'a> {
         self.state.gl.draw_elements(
             self.mode(),
             index_count as i32,
-            glow::UNSIGNED_INT,
+            glow::UNSIGNED_SHORT,
             indices,
         );
     }
@@ -279,7 +279,7 @@ impl<'a> GeometryBufferBinding<'a> {
         self.state.gl.draw_elements_instanced(
             self.mode(),
             index_count as i32,
-            glow::UNSIGNED_INT,
+            glow::UNSIGNED_SHORT,
             0,
             count as i32,
         )
```
but there was no difference in performance. I still get only about 1 frame per second on my M1 mini for some examples (and the other examples are all similarly slow, just as before this patch).
I ran `cargo instruments` to profile with Instruments. It seems most of the time is spent in `geometry_buffer.rs:draw_internal()` for several of the slower-running examples (like `3rd_person` and `scene`), so I had high hopes for this possible solution, but alas, no real change. Just thought I would let you know.
I suspect the current renderer implementation is slow on macOS because of the vertex attribute layout. Currently, some attributes might not be used in a draw call, and I've read on Stack Overflow about dramatic performance loss when some vertex attributes are disabled during a draw call: for some reason this forces the GPU driver to fall back to software vertex processing. If this is the case, it could be fixed relatively easily.
I just need help with testing, since I do not have any macOS device.
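To make the theory testable, here is a minimal sketch of the kind of change it implies (a hypothetical helper using glow's attribute wrappers; not actual rg3d code): every attribute location either gets an enabled array or a constant default, so nothing the shader reads is left as a disabled array.

```rust
use glow::HasContext;

/// Hypothetical helper: for every attribute location up to `max_attribs`,
/// either enable its vertex array (when the draw call supplies data for it)
/// or disable the array and provide a constant default value instead.
unsafe fn prepare_attributes<G: HasContext>(gl: &G, used: &[u32], max_attribs: u32) {
    for location in 0..max_attribs {
        if used.contains(&location) {
            gl.enable_vertex_attrib_array(location);
        } else {
            gl.disable_vertex_attrib_array(location);
            // Constant fallback value for attributes with no backing array.
            gl.vertex_attrib_4_f32(location, 0.0, 0.0, 0.0, 1.0);
        }
    }
}
```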
@mrDIMAS That sounds reasonable... I suspected that things were falling back to software, since that would result in this kind of framerate drop. I can test things on this machine if you like. Can you push some tests to a branch so that I can try them? (I would try things myself, but I am completely unfamiliar with the code base and not a 3D pipeline expert, so it would probably take me a lot of effort to drill into this.)
@wilsonk, @toyboot4e did some testing using the LearnOpenGL Rust port - https://discord.com/channels/756573453561102427/756573453561102430/876888897794211932 - and it seems that gaps in vertex attributes are fine and do not affect performance.
The next thing we can test is to replace `glfw` with `glutin` in https://github.com/toyboot4e/learn-opengl-rs/blob/master/src/_6_pbr/_1_2_lighting_textured.rs to see if it makes a difference. There could be an issue in `glutin` (which rg3d uses for OpenGL context creation).
Also, could somebody please check if changing this line https://github.com/rg3dengine/rg3d/blob/master/src/engine/mod.rs#L96 to `.with_gl(glutin::GlRequest::Latest)` makes a difference?
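For anyone testing, a standalone sketch of that change against glutin's pre-0.30 builder API (the window setup here is illustrative, not rg3d's actual code):

```rust
use glutin::{event_loop::EventLoop, window::WindowBuilder, ContextBuilder, GlRequest};

fn main() {
    let event_loop = EventLoop::new();
    let window_builder = WindowBuilder::new().with_title("GL context test");
    let windowed_context = ContextBuilder::new()
        // Instead of pinning a version with something like
        // GlRequest::Specific(glutin::Api::OpenGl, (3, 3)), ask the
        // driver for the newest context it can provide.
        .with_gl(GlRequest::Latest)
        .build_windowed(window_builder, &event_loop)
        .expect("failed to build GL context");
    let _context = unsafe { windowed_context.make_current().unwrap() };
}
```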
@mrDIMAS No difference with the mod.rs:96 change to `Latest`. It also appears as though this may not be an issue with dropping to a software renderer, as my CPU doesn't max out after the drop to 1 FPS (not even on one core, much less all 8) with the 3rd_person demo. Strange.
If @toyboot4e has the time and inclination, I may wait on changing the OpenGL example over to glutin. It would probably take me longer, as I am, again, unfamiliar with that code base. Let me know if you want me to give it a go, however, and I can try to get around to it :)
Thanks for checking. OK, so if it is not a software fallback, what could it be then?
Here's the trace that @toyboot4e made some time ago, and the most confusing part here is that rendering a quad takes 10 microseconds, which is also enough to render the 2430 triangles of the debug text on screen (last draw call in the trace). Filling the G-Buffer is insanely slow, e.g. 58158.43 µs for `glDrawElements(GL_TRIANGLES, 33813, GL_UNSIGNED_INT, 0x00000000);` - this is just 33813 indices (about 11k triangles), but it takes 58 ms to render.
The trace's times are confusing; I suppose these are CPU times, because the GPU works asynchronously from the CPU. If so, it means the bottleneck is in the driver - it does some weird stuff internally that takes heaps of time. But what could it be?
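One way to separate CPU-side driver time from actual GPU time would be to force synchronization around the draw call. A rough diagnostic sketch with glow (illustrative, not engine code; `finish()` stalls the pipeline, so this is for measurement only):

```rust
use glow::HasContext;
use std::time::Instant;

/// Diagnostic sketch: wrap a draw call in finish() so the measured time
/// includes GPU execution rather than just command submission.
unsafe fn timed_draw<G: HasContext>(gl: &G, index_count: i32) {
    gl.finish(); // drain previously queued work first
    let start = Instant::now();
    gl.draw_elements(glow::TRIANGLES, index_count, glow::UNSIGNED_INT, 0);
    gl.finish(); // wait for this draw to complete on the GPU
    println!("draw took {:?}", start.elapsed());
}
```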
I'd try replacing `glfw` with `glutin`!
Btw, I should get another trace. Recently the FPS dropped to around 15 with SDL. It was solved after turning off IME, for unknown reasons, so the same change may affect the FPS with winit. https://github.com/toyboot4e/fps-test
Whoops, it looks like I didn't wait long enough on the async/scene examples to test out the 'falling back to software rendering' theory. If I wait about 3-4 seconds after the window opens and the scene is drawn, then each one of those examples pins one of my cores at 95-98%!
So I guess that rg3d is probably falling back to software. I tested toyboot's lighting_textured example and there was only about 10-15% CPU usage, even if I spin the mouse around (and thus the scene) a bunch... there wasn't any spiking.
OK, I just tested toyboot's example and watched Activity Monitor's GPU history on the M1. That example uses about 60% of the GPU all the time, whereas async/scene only use about 10-15%! Though that 10-15% may also be the Activity Monitor itself, and/or possibly some other programs I have open? Anyway, it is definitely falling back to software rendering!
I found this on the glium GitHub page: https://github.com/glium/glium/issues/1908 Not sure if it is relevant, but it may give some ideas. Thoughts?
Again, sorry about jumping the gun there.
@wilsonk Does it print something in the console like in the issue you've mentioned?
Another update. I upgraded to Big Sur 11.5.2 (released last week) from 11.4, and unfortunately there was no difference in performance. :(
I also noticed that the 3rd_person example uses almost no GPU whatsoever! The async/scene examples seem to at least use some GPU (and 95-98% of one core... sometimes only 75%, plus maybe 30% of another 2 cores), but the 3rd_person example looks to use only 1-3% of the GPU (it is a graph, so a little hard to be exact) and 97-100% of one core, sometimes maybe 40% of a couple of other cores, then back up to 100%.
Anyway, maybe the differences between the examples can lead to a guess about the problem?
@mrDIMAS No, there is no printout, even in debug mode... so I am not sure where that glium user is getting that info? Maybe a log file I am unaware of on Mac? Or I suppose they may have some sort of logging in glium that prints it?
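For what it's worth, the context's nominal device strings can be printed directly. A sketch with glow (note: macOS does not report a mid-frame software-vertex fallback through any GL query, so this only shows which device the context was created on):

```rust
use glow::HasContext;

/// Diagnostic sketch: print the context's reported renderer/vendor/version.
unsafe fn print_gl_info<G: HasContext>(gl: &G) {
    println!("GL_RENDERER: {}", gl.get_parameter_string(glow::RENDERER));
    println!("GL_VENDOR:   {}", gl.get_parameter_string(glow::VENDOR));
    println!("GL_VERSION:  {}", gl.get_parameter_string(glow::VERSION));
}
```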
One more thing to test - https://rg3d.rs/assets/webexample/index.html - what FPS do you guys get here? It is a WebAssembly demo.
Nice, I get 144 FPS on my 144 Hz monitor! The GPU is completely maxed out at all times. CPUs are down around 5% (with some spikes up to 30%, probably due to other programs). This is in the Brave browser, if that matters. Safari wouldn't load it.
Nice smooth movement for the character, with nice shadows, etc. So glfw?
Ok, this is very confusing. WebAssembly uses WebGL 2.0 (which is essentially OpenGL ES 3.0) instead of OpenGL, and now I wonder what the issue with OpenGL is 🤔.
One more thing: the WebAssembly version of the engine does not use `glutin`; it creates the WebGL context manually.
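Roughly, that manual path looks like this on wasm32 (a sketch assuming glow's `from_webgl2_context` constructor, web-sys with the relevant features enabled, and an illustrative canvas id):

```rust
use wasm_bindgen::JsCast;

/// Sketch: create a WebGL 2.0 context straight from a canvas element and
/// hand it to glow, bypassing glutin entirely.
fn create_webgl2_context() -> glow::Context {
    let canvas = web_sys::window()
        .unwrap()
        .document()
        .unwrap()
        .get_element_by_id("canvas") // illustrative element id
        .unwrap()
        .dyn_into::<web_sys::HtmlCanvasElement>()
        .unwrap();
    let webgl2 = canvas
        .get_context("webgl2")
        .unwrap()
        .unwrap()
        .dyn_into::<web_sys::WebGl2RenderingContext>()
        .unwrap();
    glow::Context::from_webgl2_context(webgl2)
}
```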
OK, noted on the manually created WebGL context. I was also just futzing around with Bevy on the M1, and their examples run fine. They use winit, and you can supposedly get right down to low-level OpenGL access... maybe there is some inspiration in their code base?
Afaik they use wgpu for rendering, not OpenGL.
Apparently Bevy does use wgpu... I swear that I read one could 'drill down to the opengl layer'. Maybe it was old information (or possibly a different project, as I have been looking at several). Apologies.
So, differences between OpenGL ES 3.0 and OpenGL are a problem, I guess.
I was finally able to install macOS Big Sur in a virtual machine, and I hope I can investigate the issue myself.
Same issue here, demo running at 1-2 FPS. See also this Apple developer forum thread: https://developer.apple.com/forums/thread/650427 OpenGL is deprecated, but it is still available on Apple silicon.
Bad news: the max OpenGL version supported in a virtual machine is 2.1, but rg3d requires 3.3. So I can't run the engine on macOS in a virtual machine 😞
Other game engines have struggled with macOS-specific performance issues too. This is worth a read: https://github.com/godotengine/godot/pull/47864
It seems that I've found the issue: in one of the latest commits I removed the part responsible for instancing (https://github.com/rg3dengine/rg3d/commit/79b3f13701c74b5d970c6b44d94c01925b68044f#diff-a9ac9f3b084486c12bbfe10b4d3f8c6d8a89b5aace4208ce0a903ef5cbb79227L69). That piece of code created an additional vertex buffer for instance data and attached it to the VAO. Apparently, when such a buffer is used, the Apple GPU driver falls back to software mode and the vertex shader runs on the CPU instead of the GPU.
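For readers following along, this is roughly the pattern the removed code set up (a sketch with glow; the names and data layout are illustrative, not rg3d's actual code):

```rust
use glow::HasContext;

/// Sketch of the instancing setup in question: a second VBO attached to the
/// same VAO, with a divisor of 1 so the attribute advances once per
/// instance instead of once per vertex.
unsafe fn attach_instance_buffer<G: HasContext>(
    gl: &G,
    vao: G::VertexArray,
    location: u32,
) -> G::Buffer {
    gl.bind_vertex_array(Some(vao));

    let instance_vbo = gl.create_buffer().expect("failed to create buffer");
    gl.bind_buffer(glow::ARRAY_BUFFER, Some(instance_vbo));

    // One vec4 of per-instance data (e.g. a matrix row or a color).
    gl.enable_vertex_attrib_array(location);
    gl.vertex_attrib_pointer_f32(location, 4, glow::FLOAT, false, 16, 0);
    gl.vertex_attrib_divisor(location, 1); // advance once per instance

    gl.bind_vertex_array(None);
    instance_vbo
}
```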
@toyboot4e could you please also confirm that the issue is fixed for you?
Sadly it didn't work on my Intel Mac! wilsonk confirmed 60 FPS on an M1 Mac.
Not sure if this helps, but I ran the 3rd_person example on Macs with Intel CPUs and both Intel and AMD GPUs. The MacBook Pro is a 2015 13in with an Intel GPU, and the iMac is a 2020 27in with a Radeon Pro 5700XT. It looks like at least part of the performance issue might be related to AMD GPUs, since the iMac is actually slower than the laptop.
The quality number corresponds to the example's selectable quality presets, in case that is useful to know. The FPS values are what the example was typically reporting while moving the camera/model around a bit.
| Quality | MacBook FPS | iMac FPS |
|---|---|---|
| 1 | 5 | 2 |
| 2 | 7 | 2 |
| 3 | 11 | 3 |
| 4 | 15 | 2 |
I tried booting NixOS from an external SSD on my MBP (mid 2015, with a dedicated GPU) and ran the `3rd_person` example:

| Quality | macOS | NixOS |
|---|---|---|
| 1 | 0.1 FPS | 20 FPS |
| 4 | 3 FPS | 138 FPS |
Apparently macOS's OpenGL is just bad and NixOS achieved much better FPS, but unfortunately the GPU itself might not be powerful enough to run mid-to-high-quality rg3d games.
Anyway, I can play around with `rg3d` (on NixOS) when I have the time ;)
Hey, I'm a Mac/iOS/Web dev with about 15 years of experience, a lot of it with graphics. I don't use rg3d yet, but I'm hoping to start using it (and contributing) if it's a good fit.
It's my understanding that OpenGL on Mac is a dead end; it has had minimal support and development for years, while lower-level APIs like Metal get all the development and performance effort. I wouldn't recommend spending a huge amount of time fixing OpenGL; instead, I think it'd be great to spend that effort on something like Amethyst's Rendy (think Vulkan). Have you considered something similar, or Rendy itself?
I think that would be much more future-proof (e.g. Rendy uses Metal on macOS and SPIR-V for portable shaders), it would allow you to fine-tune performance better than the old OpenGL version you're forced to use on Mac, and it still works with targets like WebAssembly. It might also outsource some of the cross-platform issues that you're struggling to solve without the right dev hardware.
Edit: re-reading, I see you've mentioned wgpu, which is essentially what I was wondering about. Searching for wgpu in your Discord, it seems you've considered it and don't think it's worth using instead. Consider my question answered, thanks :)
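For reference, the portable-API route discussed above boils down to something like this with wgpu (a sketch; the API shown is roughly wgpu 0.10-era and may differ between versions). On macOS the adapter is provided by the Metal backend rather than OpenGL:

```rust
/// Sketch: request a GPU adapter through wgpu. On macOS this goes through
/// Metal, on Windows/Linux through Vulkan/DX12, in the browser through WebGPU.
async fn pick_adapter() -> wgpu::Adapter {
    let instance = wgpu::Instance::new(wgpu::Backends::all());
    instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
        .expect("no suitable GPU adapter found")
}
```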
Using MetalANGLE is also an option: https://github.com/godotengine/godot/pull/50253
Yeah, wgpu is probably a good replacement for OpenGL; however, I don't know when I'll have time to create a renderer based on it.
Should we take it that, time aside, if someone worked on it, it would be a welcome contribution?
Yeah, indeed that would be a very welcome contribution!
I wonder whether the performance issue is fully resolved on `rg3d` 0.24. FPS in the `3rd_person` example on my Intel MBP:

| Quality | macOS |
|---|---|
| 1 | 15 FPS |
| 4 | 45 FPS |
NOTE: The FPS is largely influenced by the window size.
@toyboot4e Cool! Now I wonder what the issue was 🤔