macroquad High CPU usage on Windows 10, even if vsync is enabled

Running a simple project which only clears the screen the program seems to be consuming 100% of one CPU core at all times.

Both get_fps() and FRAPS show a stable 60 fps, which would indicate that vsync or some other frame limiting is enabled.
The program is compiled with cargo run --release.
Macroquad version is 0.3.7.
The system is a laptop with integrated graphics (Intel UHD 620) running Windows 10.

It seems that for some reason the program uses a thread 100% of the time for just waiting to draw the next frame.

The tested code is:

use macroquad::prelude::*;

#[macroquad::main("Test")]
async fn main() {
    loop {
        clear_background(RED);
        next_frame().await
    }
}

Jul 15 '21 20:07 FeldrinH

Maybe the author can clarify, but I've looked over the code, doesn't seem like there's any thread sleeping going on, so it would make sense it just runs as fast as possible.

Jul 18 '21 10:07 louiidev

there is some sleeping: https://github.com/not-fl3/macroquad/blob/master/src/lib.rs#L559

Maybe it is not enough?

Jul 18 '21 17:07 not-fl3

The docs are kinda vague on this, but from what I gather yield_now behaviour is highly platform specific and in many cases it may not sleep at all, if there are few other processes that need to run at the same time.

Jul 18 '21 19:07 FeldrinH

Also if it is windows/nvidia - it may be the driver issue https://stackoverflow.com/questions/21925313/100-cpu-utilization-when-using-vsync-opengl/21929019#21929019 - it looks like the driver is doing spinlock while waiting for a vsync

Jul 18 '21 19:07 not-fl3

I suppose its possible. I have intel integrated graphics though.

Could also be related to this: https://github.com/rust-lang/rust/issues/46774

Also, I tested wasm as well and the high cpu usage is absent on wasm. The cpu usage seems to be fairly minimal there.

Jul 18 '21 19:07 FeldrinH

Oh, intresting. Try to add some sleep right before calling next_frame(), maybe it will help?

Jul 18 '21 20:07 not-fl3

there is some sleeping: https://github.com/not-fl3/macroquad/blob/master/src/lib.rs#L559

Maybe it is not enough?

Ah, my bad. I was looking for the sleep fn.

Jul 19 '21 02:07 louiidev

Oh, intresting. Try to add some sleep right before calling next_frame(), maybe it will help?

Calling sleep does help cpu usage, however, this presents two new issues:

How do I know how long to sleep? According to #134 there is no way to find out how much time drawing actually takes?
Thread sleep is not very portable (notably not present on wasm). This breaks macroquads stated goal of having all targets work the same with the exact same code.

Jul 19 '21 13:07 FeldrinH

I'm experiencing the same issue, although on different hardware and O/S.

I suspect that this is a Macroquad problem; on my system, this doesn't happen on many other game engines (Amethyst, Tetra, Ggez), and also, my hardware is different from @FeldrinH.

Aug 09 '21 21:08 64kramsystem

Got this on Linux.

The backtrace shows that glFinish() ends with the nvidia driver calling sched_yield(), presumably in a busy-wait loop. (To confirm, with strace I'm counting 60k sched_yield syscalls per second.)

As suggested in https://github.com/not-fl3/macroquad/issues/257#issuecomment-882107996 this seems to be a nvidia driver issue. A web search finds related issues in a couple of projects (e.g. gnome), but I don't understand what the solution was.

Thread 1 "tiled-test" received signal SIGINT, Interrupt.
0x00007ffff7a48857 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7a48857 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff5c8651d in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#2  0x00007ffff5c91624 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#3  0x00007ffff5c0e1cf in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#4  0x00007ffff5813f64 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.470.63.01
#5  0x00005555555b2c77 in macroquad::{impl#2}::draw (self=0x1) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/macroquad-0.3.10/src/lib.rs:646
#6  0x0000555555689b1b in miniquad::frame (user_data=0x5555557566a0) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/miniquad-0.3.0-alpha.37/src/lib.rs:272
#7  0x0000555555690e27 in sapp_linux::_sapp_call_frame () at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2627
#8  0x0000555555690e47 in sapp_linux::_sapp_frame () at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2639
#9  0x0000555555691696 in sapp_linux::sapp_run (desc=0x7fffffffcb88) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/sapp-linux-0.1.13/src/lib.rs:2783
...
(gdb) up 5
#5  0x00005555555b2c77 in macroquad::{impl#2}::draw (self=0x1) at /home/martin/.cargo/registry/src/github.com-1ecc6299db9ec823/macroquad-0.3.10/src/lib.rs:646
646                         miniquad::gl::glFinish();
(gdb) l

Oct 10 '21 16:10 martinxyz

Got this on Linux.

The backtrace shows that glFinish() ends with the nvidia driver calling sched_yield(), presumably in a busy-wait loop. (To confirm, with strace I'm counting 60k sched_yield syscalls per second.)

As suggested in #257 (comment) this seems to be a nvidia driver issue. A web search finds related issues in a couple of projects (e.g. gnome), but I don't understand what the solution was.

With the information at hand (thanks @martinxyz), and after some digging, I think it's an overlook ("bug-ish" sort of thing) in MQ.

I've checked the glFinish() invocation history, and it is used for telemetry:

commit 05e69c49c5e8b5281f7f6645882168628e45b89f
Author: Fedor Logachev <[email protected]>
Date:   Thu Jan 7 12:11:04 2021 -0600

    telemetry: More internal metrics

diff --git a/profiler/src/lib.rs b/profiler/src/lib.rs
index 0317101..4286e55 100644
--- a/profiler/src/lib.rs
+++ b/profiler/src/lib.rs
@@ -188,14 +188,20 @@ fn zone_ui(ui: &mut Ui, zone: &Zone, n: usize) {
                     }
                 }
 
-                ui.label(
-                    None,
-                    &format!(
-                        "Full frame time: {:.3}ms {:.1}(1/t)",
-                        get_frame_time(),
-                        (1.0 / get_frame_time())
-                    ),
-                );
+                if let Some(frame) = state
+                    .selected_frame
+                    .as_ref()
+                    .or_else(|| state.frames_buffer.get(0))
+                {
+                    ui.label(
+                        None,
+                        &format!(
+                            "Full frame time: {:.3}ms {:.1}(1/t)",
+                            frame.full_frame_time,
+                            (1.0 / frame.full_frame_time)
+                        ),
+                    );
+                }
 
                 if state.paused {
                     if ui.button(None, "resume") {
diff --git a/src/lib.rs b/src/lib.rs
index 486e01c..4e3bd7f 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -247,6 +247,8 @@ fn draw(&mut self) {
             let _z = telemetry::ZoneGuard::new("Event::draw");
 
             if let Some(future) = unsafe { MAIN_FUTURE.as_mut() } {
+                let _z = telemetry::ZoneGuard::new("Main loop");
+
                 get_context().begin_frame();
 
                 if exec::resume(future) {
@@ -263,6 +265,16 @@ fn draw(&mut self) {
 
             get_context().frame_time = date::now() - get_context().last_frame_time;
             get_context().last_frame_time = date::now();
+
+            #[cfg(any(target_arch = "wasm32", target_os = "linux"))]
+            {
+                let _z = telemetry::ZoneGuard::new("glFinish/glFLush");
+
+                unsafe {
+                    miniquad::gl::glFlush();
+                    miniquad::gl::glFinish();
+                }
+            }
         }
 
         telemetry::reset();

I've had a look around, and for example, Intel discourages using glFinish() in its OpenGL optimization guide.

Potentially, this could be a misuse of the OpenGL APIs, but I'm no expert in this.

Regardless, at the very least, telemetry should be preset according to the profile (e.g. disabled in release mode, and enabled in debug mode), or anyway, it should be configurable, and if enabled in release mode, it should be very clearly documented that it can potentially hurt performance.

Oct 10 '21 17:10 64kramsystem

Note: the code above strictly relates to Linux (as it's guarded by a feature), so it's very likely unrelated to the current issue (which is on Windows).

I've copied my comment to the Linux-specific issue (#275).

@martinxyz FYI

Oct 10 '21 18:10 64kramsystem

Same problem.

Adding some sleep before next_frame() will alleviate this problem. But it is so weird.

After trying some different versions, I found that 0.3.26 does not have this problem, but 0.4.0 does. I ran a profiling on 0.4.5, the result is as follows. Hope it can help fix the problem.

btw, I noticed that when I ran the program with 0.3.26, the console showes:

WGL: failed to create 3.2 context, trying 2.1
GL_VERSION: 4.6.0 - Build 26.20.100.7642

but nothing with 0.4.0. Maybe it is related to the version of WGL or not?

May 19 '24 11:05 flaribbit

@flaribbit I can verify this same result with both versions. I don't need anything fancy so I'm opting to downgrade for now.

Jul 27 '24 05:07 SeedyROM

I recognize that this issue is specifically about Windows, but I want to share that I noticed high CPU usage on MacOS starting in v0.4.4 and pinpointed the specific commit using git bisect, which ended up being an issue in Miniquad. I logged the issue here: https://github.com/not-fl3/miniquad/issues/470

@flaribbit have you ever used git bisect before? It's a useful tool for finding what exact commit caused the problem! So for this case you'd checkout the Macroquad codebase and follow these steps:

git bisect start — start the bisect process
git checkout v0.4.0 — test the problem, like cargo run --example texture --release, and verify it's present
git bisect bad — mark the commit as "bad"
git checkout v0.3.26 — go to a working commit and test it out, if problem isn't present then:
git bisect good — mark the commit as "good"

Git will then bisect through the commits based on the good and bad marker. You test out each commit, marking it as either git bisect good or git bisect bad. Eventually you'll land on the commit that introduced the regression, which can majorly help with fixing it!

Aug 04 '24 01:08 brettchalupa

macroquad macroquad copied to clipboard

High CPU usage on Windows 10, even if vsync is enabled

macroquad
macroquad copied to clipboard