
Drawing to a texture can be extremely slow

Open JaniM opened this issue 11 months ago • 18 comments

I managed to reduce the issue to the following code. As is, it runs at 3fps on my machine. If I remove any of the three marked lines, I get 80fps+.

Tested against master and 0.4.13 on macOS. This does NOT happen on 0.4.13, i.e. this is a regression on master.

use macroquad::prelude::*;

#[macroquad::main("Letterbox")]
async fn main() {
    let width = screen_width();
    let height = screen_height();
    let target = render_target(width as u32, height as u32);

    let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
    render_target_cam.render_target = Some(target.clone());

    loop {
        // Removing this line fixes the problem.
        set_camera(&render_target_cam); // <--
        clear_background(BLACK);

        for _ in 0..1000 {
            let rect = Rect::new(10.0, 50.0, 100.0, 30.0);

            // Removing either of the following lines fixes the problem.
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
            draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
        }

        set_default_camera();
        clear_background(BLACK);
        draw_texture_ex(
            &target.texture,
            0.0,
            0.0,
            WHITE,
            DrawTextureParams {
                dest_size: Some(vec2(width, height)),
                flip_y: true, // Must flip y otherwise 'render_target' will be upside down
                ..Default::default()
            },
        );
        draw_fps();

        next_frame().await;
    }
}

JaniM avatar Jan 07 '25 18:01 JaniM

@ollej on Discord noticed that rearranging the code as follows works as we'd expect, with a stable 80fps+.

use macroquad::prelude::*;

#[macroquad::main("Letterbox")]
async fn main() {
    let width = screen_width();
    let height = screen_height();
    let target = render_target(width as u32, height as u32);

    let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
    render_target_cam.render_target = Some(target.clone());

    loop {
        set_camera(&render_target_cam); // <--
        clear_background(BLACK);

        let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
        for _ in 0..1000 {
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
        }

        for _ in 0..1000 {
            draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
        }

        set_default_camera();
        clear_background(BLACK);
        draw_texture_ex(
            &target.texture,
            0.0,
            0.0,
            WHITE,
            DrawTextureParams {
                dest_size: Some(vec2(width, height)),
                flip_y: true, // Must flip y otherwise 'render_target' will be upside down
                ..Default::default()
            },
        );
        draw_fps();

        next_frame().await;
    }
}

JaniM avatar Jan 07 '25 18:01 JaniM

Because you are creating Rect 1000 times per frame. Move Rect::new outside the game loop:


use macroquad::prelude::*;

#[macroquad::main("Letterbox")]
async fn main() {
    let width = screen_width();
    let height = screen_height();
    let target = render_target(width as u32, height as u32);

    let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
    render_target_cam.render_target = Some(target.clone());

    let rect = Rect::new(10.0, 50.0, 100.0, 30.0);

    loop {
        set_camera(&render_target_cam); // <--
        clear_background(BLACK);


        for _ in 0..1000 {
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
        }

        for _ in 0..1000 {
            draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
        }

        set_default_camera();
        clear_background(BLACK);
        draw_texture_ex(
            &target.texture,
            0.0,
            0.0,
            WHITE,
            DrawTextureParams {
                dest_size: Some(vec2(width, height)),
                flip_y: true, // Must flip y otherwise 'render_target' will be upside down
                ..Default::default()
            },
        );

        draw_text(&get_fps().to_string(), screen_width() - 50., 30.0, 30.0, BLUE);

        next_frame().await;
    }
}

historydev avatar Jan 07 '25 20:01 historydev

Because you are creating Rect 1000 times per frame. Move Rect::new outside the game loop:

  1. Rect is so cheap to create that it's not relevant.
  2. LLVM is smart enough to bring it out (or completely inline here).
  3. You do not need to touch the rect in my original reproduction to get rid of the perf issue.

JaniM avatar Jan 07 '25 20:01 JaniM

Because you are creating Rect 1000 times per frame. Move Rect::new outside the game loop:

  1. Rect is so cheap to create that it's not relevant.
  2. LLVM is smart enough to bring it out (or completely inline here).
  3. You do not need to touch the rect in my original reproduction to get rid of the perf issue.

Your code gives me a stable 240 frames with vsync, without it - it gives more. That's why I physically can't test it and I have to look for the problem by eye.

But in 90% of cases such drops are human error, so I would check it a few more times.

historydev avatar Jan 07 '25 21:01 historydev

Your code gives me a stable 240 frames with vsync, without it - it gives more. That's why I physically can't test it and I have to look for the problem by eye.

Based on your snippet not using draw_fps(), I assume you are using the released version of macroquad, which I explicitly said does not have this issue. You also based your snippet on the version I said works on master too.

JaniM avatar Jan 07 '25 21:01 JaniM

Your code gives me a stable 240 frames with vsync, without it - it gives more. That's why I physically can't test it and I have to look for the problem by eye.

Based on your snippet not using draw_fps(), I assume you are using the released version of macroquad, which I explicitly said does not have this issue. You also based your snippet on the version I said works on master too.

Yes, I tested on 0.4.13, but you didn't specify the version on which this happens.

historydev avatar Jan 07 '25 21:01 historydev

The problem also reproduces on Linux (X11, Fedora 40, GNOME). A flamegraph shows that most of the time is spent in the quad_gl draw method and waiting for the iris_dri.so driver to respond (as far as I know, this is the Mesa OpenGL driver for Intel). For some reason, this particular situation causes a huge wait on the driver. Whether this is a driver bug or something else, I do not know.

I am sharing the files with you. The "bad" file is a snapshot of the program from the original example. The "good" file is a snapshot taken when the text and the rectangles are drawn in separate loops. (Note: placement along the x-axis is random and carries no meaning; the y-axis is the nesting of function calls.)

This is very bad:

[image: bad]

This is good (the rendering functions take up most of the program's time):

[image: good]

KurlykovDanila avatar Jan 13 '25 05:01 KurlykovDanila

I can't say that this would help the issue, but I'd like to point out that (from my understanding) it is REALLY REALLY bad practice to shadow a variable unless necessary (such as to change the type while retaining the variable name). Since you're not changing the type, it may be most beneficial to just allocate the rect variable outside of the main loop. Below is a reply that shows something similar. I really do doubt that not shadowing the variable would help performance, but it is good practice nonetheless.

@ollej on Discord noticed rearranging the code like follows works as we'd expect, with stable 80fps+.

use macroquad::prelude::*;

#[macroquad::main("Letterbox")]
async fn main() {
    let width = screen_width();
    let height = screen_height();
    let target = render_target(width as u32, height as u32);

    let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
    render_target_cam.render_target = Some(target.clone());

    loop {
        set_camera(&render_target_cam); // <--
        clear_background(BLACK);

        let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
        for _ in 0..1000 {
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
        }

        for _ in 0..1000 {
            draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
        }

        set_default_camera();
        clear_background(BLACK);
        draw_texture_ex(
            &target.texture,
            0.0,
            0.0,
            WHITE,
            DrawTextureParams {
                dest_size: Some(vec2(width, height)),
                flip_y: true, // Must flip y otherwise 'render_target' will be upside down
                ..Default::default()
            },
        );
        draw_fps();

        next_frame().await;
    }
}

BnDLett avatar Jan 19 '25 03:01 BnDLett

The original code does not shadow any variables. It'd not be considered bad practice in rust anyhow.

JaniM avatar Jan 19 '25 10:01 JaniM

Sorry, you are right about the shadowing part. It was late when I was typing that. Nonetheless, the code can be improved (imo) by adjusting this segment specifically.

        for _ in 0..1000 {
            let rect = Rect::new(10.0, 50.0, 100.0, 30.0); // <- this can and should be initialized outside of the main loop

            // Removing either of the following lines fixes the problem.
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
            draw_text("wow", rect.x, rect.y, 24.0, WHITE);
        }

This is further justified by the way that you already do that with the width, height, and target. Additionally, here is a Stack Overflow answer that further supports my point (of course, that's for the C/C++ compiler, so 🤷).

However, it's up to you whether you want to implement that. I'm just throwing my words of wisdom out here. I'm sure the compiler would recognize that and fix it for you -- but relying on the compiler to fix something that you know you can improve is just outright bad practice in my opinion.

If you do have your reasons for doing that, then it'd be highly helpful if you specified those reasons. :)

BnDLett avatar Jan 19 '25 20:01 BnDLett

@BnDLett You are certainly right about shadowing, but the question was about something else, so your answer is not relevant.

KurlykovDanila avatar Jan 20 '25 14:01 KurlykovDanila

Certainly not relevant. As such, I rest my case.

BnDLett avatar Jan 20 '25 14:01 BnDLett

Viewing the flamegraph showed that most of the time is spent on the quad_gl draw method and waiting for the iris_dri.so driver to respond (as far as I know, this is the mesa opengl driver for intel). For some reason, it is in this situation that a huge wait for the driver to respond occurs.

@KurlykovDanila this more than likely means that the graphics side of things is having trouble drawing everything in time.

In fact, I get low FPS with this code on Windows too. Unlike @JaniM, I also experienced the performance issues on the released 0.4.13 version.

However, this performance dip can be reproduced with much simpler code:

use macroquad::prelude::*;

#[macroquad::main("Letterbox")]
async fn main() {
    let width = screen_width();
    let height = screen_height();
    let target = render_target(width as u32, height as u32);

    let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
    render_target_cam.render_target = Some(target.clone());

    loop {
        clear_background(BLACK);

        for _ in 0..1000 {
            let rect = Rect::new(10.0, 50.0, 100.0, 30.0);

            // Removing either of the following lines fixes the problem.
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
            draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
        }
        draw_fps();

        next_frame().await;
    }
}

I looked at the code thoroughly and also debugged it. It seems that the main problem here is that this program ends up producing 2000 unbatched drawcalls. One drawcall in macroquad is quite an expensive operation, hence the performance drop.

The 2000 unbatched drawcalls part is caused by interleaving draw_rectangle and draw_text. Hence why when you remove draw_rectangle or draw_text -- the problem disappears. macroquad figures out that they can be grouped into one drawcall.
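
To make the batching argument concrete, here is a toy model. It is not macroquad's actual batching code: it simply assumes that consecutive draw commands of the same kind can share one draw call and that any change of kind starts a new one. The names `DrawKind`, `count_draw_calls`, `interleaved` and `grouped` are hypothetical.

```rust
/// Which kind of primitive a draw command submits (hypothetical, for illustration).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum DrawKind {
    Rectangle,
    Text,
}

/// Counts draw calls under the simplifying assumption that consecutive
/// commands of the same kind share one draw call, and any change of kind
/// forces a new one.
fn count_draw_calls(commands: &[DrawKind]) -> usize {
    let mut calls = 0;
    let mut previous: Option<DrawKind> = None;
    for &kind in commands {
        if previous != Some(kind) {
            calls += 1;
            previous = Some(kind);
        }
    }
    calls
}

/// Rectangle and text alternating, as in the original reproduction.
fn interleaved(n: usize) -> Vec<DrawKind> {
    (0..n)
        .flat_map(|_| [DrawKind::Rectangle, DrawKind::Text])
        .collect()
}

/// All rectangles first, then all text, as in the rearranged version.
fn grouped(n: usize) -> Vec<DrawKind> {
    let mut commands = vec![DrawKind::Rectangle; n];
    commands.extend(std::iter::repeat(DrawKind::Text).take(n));
    commands
}

fn main() {
    println!("interleaved: {} draw calls", count_draw_calls(&interleaved(1000)));
    println!("grouped:     {} draw calls", count_draw_calls(&grouped(1000)));
}
```

Under this model the interleaved order yields 2000 draw calls and the grouped order yields 2, which matches the difference observed between the two versions of the loop.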

P.S.

The performance did get somewhat worse, though. I looked for the commit responsible, and it seems that everything was better before b0bf24b455d8be3385dc50088277421812a6dfe2.

InnocentusLime avatar Feb 18 '25 22:02 InnocentusLime

It also reproduces on Windows. Performance seems to be worse with version 0.4.14; 0.4.13 is fine.

Test   0.4.13                           0.4.14
1      26 fps                           24 fps
2      145±2 fps                        145±2 fps
3      220 - 230 fps                    155 - 160 fps
4      min 400, avg 430, max 475 fps    445 - 465 fps

  • All measurements use the release profile.
  • 1 and 2 take time before a primitive is drawn for the first time.
  • FPS was measured with RenderDoc.
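
FPS figures are easier to compare as time per frame, because equal FPS differences do not correspond to equal costs. The sketch below converts the measurements to milliseconds per frame. `frame_time_ms` is a hypothetical helper, and where a range was reported its midpoint is used, which is my simplification.

```rust
/// Converts a frame rate to the average time spent per frame, in milliseconds.
fn frame_time_ms(fps: f64) -> f64 {
    1000.0 / fps
}

fn main() {
    // (test number, fps on 0.4.13, fps on 0.4.14).
    // Range midpoints are used where the measurements report a range.
    let rows = [
        (1, 26.0, 24.0),
        (2, 145.0, 145.0),
        (3, 225.0, 157.5),
        (4, 430.0, 455.0),
    ];
    for (test, old_fps, new_fps) in rows {
        let old_ms = frame_time_ms(old_fps);
        let new_ms = frame_time_ms(new_fps);
        println!(
            "test {}: {:.2} ms -> {:.2} ms ({:+.2} ms per frame)",
            test,
            old_ms,
            new_ms,
            new_ms - old_ms
        );
    }
}
```

Seen this way, test 3 goes from roughly 4.4 ms to roughly 6.3 ms per frame, while the 2 fps drop in test 1 costs about 3.2 ms per frame.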

1)

use macroquad::{miniquad::conf::Platform, prelude::*};

fn window_conf() -> Conf {
    Conf {
        window_title: "Letterbox".to_owned(),
        platform: Platform {
            swap_interval: Some(0),
            ..Default::default()
        },
        ..Default::default()
    }
}

#[macroquad::main(window_conf)]
async fn main() {
    loop {
        clear_background(BLACK);

        let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
        for _ in 0..1000 {
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
            draw_text("wow", rect.x, rect.y, 24.0, WHITE);
        }

        next_frame().await;
    }
}

2)

use macroquad::{miniquad::conf::Platform, prelude::*};

fn window_conf() -> Conf {
    Conf {
        window_title: "Letterbox".to_owned(),
        platform: Platform {
            swap_interval: Some(0),
            ..Default::default()
        },
        ..Default::default()
    }
}

#[macroquad::main(window_conf)]
async fn main() {
    loop {
        clear_background(BLACK);

        let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
        for _ in 0..1000 {
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
        }

        for _ in 0..1000 {
            draw_text("wow", rect.x, rect.y, 24.0, WHITE);
        }

        next_frame().await;
    }
}

3)

use macroquad::{miniquad::conf::Platform, prelude::*};

fn window_conf() -> Conf {
    Conf {
        window_title: "Letterbox".to_owned(),
        platform: Platform {
            swap_interval: Some(0),
            ..Default::default()
        },
        ..Default::default()
    }
}

#[macroquad::main(window_conf)]
async fn main() {
    let width = screen_width();
    let height = screen_height();
    let target = render_target(width as u32, height as u32);

    let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
    render_target_cam.render_target = Some(target.clone());

    loop {
        set_camera(&render_target_cam);
        clear_background(BLACK);

        let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
        for _ in 0..1000 {
            draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
        }

        for _ in 0..1000 {
            draw_text("wow", rect.x, rect.y, 24.0, WHITE);
        }

        set_default_camera();
        clear_background(BLACK);

        draw_texture_ex(
            &target.texture,
            0.0,
            0.0,
            WHITE,
            DrawTextureParams {
                dest_size: Some(vec2(width, height)),
                flip_y: true,
                ..Default::default()
            },
        );

        next_frame().await;
    }
}

4)

use macroquad::{miniquad::conf::Platform, prelude::*};

fn window_conf() -> Conf {
    Conf {
        window_title: "Letterbox".to_owned(),
        platform: Platform {
            swap_interval: Some(0),
            ..Default::default()
        },
        ..Default::default()
    }
}

#[macroquad::main(window_conf)]
async fn main() {
    let width = screen_width();
    let height = screen_height();
    let target = render_target(width as u32, height as u32);

    let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
    render_target_cam.render_target = Some(target.clone());

    set_camera(&render_target_cam);
    clear_background(BLACK);

    let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
    for _ in 0..1000 {
        draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
    }

    for _ in 0..1000 {
        draw_text("wow", rect.x, rect.y, 24.0, WHITE);
    }
    next_frame().await;

    loop {
        set_default_camera();
        clear_background(BLACK);

        draw_texture_ex(
            &target.texture,
            0.0,
            0.0,
            WHITE,
            DrawTextureParams {
                dest_size: Some(vec2(width, height)),
                flip_y: true,
                ..Default::default()
            },
        );

        next_frame().await;
    }
}

tasogare3710 avatar May 04 '25 00:05 tasogare3710

Example number 1 looks rather interesting. It involves no drawing to a texture at all.

I have debugged the code on my machine. The root cause of the slow-down seems to be the draw_text, actually. It doesn't get batched together with the shape drawcalls. I will try to put together some sort of fix, probably next weekend :)
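
Until such a fix lands, the rearrangement found earlier in the thread (all rectangles first, then all text) can be generalized on the application side by recording draw commands and replaying them grouped by kind. This is a minimal sketch with hypothetical types (`Command`, `group_by_kind`), not part of macroquad; in a real program the replay step would call draw_rectangle and draw_text. Grouping changes the order in which overlapping primitives are painted, so it only fits cases where that order does not matter.

```rust
/// A recorded draw request (hypothetical; stands in for the arguments of
/// macroquad's draw_rectangle / draw_text).
#[derive(Clone, Debug, PartialEq)]
enum Command {
    Rectangle { x: f32, y: f32, w: f32, h: f32 },
    Text { text: String, x: f32, y: f32, size: f32 },
}

impl Command {
    /// Sort key: all rectangles sort before all text.
    fn kind_order(&self) -> u8 {
        match self {
            Command::Rectangle { .. } => 0,
            Command::Text { .. } => 1,
        }
    }
}

/// Reorders commands so that each kind forms one contiguous run.
/// `sort_by_key` is stable, so the relative order within a kind is preserved.
fn group_by_kind(mut commands: Vec<Command>) -> Vec<Command> {
    commands.sort_by_key(Command::kind_order);
    commands
}

fn main() {
    // Record the interleaved commands from the original reproduction.
    let mut frame = Vec::new();
    for _ in 0..1000 {
        frame.push(Command::Rectangle { x: 110.0, y: 50.0, w: 100.0, h: 30.0 });
        frame.push(Command::Text { text: "wow".to_owned(), x: 10.0, y: 50.0, size: 24.0 });
    }

    let grouped = group_by_kind(frame);

    // Count how many times the kind changes between neighbouring commands.
    let switches = grouped
        .windows(2)
        .filter(|pair| pair[0].kind_order() != pair[1].kind_order())
        .count();
    println!("{} commands, {} kind switch(es)", grouped.len(), switches);
}
```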

InnocentusLime avatar May 04 '25 06:05 InnocentusLime

In this example, draw_rectangle seems to perform worse.

Test   0.4.13          0.4.14           note
5      98 - 104 fps    98 - 104 fps
6      333 fps         333 - 335 fps    CPU usage is increasing and GPU usage is decreasing

5

use ::macroquad::{prelude::*, miniquad::conf::Platform};

fn window_conf() -> Conf {
    Conf {
        window_title: "Letterbox".to_owned(),
        platform: Platform {
            swap_interval: Some(0),
            ..Default::default()
        },
        ..Default::default()
    }
}

#[macroquad::main(window_conf)]
async fn main() {
    loop {
        clear_background(BLACK);

        for _ in 0..2000 {
            draw_rectangle(10.0 + 100.0, 50.0, 100.0, 30.0, WHITE);
        }

        next_frame().await;
    }
}

6

use ::macroquad::{prelude::*, miniquad::conf::Platform};

fn window_conf() -> Conf {
    Conf {
        window_title: "Letterbox".to_owned(),
        platform: Platform {
            swap_interval: Some(0),
            ..Default::default()
        },
        ..Default::default()
    }
}

#[macroquad::main(window_conf)]
async fn main() {
    loop {
        clear_background(BLACK);

        for _ in 0..2000 {
            draw_text("wow", 10.0, 50.0, 24.0, WHITE);
        }

        next_frame().await;
    }
}

tasogare3710 avatar May 05 '25 06:05 tasogare3710

In this example, draw_rectangle seems to perform worse.

Test   0.4.13          0.4.14           note
5      98 - 104 fps    98 - 104 fps
6      333 fps         333 - 335 fps    CPU usage is increasing and GPU usage is decreasing

The draw_rectangle measurements look off. I get 1000 FPS on test 5 in a release build :)

I have debugged test 5, and macroquad successfully batches all of that into 3 draw calls. I am afraid it is just your hardware.

Moreover, it is higher than 60 FPS anyway, which is the standard. Is there any reason to demand that the performance be higher?

InnocentusLime avatar May 05 '25 09:05 InnocentusLime

I also see no problem in higher CPU usage when drawing text. Simply drawing text symbol-by-symbol (which is what macroquad does) IS a more costly operation. Is something wrong with that?

InnocentusLime avatar May 05 '25 09:05 InnocentusLime