Drawing to a texture can be extremely slow
I managed to reduce the issue to the following code. As is, it runs at 3fps on my machine. If I remove any of the three marked lines, I get 80fps+.
Tested against master and 0.4.13 on macOS. This does NOT happen on 0.4.13, i.e. this is a regression.
use macroquad::prelude::*;
#[macroquad::main("Letterbox")]
async fn main() {
let width = screen_width();
let height = screen_height();
let target = render_target(width as u32, height as u32);
let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
render_target_cam.render_target = Some(target.clone());
loop {
// Removing this line fixes the problem.
set_camera(&render_target_cam); // <--
clear_background(BLACK);
for _ in 0..1000 {
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
// Removing either of the following lines fixes the problem.
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
}
set_default_camera();
clear_background(BLACK);
draw_texture_ex(
&target.texture,
0.0,
0.0,
WHITE,
DrawTextureParams {
dest_size: Some(vec2(width, height)),
flip_y: true, // Must flip y otherwise 'render_target' will be upside down
..Default::default()
},
);
draw_fps();
next_frame().await;
}
}
@ollej on Discord noticed that rearranging the code as follows works as we'd expect, with a stable 80fps+.
use macroquad::prelude::*;
#[macroquad::main("Letterbox")]
async fn main() {
let width = screen_width();
let height = screen_height();
let target = render_target(width as u32, height as u32);
let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
render_target_cam.render_target = Some(target.clone());
loop {
set_camera(&render_target_cam); // <--
clear_background(BLACK);
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
for _ in 0..1000 {
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
}
for _ in 0..1000 {
draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
}
set_default_camera();
clear_background(BLACK);
draw_texture_ex(
&target.texture,
0.0,
0.0,
WHITE,
DrawTextureParams {
dest_size: Some(vec2(width, height)),
flip_y: true, // Must flip y otherwise 'render_target' will be upside down
..Default::default()
},
);
draw_fps();
next_frame().await;
}
}
Because you are creating Rect 1000 times per frame.
Move Rect::new outside the game loop:
#[macroquad::main("Letterbox")]
async fn main() {
let width = screen_width();
let height = screen_height();
let target = render_target(width as u32, height as u32);
let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
render_target_cam.render_target = Some(target.clone());
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
loop {
set_camera(&render_target_cam); // <--
clear_background(BLACK);
for _ in 0..1000 {
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
}
for _ in 0..1000 {
draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
}
set_default_camera();
clear_background(BLACK);
draw_texture_ex(
&target.texture,
0.0,
0.0,
WHITE,
DrawTextureParams {
dest_size: Some(vec2(width, height)),
flip_y: true, // Must flip y otherwise 'render_target' will be upside down
..Default::default()
},
);
draw_text(&get_fps().to_string(), screen_width() - 50., 30.0, 30.0, BLUE);
next_frame().await;
}
}
Because you are creating Rect 1000 times per frame. Move Rect::new outside the game loop:
- Rect is so cheap to create that it's not relevant.
- LLVM is smart enough to bring it out (or completely inline here).
- You do not need to touch the rect in my original reproduction to get rid of the perf issue.
Your code gives me a stable 240 frames with vsync, without it - it gives more. That's why I physically can't test it and I have to look for the problem by eye.
But in 90% of cases such drops are human error, so I would check it a few more times.
Your code gives me a stable 240 frames with vsync, without it - it gives more. That's why I physically can't test it and I have to look for the problem by eye.
Based on your snippet not using draw_fps(), I assume you are using the released version of macroquad, which I explicitly said does not have this issue. You also based your snippet on the version I said works on master too.
Yes, I tested on 0.4.13, but you didn't specify the version on which this happens.
The problem also reproduces on Linux (X11, Fedora 40, GNOME). The flame graph shows that most of the time is spent in the quad_gl draw method, waiting for the iris_dri.so driver to respond (as far as I know, this is the Mesa OpenGL driver for Intel). For some reason, this particular situation causes a huge wait on the driver. Whether this is a driver bug or something else, I do not know. I am sharing the files with you. The "bad" file is a snapshot of the program from the problem example. The "good" file is a snapshot taken when I draw the text and the rectangles in separate loops.
This is very bad (note: placement along the x-axis is random and carries no meaning; the y-axis is the nesting of functions):
This is good (we can see that the rendering functions take up most of the program's time):
I can't say that this would help the issue, but I'd like to point out that (from my understanding) it is REALLY REALLY bad practice to shadow a variable unless necessary (such as to change the type while retaining the variable name). Since you're not changing the type, it may be most beneficial to just allocate the rect variable outside of the main loop. Below is a reply that shows something similar. I really do doubt that not shadowing the variable would help performance, but it is good practice nonetheless.
@ollej on Discord noticed rearranging the code like follows works as we'd expect, with stable 80fps+.
use macroquad::prelude::*;
#[macroquad::main("Letterbox")]
async fn main() {
let width = screen_width();
let height = screen_height();
let target = render_target(width as u32, height as u32);
let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
render_target_cam.render_target = Some(target.clone());
loop {
set_camera(&render_target_cam); // <--
clear_background(BLACK);
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
for _ in 0..1000 {
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
}
for _ in 0..1000 {
draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
}
set_default_camera();
clear_background(BLACK);
draw_texture_ex(
&target.texture,
0.0,
0.0,
WHITE,
DrawTextureParams {
dest_size: Some(vec2(width, height)),
flip_y: true, // Must flip y otherwise 'render_target' will be upside down
..Default::default()
},
);
draw_fps();
next_frame().await;
}
}
The original code does not shadow any variables. It wouldn't be considered bad practice in Rust anyhow.
Sorry, you are right about the shadowing part. It was late when I was typing that. Nonetheless, the code can be improved (imo) by adjusting this segment specifically.
for _ in 0..1000 {
let rect = Rect::new(10.0, 50.0, 100.0, 30.0); // <- this can and should be initialized outside of the main loop
// Removing either of the following lines fixes the problem.
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
draw_text("wow", rect.x, rect.y, 24.0, WHITE);
}
This is further justified by the fact that you already do that with width, height, and target. Additionally, here is a Stack Overflow answer that further supports my point (of course, that's for the C/C++ compiler, so 🤷).
However, it's up to you whether you want to implement that. I'm just throwing my words of wisdom out here. I'm sure the compiler would recognize that and fix it for you, but relying on the compiler to fix something that you know you can improve is just outright bad practice in my opinion.
If you do have your reasons for doing that, then it'd be highly helpful if you specified those reasons. :)
@BnDLett You are certainly right about shadowing, but the question was about something else, so your answer is not relevant.
Certainly not relevant. As such, I rest my case.
Viewing the flame graph showed that most of the time is spent on the quad_gl draw method and waiting for the iris_dri.so driver to respond (as far as I know, this is the Mesa OpenGL driver for Intel). For some reason, it is in this situation that a huge wait for the driver to respond occurs.
@KurlykovDanila this more than likely means that the graphics side of things is having trouble drawing everything in time.
In fact, I get low FPS with this code on Windows too. Unlike @JaniM, I also experienced the performance issue on the released 0.13.0 version.
However, this performance dip can be reproduced with much simpler code:
use macroquad::prelude::*;
#[macroquad::main("Letterbox")]
async fn main() {
let width = screen_width();
let height = screen_height();
let target = render_target(width as u32, height as u32);
let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
render_target_cam.render_target = Some(target.clone());
loop {
clear_background(BLACK);
for _ in 0..1000 {
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
// Removing either of the following lines fixes the problem.
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE); // <--
draw_text("wow", rect.x, rect.y, 24.0, WHITE); // <--
}
draw_fps();
next_frame().await;
}
}
I looked at the code thoroughly and also debugged it. It seems that the main problem here is that this program ends up producing 2000 unbatched drawcalls. One drawcall in macroquad is quite an expensive operation, hence the performance drop.
The 2000 unbatched drawcalls are caused by interleaving draw_rectangle and draw_text. That is why the problem disappears when you remove either draw_rectangle or draw_text: macroquad figures out that the remaining calls can be grouped into one drawcall.
P.S.
The performance did get somewhat worse, though. I tried to find the commit responsible, and it seems that everything was better before b0bf24b455d8be3385dc50088277421812a6dfe2.
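To make the batching argument concrete, here is a minimal, self-contained model of it. It is not macroquad's actual code: it simply assumes that consecutive draws can share a batch only while they use the same texture, with rectangles treated as untextured and text as using a font atlas. The `Texture` enum and the helper functions are hypothetical names invented for this sketch.

```rust
// Simplified model of draw-call batching; NOT macroquad's real implementation.
// Assumption: a new batch starts whenever the bound texture changes.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Texture {
    None,      // untextured shapes, e.g. rectangles
    FontAtlas, // glyph quads produced by text drawing
}

/// Counts the batches a sequence of draws would produce under the
/// "texture change starts a new batch" assumption.
fn count_batches(draws: &[Texture]) -> usize {
    let mut batches = 0;
    let mut current: Option<Texture> = None;
    for &tex in draws {
        if current != Some(tex) {
            batches += 1;
            current = Some(tex);
        }
    }
    batches
}

/// rectangle, text, rectangle, text, ... (the slow pattern from the report)
fn interleaved(n: usize) -> Vec<Texture> {
    (0..n)
        .flat_map(|_| [Texture::None, Texture::FontAtlas])
        .collect()
}

/// all rectangles first, then all text (the fast pattern)
fn grouped(n: usize) -> Vec<Texture> {
    let mut draws = vec![Texture::None; n];
    draws.extend(std::iter::repeat(Texture::FontAtlas).take(n));
    draws
}

fn main() {
    println!("interleaved: {} batches", count_batches(&interleaved(1000)));
    println!("grouped:     {} batches", count_batches(&grouped(1000)));
}
```

Under this model the interleaved loop yields 2000 batches and the grouped version yields 2, which matches the drawcall count described above.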
It reproduces on Windows. Performance seems to be worse with version 0.4.14; 0.4.13 is fine.
| Test | 0.4.13 | 0.4.14 |
|---|---|---|
| 1 | 26 fps | 24 fps |
| 2 | 145±2 fps | 145±2 fps |
| 3 | 220 - 230 fps | 155 - 160 fps |
| 4 | min 400, avg 430, max 475 fps | 445 - 465 fps |
- All measurements use the release profile.
- Tests 1 and 2 take some time before a primitive is drawn for the first time.
- FPS was measured with RenderDoc.
1)
use macroquad::{miniquad::conf::Platform, prelude::*};
fn window_conf() -> Conf {
Conf {
window_title: "Letterbox".to_owned(),
platform: Platform {
swap_interval: Some(0),
..Default::default()
},
..Default::default()
}
}
#[macroquad::main(window_conf)]
async fn main() {
loop {
clear_background(BLACK);
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
for _ in 0..1000 {
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
draw_text("wow", rect.x, rect.y, 24.0, WHITE);
}
next_frame().await;
}
}
2)
use macroquad::{miniquad::conf::Platform, prelude::*};
fn window_conf() -> Conf {
Conf {
window_title: "Letterbox".to_owned(),
platform: Platform {
swap_interval: Some(0),
..Default::default()
},
..Default::default()
}
}
#[macroquad::main(window_conf)]
async fn main() {
loop {
clear_background(BLACK);
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
for _ in 0..1000 {
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
}
for _ in 0..1000 {
draw_text("wow", rect.x, rect.y, 24.0, WHITE);
}
next_frame().await;
}
}
3)
use macroquad::{miniquad::conf::Platform, prelude::*};
fn window_conf() -> Conf {
Conf {
window_title: "Letterbox".to_owned(),
platform: Platform {
swap_interval: Some(0),
..Default::default()
},
..Default::default()
}
}
#[macroquad::main(window_conf)]
async fn main() {
let width = screen_width();
let height = screen_height();
let target = render_target(width as u32, height as u32);
let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
render_target_cam.render_target = Some(target.clone());
loop {
set_camera(&render_target_cam);
clear_background(BLACK);
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
for _ in 0..1000 {
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
}
for _ in 0..1000 {
draw_text("wow", rect.x, rect.y, 24.0, WHITE);
}
set_default_camera();
clear_background(BLACK);
draw_texture_ex(
&target.texture,
0.0,
0.0,
WHITE,
DrawTextureParams {
dest_size: Some(vec2(width, height)),
flip_y: true,
..Default::default()
},
);
next_frame().await;
}
}
4)
use macroquad::{miniquad::conf::Platform, prelude::*};
fn window_conf() -> Conf {
Conf {
window_title: "Letterbox".to_owned(),
platform: Platform {
swap_interval: Some(0),
..Default::default()
},
..Default::default()
}
}
#[macroquad::main(window_conf)]
async fn main() {
let width = screen_width();
let height = screen_height();
let target = render_target(width as u32, height as u32);
let mut render_target_cam = Camera2D::from_display_rect(Rect::new(0., 0., width, height));
render_target_cam.render_target = Some(target.clone());
set_camera(&render_target_cam);
clear_background(BLACK);
let rect = Rect::new(10.0, 50.0, 100.0, 30.0);
for _ in 0..1000 {
draw_rectangle(rect.x + 100.0, rect.y, rect.w, rect.h, WHITE);
}
for _ in 0..1000 {
draw_text("wow", rect.x, rect.y, 24.0, WHITE);
}
next_frame().await;
loop {
set_default_camera();
clear_background(BLACK);
draw_texture_ex(
&target.texture,
0.0,
0.0,
WHITE,
DrawTextureParams {
dest_size: Some(vec2(width, height)),
flip_y: true,
..Default::default()
},
);
next_frame().await;
}
}
Example number 1 looks rather interesting. It involves no drawing to a texture at all.
I have debugged the code on my machine. The root cause of the slow-down seems to be the draw_text, actually. It doesn't get batched together with the shape drawcalls. I will try to put together some sort of fix, probably next weekend :)
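Until such a fix exists, one possible workaround on the application side is to buffer draw requests and replay them grouped by kind, which is the same idea as the two-loop rearrangement earlier in the thread. The sketch below is self-contained and does not call macroquad; `Draw` and `DrawQueue` are hypothetical names, and in a real program the `emit` closure would be where `draw_rectangle` and `draw_text` get called.

```rust
// Hypothetical helper: buffers draw requests and replays them grouped by kind.
// `Draw` and `DrawQueue` are illustrative types, not part of macroquad.

#[derive(Clone, Debug, PartialEq)]
enum Draw {
    Rect { x: f32, y: f32, w: f32, h: f32 },
    Text { text: String, x: f32, y: f32 },
}

#[derive(Default)]
struct DrawQueue {
    rects: Vec<Draw>,
    texts: Vec<Draw>,
}

impl DrawQueue {
    /// Stores the request in the bucket for its kind instead of drawing now.
    fn push(&mut self, d: Draw) {
        match d {
            Draw::Rect { .. } => self.rects.push(d),
            Draw::Text { .. } => self.texts.push(d),
        }
    }

    /// Replays all rectangles first, then all text, and empties the queue.
    fn flush(&mut self, mut emit: impl FnMut(&Draw)) {
        for d in self.rects.drain(..).chain(self.texts.drain(..)) {
            emit(&d);
        }
    }
}

fn main() {
    let mut queue = DrawQueue::default();
    for _ in 0..3 {
        // Game code can still issue draws in interleaved order.
        queue.push(Draw::Rect { x: 110.0, y: 50.0, w: 100.0, h: 30.0 });
        queue.push(Draw::Text { text: "wow".to_owned(), x: 10.0, y: 50.0 });
    }
    // In a macroquad program this closure would call the real draw functions.
    queue.flush(|d| match d {
        Draw::Rect { x, y, w, h } => println!("rect at ({x}, {y}) size {w}x{h}"),
        Draw::Text { text, x, y } => println!("text {text:?} at ({x}, {y})"),
    });
}
```

Note that grouping changes the draw order: all text ends up on top of all rectangles, so this only fits scenes where that layering is acceptable.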
In this example, draw_rectangle seems to perform worse.
| Test | 0.4.13 | 0.4.14 | Note |
|---|---|---|---|
| 5 | 98 - 104 fps | 98 - 104 fps | |
| 6 | 333 fps | 333 - 335 fps | CPU usage is increasing and GPU usage is decreasing |
5)
use ::macroquad::{prelude::*, miniquad::conf::Platform};
fn window_conf() -> Conf {
Conf {
window_title: "Letterbox".to_owned(),
platform: Platform {
swap_interval: Some(0),
..Default::default()
},
..Default::default()
}
}
#[macroquad::main(window_conf)]
async fn main() {
loop {
clear_background(BLACK);
for _ in 0..2000 {
draw_rectangle(10.0 + 100.0, 50.0, 100.0, 30.0, WHITE);
}
next_frame().await;
}
}
6)
use ::macroquad::{prelude::*, miniquad::conf::Platform};
fn window_conf() -> Conf {
Conf {
window_title: "Letterbox".to_owned(),
platform: Platform {
swap_interval: Some(0),
..Default::default()
},
..Default::default()
}
}
#[macroquad::main(window_conf)]
async fn main() {
loop {
clear_background(BLACK);
for _ in 0..2000 {
draw_text("wow", 10.0, 50.0, 24.0, WHITE);
}
next_frame().await;
}
}
The draw_rectangle measurements look off. I get 1000 FPS on test 5 in release mode :)
I have debugged test 5, and macroquad successfully batches all of that into 3 draw calls. I am afraid it is just your hardware.
Moreover, it seems to be higher than 60 FPS anyway, which is the standard. Is there any reason to demand higher performance?
I also see no problem with higher CPU usage when drawing text. Simply drawing text symbol-by-symbol (which is what macroquad does) IS a more costly operation. Is something wrong with that?