asteroids-demo

Compile option

Open icls1337 opened this issue 1 year ago • 2 comments

Why -ffast-math -Os? Wouldn't -Ofast be better?

icls1337 avatar Jul 15 '23 00:07 icls1337

Practically speaking, -O0 would be completely sufficient. This program barely does any work itself and spends nearly all its time waiting on Sleep, SwapBuffers, and other OpenGL calls. The -Os is not so much for performance but to trim the binary. Its floating point arithmetic has no strict requirements or expectations, so -ffast-math creates additional opportunities for trimming extraneous instructions.

However, -Ofast implies -O3 and, at least with GCC 13, produces an even larger binary than -O0. GCC aggressively unrolls loops trying to squeeze out every drop of performance, bloating the binary in the process. As mentioned, though, none of the hot spots are actually in this binary but out in the OpenGL implementation, so it doesn't accomplish anything useful. I'd rather have a smaller binary.

Anticipating a followup question: Why not -Oz instead of -Os? It's a new option (GCC 12, May 2022) and didn't exist yet when I wrote this program (March 2021). I need more experience with it before making a decision.

In case anyone following along notices the /O2 in the MSVC build: /Os has always been effectively broken in MSVC. Despite its label as "favor small code" it typically produces larger code than /O2 — sometimes substantially larger — so I just stick with the latter. I speculate hardly anyone uses /Os, and so nobody notices it's broken. (Note: /Os actually does work in clang-cl.)

skeeto avatar Jul 15 '23 02:07 skeeto

I've found that with GCC, -Os and -Oz produce binaries that differ in size by no more than 0.1%, but Clang can still be more than 10% smaller.
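For anyone who wants to check size claims like these on their own toolchain, here is a minimal Python sketch. It is not from this repository: the helper names (`percent_smaller`, `build_size`, `compare`) are made up for illustration, and the compile command assumes a single source file that links without extra libraries, so a real OpenGL program would need its usual linker flags appended.

```python
import os
import subprocess
import tempfile


def percent_smaller(baseline_bytes, candidate_bytes):
    """Return how much smaller the candidate is than the baseline, in percent.

    A negative result means the candidate is larger than the baseline.
    """
    if baseline_bytes <= 0:
        raise ValueError("baseline size must be positive")
    return 100.0 * (baseline_bytes - candidate_bytes) / baseline_bytes


def build_size(compiler, flags, source):
    """Compile `source` with `flags` and return the output size in bytes.

    Assumption: `compiler` accepts GCC-style arguments (gcc, clang).
    """
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "a.out")
        subprocess.run([compiler, *flags, "-o", out, source], check=True)
        return os.path.getsize(out)


def compare(compiler, source, flag_sets):
    """Map each flag set (joined into one string) to its binary size."""
    return {
        " ".join(flags): build_size(compiler, flags, source)
        for flags in flag_sets
    }
```

A call such as `compare("gcc", "main.c", [["-Os"], ["-Oz"], ["-Ofast"]])` (with `main.c` standing in for whatever source file you are building) returns a size per flag set, and `percent_smaller` turns any two of those sizes into the kind of percentage quoted above. Unstripped binaries include symbol tables, so adding `-s` to each flag set may give a fairer comparison of the generated code.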

icls1337 avatar Jul 15 '23 05:07 icls1337