vogl_entrypoints.cpp memory hog in 32-bit mode

Open stativ opened this issue 10 years ago • 9 comments

Summary Compiling vogl_entrypoints.cpp in 32-bit mode on a 64-bit machine consumes all available memory until the compiler is killed by the OOM killer. This happens with gcc 4.8.2.

Details Compiling vogl_entrypoints.cpp in 64-bit mode takes a lot of time, but otherwise it compiles fine. However, when compiling with gcc -m32 it eats ~14GB of memory before gcc is killed. It has been suggested [1] that passing -fno-var-tracking should fix this issue (as it is already used in the sources).

However, it seems that gcc ignores this option. This can be tested in the following way:

  1. remove -O2 from the compiler command line
  2. get a list of what optimizations are enabled with -O1: g++ -Q -O1 --help=optimizers | grep enabled | cut -d"[" -f1 | tr -d '\n'
  3. test compilation while manually enabling optimizations:
    1. pass all optimizations from the list to gcc when compiling entrypoints => compilation gets stuck
    2. pass all optimizations except -fvar-tracking => everything compiles fine
    3. pass -O1 -fno-var-tracking => compilation gets stuck
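The steps above can be scripted roughly as follows. This is only a sketch, assuming a POSIX shell and that the first column of the `g++ -Q -O1 --help=optimizers` output is the flag name. The helper names `enabled_at_O1` and `without_flag` are made up for illustration and are not part of vogl.

```shell
#!/bin/sh
# Hypothetical helpers for repeating the experiment; not part of vogl.

# Print the optimization flags g++ reports as enabled at -O1, one per line.
enabled_at_O1() {
    g++ -Q -O1 --help=optimizers | grep enabled | awk '{print $1}'
}

# Read flags on stdin (one per line), drop exact matches of "$1",
# and print the remaining flags on a single space-separated line.
without_flag() {
    grep -v -x -F -e "$1" | paste -s -d ' ' -
}

# Step 3.ii, with the rest of the original command line left out:
#   flags=$(enabled_at_O1 | without_flag -fvar-tracking)
#   g++ -m32 $flags vogl_entrypoints.cpp
```

The exact-match filter (`-x`) matters because other flags share the `-fvar-tracking` prefix. Note that the GCC documentation says individual optimization flags are mostly disabled unless an -O level is given, so this repeats the reporter's experiment rather than a true -O1 build.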

Solution So far I have found only one workaround: disabling optimizations for this file completely, i.e. replacing -fno-var-tracking in src/voglcommon/CMakeLists.txt with -O0.

References
[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59868
[2] https://github.com/ValveSoftware/vogl/pull/9

stativ avatar Mar 26 '14 13:03 stativ

In steps 3.i and 3.ii most of the optimizations are not enabled, since you're not passing -O on the command line. This is a frequent "paper cut" with the GCC command line; see http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options:

Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.

The difference between 3.i and 3.ii suggests that var-tracking is at fault (known issue), and the difference between 3.i and 3.iii suggests that something else is also at fault even when var-tracking is disabled.

Can you provide the gzipped preprocessed code and the exact compilation command line, as Mike did in 59868?

amonakov avatar Mar 26 '14 14:03 amonakov

While trying to narrow down the problematic flag, I found that the problem goes away when -fvisibility=hidden is removed from the compiler command line.

stativ avatar Mar 26 '14 17:03 stativ

Problematic file: http://ge.tt/48kG3kT1/v/0?c

Minimum compilation commands:

  working: g++ -m32 -fPIC -O1 -DNDEBUG foo.cpp
  working: g++ -m32 -fvisibility=hidden -fPIC -O0 -DNDEBUG foo.cpp
  broken: g++ -m32 -fvisibility=hidden -fPIC -O1 -DNDEBUG foo.cpp
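When re-running the broken command, it may be safer to cap the compiler's memory so that it fails with an allocation error instead of waking the OOM killer. A minimal sketch, assuming a shell whose `ulimit` supports `-v` (bash and dash do). `run_capped` is a hypothetical helper, and the 4 GiB limit in the usage line is an arbitrary choice.

```shell
#!/bin/sh
# run_capped LIMIT_KIB CMD [ARGS...]
# Runs CMD in a subshell whose virtual memory is limited to LIMIT_KIB
# kibibytes. The limit applies only to the subshell and its children,
# so the calling shell is unaffected.
run_capped() {
    limit_kib=$1
    shift
    (
        # Exit with 125 if the limit itself cannot be set.
        ulimit -v "$limit_kib" || exit 125
        "$@"
    )
}

# Usage with the "broken" command line from this thread:
#   run_capped 4194304 g++ -m32 -fvisibility=hidden -fPIC -O1 -DNDEBUG foo.cpp
```

The helper returns the exit status of the wrapped command, so a capped compile that runs out of memory shows up as an ordinary non-zero status.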

stativ avatar Mar 26 '14 17:03 stativ

@stativ Can you put the file up in a gist?

kingtaurus avatar Mar 26 '14 17:03 kingtaurus

@kingtaurus The file is almost 9 megabytes, so it's not really suited for posting as plaintext.

I was able to narrow the problem down to these flags: -fno-tree-ccp -fno-tree-dominator-opts -fno-tree-forwprop -fno-tree-fre. Removing any of these results in gcc eating massive amounts of memory.

stativ avatar Mar 26 '14 18:03 stativ

Oh well, reality is again different from an artificial test. With -O2 it still eats all of my memory, so I guess the real culprit is not the optimization itself but -fvisibility=hidden. If I use -fvisibility=default for vogl_entrypoints.cpp, it always works.

stativ avatar Mar 26 '14 19:03 stativ

-fvisibility=default constrains some optimizations.

Did a few tests with 4.8.1 at -O1. RTL DSE and postreload cse seem to be responsible for the huge memory consumption; -fno-dse -fdbg-cnt=postreload_cse:0 is a workaround for that. Still need to check what happens on trunk and make a GCC bug report.
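For reference, the workaround flags can be combined with the reduced "broken" command line posted earlier in the thread. This is a sketch under that assumption; the exact command used for the 4.8.1 tests is not stated, and `workaround_cmd` is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# Flags reported in this thread as a workaround on GCC 4.8.1 at -O1.
WORKAROUND_FLAGS="-fno-dse -fdbg-cnt=postreload_cse:0"

# Print the reduced "broken" command line with the workaround flags
# appended, followed by any arguments given (e.g. the source file).
workaround_cmd() {
    printf '%s\n' "g++ -m32 -fvisibility=hidden -fPIC -O1 -DNDEBUG $WORKAROUND_FLAGS $*"
}

# Usage: print the command, then run it by hand.
#   workaround_cmd foo.cpp
```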

On the other hand, I wonder what the macro-expanded code in that file is supposed to achieve. Would be nice to look for more efficient solutions.

amonakov avatar Mar 26 '14 21:03 amonakov

Trunk still needs -fno-dse, but postreload cse seems to be improved a bit; still consumes a lot of memory, but does not explode like on 4.8.1.

amonakov avatar Mar 27 '14 12:03 amonakov

I was able to repro this on gcc--I think most of us use clang internally, which is why we haven't been bumping into this.

vMcJohn avatar Apr 22 '14 21:04 vMcJohn