vogl
vogl_entrypoints.cpp memory hog in 32-bit mode
Summary
Compiling vogl_entrypoints.cpp in 32-bit mode on a 64-bit machine consumes all available memory until the compile is killed by the OOM killer. This happens with gcc 4.8.2.
Details
Compiling vogl_entrypoints.cpp in 64-bit mode takes a long time, but otherwise it compiles fine. However, when compiling with gcc -m32
it eats ~14GB of memory before gcc is killed. It has been suggested [1] that passing -fno-var-tracking
should fix this issue (the flag is already used in the sources).
However, gcc seems to ignore this option. This can be tested in the following way:
1. remove -O2 from the compiler command line
2. get a list of the optimizations that are enabled with -O1:
   g++ -Q -O1 --help=optimizers | grep enabled | cut -d"[" -f1 | tr -d '\n'
3. test compilation while manually enabling optimizations:
   i. pass all optimizations from the list to gcc when compiling entrypoints => compilation gets stuck
   ii. pass all optimizations except -fvar-tracking => everything compiles fine
   iii. pass -O1 -fno-var-tracking => compilation gets stuck
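The flag-listing step above can be wrapped in a small helper so the list is easier to reuse. This is only a sketch: the function name enabled_flags is made up for illustration, and it assumes the -Q --help=optimizers output has one flag per line followed by [enabled] or [disabled].

```shell
# Hypothetical helper: read `g++ -Q -O1 --help=optimizers` output on stdin
# and print the enabled flags, one per line.
enabled_flags() {
    # Keep lines marked [enabled], then print the first field (the flag itself).
    awk '/\[enabled\]/ { print $1 }'
}

# Usage (requires g++):
#   g++ -Q -O1 --help=optimizers | enabled_flags
```

Printing one flag per line instead of joining them with tr makes it easier to drop a single flag with grep -v before passing the rest to the compiler.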
Solution
So far I have found only one possible solution: disabling optimizations for this file completely, i.e. replacing -fno-var-tracking
in src/voglcommon/CMakeLists.txt with -O0.
References
[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59868
[2] https://github.com/ValveSoftware/vogl/pull/9
In steps 3.i and 3.ii most of the optimizations are not enabled, since you're not passing -O on the command line. This is a frequent "paper cut" with the GCC command line; see http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options:
Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.
The difference between 3.i and 3.ii suggests that var-tracking is at fault (known issue), and the difference between 3.i and 3.iii suggests that something else is also at fault even when var-tracking is disabled.
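Since individual -f flags only take effect when an -O level is given, one way to bisect is to keep -O1 and switch off a single optimization at a time using its -fno- form. The following is a rough sketch of that idea; the helper names negate_flag and bisect_commands are hypothetical, and flags that take a value (-ffoo=bar) are simply skipped because they have no plain -fno- counterpart.

```shell
# Hypothetical helper: turn -ffoo into -fno-foo.
# Prints nothing for inputs that cannot be negated this way.
negate_flag() {
    case "$1" in
        *=*)    ;;                               # valued flags: no simple -fno- form
        -fno-*) ;;                               # already negated
        -f*)    printf '%s\n' "-fno-${1#-f}" ;;  # strip "-f", prepend "-fno-"
    esac
}

# For each flag read from stdin, print a compile command that keeps -O1
# but disables just that one flag. The arguments are the base command.
bisect_commands() {
    while read -r flag; do
        neg=$(negate_flag "$flag")
        if [ -n "$neg" ]; then
            printf '%s\n' "$* -O1 $neg"
        fi
    done
}

# Usage:
#   printf '%s\n' -ftree-ccp -fdse | bisect_commands g++ -m32 -fPIC -DNDEBUG -c foo.cpp
```

The function only prints the commands rather than running them, so each one can be inspected or run under a memory limit.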
Can you provide gzipped preprocessed code and exact compilation command line as Mike did in 59868?
While trying to narrow down the problematic flag I found out that the problem can be fixed by removing -fvisibility=hidden from the compiler command line.
Problematic file: http://ge.tt/48kG3kT1/v/0?c
Minimum compilation commands
working:
g++ -m32 -fPIC -O1 -DNDEBUG foo.cpp
working:
g++ -m32 -fvisibility=hidden -fPIC -O0 -DNDEBUG foo.cpp
broken:
g++ -m32 -fvisibility=hidden -fPIC -O1 -DNDEBUG foo.cpp
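When re-running these commands, it may be safer to cap the compiler's memory so that a runaway build fails quickly instead of triggering the OOM killer. A minimal sketch, assuming a Linux shell where ulimit -v is supported; the helper name run_capped and the 4 GB figure are arbitrary choices, not something from this thread.

```shell
# Hypothetical helper: run a command with its virtual memory capped.
# $1 is the limit in kilobytes; the remaining arguments are the command.
run_capped() {
    limit_kb=$1
    shift
    # The subshell keeps the limit from leaking into the calling shell.
    ( ulimit -v "$limit_kb" && "$@" )
}

# Usage (4194304 KB = 4 GB):
#   run_capped 4194304 g++ -m32 -fvisibility=hidden -fPIC -O1 -DNDEBUG foo.cpp
```

A non-zero exit status then means either an ordinary compile error or that gcc ran into the cap, so the compiler's own output still has to be checked to tell the two apart.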
@stativ Can you put the file up in a gist?
@kingtaurus The file is almost 9 megabytes, so it's not really suited for posting as plaintext
I was able to narrow the problem down to these flags:
-fno-tree-ccp -fno-tree-dominator-opts -fno-tree-forwprop -fno-tree-fre
Removing any of these results in gcc eating massive amounts of memory.
Oh well, reality is again different from an artificial test. With -O2 it still eats all of my memory. So I guess the real culprit is not the optimization itself but -fvisibility=hidden.
If I use -fvisibility=default for vogl_entrypoints.cpp, it always works.
-fvisibility=default constrains some optimizations.
Did a few tests with 4.8.1 at -O1. RTL DSE and postreload cse seem to be responsible for the huge memory consumption; -fno-dse -fdbg-cnt=postreload_cse:0
is a workaround for that. Still need to check what happens on trunk and make a GCC bug report.
On the other hand, I wonder what the macro-expanded code in that file is supposed to achieve. Would be nice to look for more efficient solutions.
Trunk still needs -fno-dse, but postreload cse seems to be improved a bit; still consumes a lot of memory, but does not explode like on 4.8.1.
I was able to repro this on gcc. I think most of us use clang internally, which is why we haven't been bumping into this.