gcc-12: Static analysis fails in release mode with activated OpenACC back-end
While working on #1713 I encountered the following esoteric bug (which is likely a gcc-12 bug):
When compiling the test cases in release mode with an activated OpenACC back-end (+ serial back-end), this portion of our code base will trigger the `-Walloc-zero` warning:
https://github.com/alpaka-group/alpaka/blob/e76b69b16b79bcc661811f6bbe511193b532b529/include/alpaka/mem/alloc/Traits.hpp#L36-L42
This actually fails upon instantiation for the serial back-end. For some reason the C++ front-end believes that `sizeElems` is 0. When adding a `std::cout` to print the value of `sizeElems`, things work again. Exchanging OpenACC with OpenMP 5 also works. Compiling in `Debug` and `RelWithDebInfo` also works. This leads me to believe that there is a combination of OpenACC flags & optimization flags that irritates gcc's static analyzer.
Unfortunately I haven't yet been able to create a minimal reproducer for the gcc developers. The reproducer for alpaka developers is the `integ/sharedMem` test case.
Okay, it looks like the above code portion isn't necessarily the culprit. Adding `std::cout << sizeElems` anywhere in the call chain of buffer allocations will cause the compiler to work correctly. I still don't have any idea how to reproduce this, though.
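For illustration, here is a minimal sketch of the shape of the problem (names and structure are placeholders, not the actual code from Traits.hpp): a buffer allocation takes an element count, turns it into a byte count, and hands that to the underlying allocator; the mis-optimizing pass apparently proves the element count to be 0, and the resulting zero-byte allocation is what `-Walloc-zero` flags.

```c++
#include <cstddef>
#include <cstdlib>
#include <iostream>

// Placeholder for the buffer-allocation call chain, not alpaka's real API.
template<typename T>
auto allocHostBuf(std::size_t const& sizeElems) -> T*
{
    // The workaround described above: "using" sizeElems in a print anywhere
    // along the call chain stops the optimizer from treating it as 0. Removing
    // this line brings the spurious -Walloc-zero back under the bad flag set.
    std::cout << sizeElems << '\n';
    return static_cast<T*>(std::malloc(sizeElems * sizeof(T)));
}
```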
Does it make any difference if you pass `sizeElems` by value instead of by `const&`?
Nope, unfortunately not.
Sounds like a problem with an optimization. I believe the `std::cout` prevents some code from being optimized away or removed. Did you try a different optimization level, like `-O2`?
Good point. `-O2` works. So now I can check the additional `-O3` flags to narrow it down.
Interesting. `-O3` fails, but `-O2 -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides` (which are the additional flags according to the gcc documentation) works.
Are those all the optimizations that `-O3` adds on top of `-O2`, or did you omit one or more?
Those are the ones enabled locally on my system when passing `-O3` instead of `-O2`. I obtained them by looking at the diff of `g++ -O3 -Q --help=optimizers` and the `-O2` equivalent. This yields the flags mentioned above and additionally `-funroll-completely-grow-size`, which is not valid for C++.
Okay. I also know that the order in which optimizations run matters for performance. I'm not sure whether the order of the optimizations is what triggers the bug, or whether you can even control that order through the order of the command-line arguments.
No, according to StackOverflow the order of optimization options does not affect the order in which the optimizations are run in the end.
I now added a work-around in #1754. I am forcing the buffer allocation function to the `-O2` level when using gcc 12, OpenACC, and release mode. Hopefully we can create a small reproducer in the future so we can notify the gcc developers.
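For context, a common GCC mechanism for pinning a single function to a lower optimization level looks like the sketch below; whether #1754 uses exactly this pragma form (rather than, say, a function attribute) is an assumption on my part, and the real guard additionally checks for the OpenACC back-end and release mode.

```c++
#include <cstddef>
#include <cstdlib>

// Only drop to -O2 for the affected compiler; everything else keeps -O3.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ == 12
#    pragma GCC push_options
#    pragma GCC optimize("O2") // avoid the spurious -Walloc-zero seen at -O3
#endif

// Placeholder allocation function standing in for alpaka's buffer allocation.
template<typename T>
auto allocHostBuf(std::size_t const& sizeElems) -> T*
{
    return static_cast<T*>(std::malloc(sizeElems * sizeof(T)));
}

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ == 12
#    pragma GCC pop_options
#endif
```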
Interestingly, the other way around doesn't work either:
`-O3 -fno-gcse-after-reload -fno-ipa-cp-clone -fno-loop-interchange -fno-loop-unroll-and-jam -fno-peel-loops -fno-predictive-commoning -fno-split-loops -fno-split-paths -fno-tree-loop-distribution -fno-tree-partial-pre -fno-unswitch-loops -fvect-cost-model=very-cheap -fno-version-loops-for-strides`
This should be equivalent to `-O2` since all (documented) `-O3` flags are turned off. It still fails. So I guess there is an undocumented optimization which causes the error.
I just stumbled upon this comment in the GCC wiki FAQ page:
Is -O1 (-O2, -O3, -Os or -Og) equivalent to individual -foptimization options?
No. First, individual optimization options (`-f*`) do not enable optimization; one of the options `-Os`, `-Og` or `-Ox` with `x > 0` is required for any optimization to happen. Second, the `-Ox` flags enable many optimizations that are not controlled by any individual `-f*` option. There are no plans to add individual options for controlling all these optimizations. You may find the output of `gcc --help=optimizers` helpful here, though it too needs to be interpreted with the above caveat in mind.
Apparently this has been fixed in GCC. The following configuration compiled on my local system with GCC 12.2:
cmake .. -Dalpaka_CXX_STANDARD=17 -DBUILD_TESTING=ON -Dalpaka_ACC_ANY_BT_OACC_ENABLE=ON -DCMAKE_BUILD_TYPE=Release
I also tested this with C++20. I'll submit a PR that adds a CI job with OpenACC which will close this issue.