gcc-12: Static analysis fails in release mode with activated OpenACC back-end

Open j-stephan opened this issue 2 years ago • 13 comments

While working on #1713 I encountered the following esoteric bug (which is likely a gcc-12 bug):

When compiling the test cases in release mode with an activated OpenACC back-end (+ serial back-end), this portion of our code base will trigger the -Walloc-zero warning:

https://github.com/alpaka-group/alpaka/blob/e76b69b16b79bcc661811f6bbe511193b532b529/include/alpaka/mem/alloc/Traits.hpp#L36-L42

This actually fails upon instantiation for the serial back-end. For some reason the C++ front-end believes that sizeElems is 0. When adding a std::cout to print the value of sizeElems things work again. Exchanging OpenACC with OpenMP5 also works. Compiling in Debug and RelWithDebInfo also works. This leads me to believe that there is a combination of OpenACC flags & optimization flags that confuses gcc's static analyzer.
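
For context, -Walloc-zero warns when GCC concludes that an allocation function is being called with a size of zero. The following is only a minimal sketch of the kind of code involved, not the alpaka source behind the link above: allocElems is a hypothetical helper, and the warning would fire at its malloc call only if the compiler (rightly or wrongly) decided that sizeElems is 0.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an element-wise allocation helper.
// This is NOT the alpaka implementation.
template<typename T>
T* allocElems(std::size_t const& sizeElems)
{
    // If GCC's analysis decides that sizeElems == 0 for some instantiation,
    // -Walloc-zero reports this malloc call; with -Werror the build fails.
    return static_cast<T*>(std::malloc(sizeof(T) * sizeElems));
}
```

Calling allocElems<int>(4) and later passing the pointer to std::free is the intended use; a caller that always passes a non-zero count should not normally trigger the warning.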

Unfortunately I haven't yet been able to create a minimal reproducer for the gcc developers. The reproducer for alpaka developers is the integ/sharedMem test case.

j-stephan avatar Jun 24 '22 10:06 j-stephan

Okay, it looks like the above code portion isn't necessarily the culprit. Adding std::cout << sizeElems anywhere in the call chain of buffer allocations will cause the compiler to work correctly. I still don't have any idea how to reproduce this, though.

j-stephan avatar Jun 29 '22 08:06 j-stephan

Does it make any difference if you pass sizeElems by value instead of by const& ?

fwyzard avatar Jun 29 '22 08:06 fwyzard

Nope, unfortunately not.

j-stephan avatar Jun 29 '22 08:06 j-stephan

Sounds like a problem with an optimization. I believe the std::cout prevents some code from being optimized and possibly removed. Did you try a different optimization level, like -O2?

SimeonEhrig avatar Jun 29 '22 09:06 SimeonEhrig

Good point. -O2 works. So now I can check the additional -O3 flags to narrow it down.

j-stephan avatar Jun 29 '22 09:06 j-stephan

Interesting. -O3 fails, but -O2 -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides (which are the additional flags according to the gcc documentation) works.

j-stephan avatar Jun 29 '22 09:06 j-stephan

Are those all of the optimizations that -O3 enables, or did you omit any?

SimeonEhrig avatar Jun 29 '22 09:06 SimeonEhrig

Those are the ones enabled locally on my system when passing -O3 instead of -O2. I obtained those by looking at the diff of g++ -O3 -Q --help=optimizers and the -O2 equivalent. This yields the flags mentioned above and additionally -funroll-completely-grow-size which is not valid for C++.

j-stephan avatar Jun 29 '22 09:06 j-stephan

Okay. I also know that the order of the optimizations matters for achieving better performance. I'm not sure whether the order of the optimizations triggers the bug, or whether you can control that order through the order of the arguments.

SimeonEhrig avatar Jun 29 '22 09:06 SimeonEhrig

No, according to StackOverflow the order of optimization options does not affect the order in which the optimizations are run in the end.

j-stephan avatar Jun 29 '22 09:06 j-stephan

I have now added a work-around in #1754. I am forcing the buffer allocation function to the -O2 level when using gcc 12, OpenACC and release mode. Hopefully we can create a small reproducer in the future so we can notify the gcc developers.
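
A minimal sketch of how such a per-function override can look, assuming GCC's optimize function attribute is used. The actual change in #1754 may differ: the macro name SKETCH_FORCE_O2 and the function allocBytes are hypothetical, and treating NDEBUG as "release mode" is an assumption based on CMake's default Release flags.

```cpp
#include <cstddef>
#include <cstdlib>

// Assumption: apply the override only for GCC 12 (clang also defines
// __GNUC__, so it is excluded), with OpenACC enabled (_OPENACC is defined
// by OpenACC-enabled compilations) and in a release build (NDEBUG).
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ == 12) \
    && defined(_OPENACC) && defined(NDEBUG)
#    define SKETCH_FORCE_O2 __attribute__((optimize("O2")))
#else
#    define SKETCH_FORCE_O2
#endif

// Hypothetical allocation function. With the attribute active, GCC compiles
// this one function at -O2 even if the translation unit is built with -O3.
SKETCH_FORCE_O2 inline void* allocBytes(std::size_t sizeBytes)
{
    return std::malloc(sizeBytes);
}
```

On every other compiler or configuration the macro expands to nothing, so the function is compiled with the normal project flags.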

j-stephan avatar Jun 29 '22 10:06 j-stephan

Interestingly, the other way around doesn't work either:

-O3 -fno-gcse-after-reload -fno-ipa-cp-clone -fno-loop-interchange -fno-loop-unroll-and-jam -fno-peel-loops -fno-predictive-commoning -fno-split-loops -fno-split-paths -fno-tree-loop-distribution -fno-tree-partial-pre -fno-unswitch-loops -fvect-cost-model=very-cheap -fno-version-loops-for-strides

This should be equivalent to -O2 since all (documented) -O3 flags are turned off. It still fails. So I guess there is an undocumented optimization which causes the error.

j-stephan avatar Jun 29 '22 13:06 j-stephan

I just stumbled upon this comment in the GCC wiki FAQ page:

Is -O1 (-O2,-O3, -Os or -Og) equivalent to individual -foptimization options?

No. First, individual optimization options (-f*) do not enable optimization, one of the options -Os, -Og or -Ox with x > 0 is required for any optimization to happen. Second, the -Ox flags enable many optimizations that are not controlled by any individual -f* option. There are no plans to add individual options for controlling all these optimizations. You may find the output of gcc --help=optimizers helpful here, though it too needs to be interpreted with the above caveat in mind.

fwyzard avatar Aug 18 '22 18:08 fwyzard

Apparently this has been fixed in GCC. The following configuration compiled on my local system with GCC 12.2:

cmake .. -Dalpaka_CXX_STANDARD=17 -DBUILD_TESTING=ON -Dalpaka_ACC_ANY_BT_OACC_ENABLE=ON -DCMAKE_BUILD_TYPE=Release

I also tested this with C++20. I'll submit a PR that adds a CI job with OpenACC which will close this issue.

j-stephan avatar Dec 12 '22 11:12 j-stephan