Jan Stephan

Results 232 comments of Jan Stephan

Good point. `-O2` works. So now I can check the additional `-O3` flags to narrow it down.

Interesting. `-O3` fails, but `-O2 -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides` (which are the additional flags according to the gcc documentation) works.

Those are the ones enabled locally on my system when passing `-O3` instead of `-O2`. I obtained those by looking at the diff of `g++ -O3 -Q --help=optimizers` and the...

No, according to Stack Overflow the order of the optimization options does not affect the order in which the optimizations are eventually run.

I have now added a workaround in #1754. I am forcing the buffer allocation function to the `-O2` level when using gcc 12, OpenACC and release mode. Hopefully we can create...
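For readers unfamiliar with per-function optimization levels, here is a minimal sketch of how such a workaround could look. It is not the code from #1754: the macro `DEMO_FORCE_O2` and the function `demo_alloc_buffer` are hypothetical names, and using `NDEBUG` to detect release mode is an assumption. It relies on GCC's `optimize` function attribute and on the `_OPENACC` macro that OpenACC-enabled compilers define.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical macro: expands to GCC's per-function optimize attribute only
// for gcc 12 with OpenACC enabled and NDEBUG defined (assumed to mean
// "release mode" here). Everywhere else it expands to nothing.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ == 12) \
    && defined(_OPENACC) && defined(NDEBUG)
#    define DEMO_FORCE_O2 __attribute__((optimize("O2")))
#else
#    define DEMO_FORCE_O2
#endif

// Hypothetical stand-in for the buffer allocation function. With the
// attribute active, GCC compiles this one function at -O2 even if the rest
// of the translation unit is built with -O3.
DEMO_FORCE_O2 inline void* demo_alloc_buffer(std::size_t bytes)
{
    assert(bytes > 0); // only checked in debug builds
    return std::malloc(bytes);
}
```

The caller owns the returned memory and releases it with `std::free`. Limiting the attribute to one function keeps the rest of the code at `-O3`, which is why this kind of workaround is usually preferred over lowering the optimization level globally.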

Interestingly, the other way around doesn't work either:

```
-O3 -fno-gcse-after-reload -fno-ipa-cp-clone -fno-loop-interchange -fno-loop-unroll-and-jam -fno-peel-loops -fno-predictive-commoning -fno-split-loops -fno-split-paths -fno-tree-loop-distribution -fno-tree-partial-pre -fno-unswitch-loops -fvect-cost-model=very-cheap -fno-version-loops-for-strides
```

This should be equivalent to `-O2`...

So what is the best way here? Can we work around this bug or do we have to wait until we drop support for LLVM < 13?

Okay. This shouldn't be too hard to integrate so it can still land in 0.8.

@SimeonEhrig If I understand the original issue at cling correctly, we need to use clang 13 for compilation and nvcc as linker to make this work with clang. Is this...