alidist

Bump GCC to the latest version

Open ktf opened this issue 2 years ago • 3 comments

ktf • Nov 22 '22 09:11

@davidrohr @pzhristov @aalkin as we discussed. This will hopefully allow us to bump to Arrow 10 on Linux.

ktf • Nov 22 '22 09:11

@davidrohr @shahor02 is /System/Volumes/Data/build/alice-ci-workdir/alidist-o2/sw/SOURCES/O2/4678/0/Detectors/CTF/test/test_ctf_io_tpc.cxx:134: error: in "CTFTest": check memcmp(vecIn.data(), bVec.data(), bVec.size()) == 0 has failed

known?

ktf • Nov 22 '22 13:11

For progress on the FLPs, see OCONF-720.

awegrzyn • Nov 23 '22 15:11

It seems the build errors in O2 are caused by changes and improvements to warning flags in GCC.

In GCC 11, -Wall now includes -Wrange-loop-construct, which is causing some build errors due to -Werror.
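To illustrate the kind of change this warning forces, here is a minimal C++ sketch; the function `totalLength` is hypothetical and not taken from O2. GCC documents -Wrange-loop-construct as warning when a range-based for loop makes an unnecessary copy, for example a `const` loop variable that is not a reference, and binding by `const` reference is the usual fix:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example, not taken from O2: sums the lengths of all strings.
std::size_t totalLength(const std::vector<std::string>& names)
{
  std::size_t total = 0;
  // Writing `for (const std::string name : names)` would copy every element;
  // GCC 11 flags this with -Wrange-loop-construct (enabled by -Wall), and
  // -Werror turns that warning into a build error.
  // Binding by const reference avoids both the copy and the warning.
  for (const std::string& name : names) {
    total += name.size();
  }
  return total;
}
```

The same one-character change (adding `&`) is typically all that is needed at each flagged loop, as long as the loop body does not rely on modifying its own copy.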

GCC 11 also enhanced -Wmaybe-uninitialized, so it now finds a few new cases.
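A minimal C++ sketch of the pattern this warning targets: a variable that is assigned only on some code paths. The function `pickThreshold` is hypothetical, not O2 code, and whether GCC reports a given case depends on optimization level and inlining, so treat this as an illustration rather than a guaranteed reproducer:

```cpp
// Hypothetical example, not taken from O2: maps a mode to a threshold.
int pickThreshold(int mode)
{
  // If `threshold` were declared without an initializer and only assigned in
  // the branches below, the path where mode is neither 1 nor 2 would read an
  // uninitialized value; GCC can report such paths with -Wmaybe-uninitialized.
  int threshold = 0; // initialize up front so every path is well defined
  if (mode == 1) {
    threshold = 10;
  } else if (mode == 2) {
    threshold = 20;
  }
  return threshold;
}
```

Initializing at the declaration is the simplest fix; when no sensible default exists, restructuring so that every branch assigns the variable works too.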

I'm not sure what's wrong with the GPU build.

TimoWilken • Jan 18 '23 14:01

> I'm not sure what's wrong with the GPU build.

Hm, not sure, since it worked for me locally. But we want to bump ROCm soon anyway, and I will create a new container for that. Hopefully it will solve the compilation issues.

davidrohr • Jan 18 '23 14:01

@davidrohr apart from the DataDistribution issues, there seems to be an error when linking GPU code via hipcc:

: && /opt/rocm/bin/hipcc -fPIC -O2 -std=c++17 -fgpu-defer-diag -mllvm -amdgpu-enable-lower-module-lds=false -Wno-invalid-command-line-argument -Wno-unused-command-line-argument -Wno-invalid-constexpr -Wno-ignored-optimization-argument -Wno-unused-private-field --amdgpu-target=gfx906 -fgpu-flush-denormals-to-zero -fgpu-rdc -O2 -g -DNDEBUG -Wno-unknown-warning-option --amdgpu-target=gfx906 GPU/GPUbenchmark/hip/CMakeFiles/O2exe-gpu-memory-benchmark-hip.dir/benchmark.hip.cxx.o GPU/GPUbenchmark/hip/CMakeFiles/O2exe-gpu-memory-benchmark-hip.dir/Kernels.hip.cxx.o -o stage/bin/o2-gpu-memory-benchmark-hip  -Wl,-rpath,/sw/slc8_x86-64/boost/v1.75.0-local1/lib:/sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib:/opt/rocm/lib:::::::::::::::::::::::::  /sw/slc8_x86-64/boost/v1.75.0-local1/lib/libboost_program_options.so.1.75.0  /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libTree.so.6.26.10  /opt/rocm/lib/libamdhip64.so.5.1.50102  /opt/rocm/llvm/lib/clang/14.0.0/lib/linux/libclang_rt.builtins-x86_64.a  /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libMathCore.so.6.26.10  /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libImt.so.6.26.10  /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libMultiProc.so.6.26.10  /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libNet.so.6.26.10  /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libRIO.so.6.26.10  /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libThread.so.6.26.10  -lpthread  /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libCore.so.6.26.10 && :
ld.lld: error: /sw/slc8_x86-64/boost/v1.75.0-local1/lib/libboost_program_options.so.1.75.0: undefined reference to std::__throw_bad_array_new_length()@GLIBCXX_3.4.29 [--no-allow-shlib-undefined]
ld.lld: error: /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libTree.so.6.26.10: undefined reference to std::__throw_bad_array_new_length()@GLIBCXX_3.4.29 [--no-allow-shlib-undefined]
ld.lld: error: /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libTree.so.6.26.10: undefined reference to std::__istream_extract(std::istream&, char*, long)@GLIBCXX_3.4.29 [--no-allow-shlib-undefined]
ld.lld: error: /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libMathCore.so.6.26.10: undefined reference to std::__throw_bad_array_new_length()@GLIBCXX_3.4.29 [--no-allow-shlib-undefined]
ld.lld: error: /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libThread.so.6.26.10: undefined reference to std::condition_variable::wait(std::unique_lock<std::mutex>&)@GLIBCXX_3.4.30 [--no-allow-shlib-undefined]
ld.lld: error: /sw/slc8_x86-64/ROOT/v6-26-10-alice5-local1/lib/libThread.so.6.26.10: undefined reference to std::__throw_bad_array_new_length()@GLIBCXX_3.4.29 [--no-allow-shlib-undefined]
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
[513/4119] Building CXX object Utilities/DataSampling/CMakeFiles/O2lib-DataSampling.dir/src/DataSamplingHeader.cxx.o
[514/4119] Building CXX object DataFormats/common/CMakeFiles/O2test-commondataformat-AbstractRefAccessor.dir/test/testAbstractRefAccessor.cxx.o
[515/4119] Building CXX object Framework/GUISupport/CMakeFiles/O2test-framework-CustomGUISokol.dir/test/test_CustomGUISokol.cxx.o

does it ring any bell?
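The undefined references all carry libstdc++ symbol versions. According to the libstdc++ ABI documentation, GLIBCXX_3.4.29 first appeared in GCC 11.1 and GLIBCXX_3.4.30 in GCC 12.1, which suggests that boost and ROOT were built against a newer libstdc++ than the one visible to the hipcc/clang link step. As a hypothetical diagnostic aid (`highestGlibcxxVersion` is not part of O2, alibuild, or any existing tool), a small C++ helper can pull the highest required version out of such a log:

```cpp
#include <array>
#include <regex>
#include <string>

// Hypothetical helper: scans linker output and returns the highest GLIBCXX
// symbol version it mentions (e.g. "GLIBCXX_3.4.30"), or "" if none is found.
std::string highestGlibcxxVersion(const std::string& log)
{
  static const std::regex pattern(R"(GLIBCXX_(\d+)\.(\d+)\.(\d+))");
  std::array<int, 3> best{-1, -1, -1};
  std::string bestText;
  for (auto it = std::sregex_iterator(log.begin(), log.end(), pattern);
       it != std::sregex_iterator(); ++it) {
    const std::smatch& m = *it;
    // Compare numerically, not as text, so that 3.4.10 ranks above 3.4.9.
    std::array<int, 3> v{std::stoi(m[1].str()), std::stoi(m[2].str()),
                         std::stoi(m[3].str())};
    // std::array compares lexicographically: major, then minor, then patch.
    if (v > best) {
      best = v;
      bestText = m[0].str();
    }
  }
  return bestText;
}
```

Run over the log above, it would return GLIBCXX_3.4.30 (from the `std::condition_variable::wait` reference in libThread).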

ktf • Jan 19 '23 07:01

> does it ring any bell?

No, as I already wrote above, it does not fail for me locally :). But we will bump ROCm on the EPNs soon; then I'll bump ROCm in the container and hope that it fixes the problem. Otherwise we'll need to investigate in more detail. But I'm following it up.

davidrohr • Jan 19 '23 09:01

All's good on the FLP side. What's the next step to get it merged?

awegrzyn • Feb 01 '23 10:02

> All's good on the FLP side. What's the next step to get it merged?

We have to bump ROCm to 5.3 on the EPN farm and in the FullCI container and then retry; if it still does not work, we need to understand why and fix it.

davidrohr • Feb 01 '23 10:02

Okay, any ETA? We would like to do it before shifts start.

awegrzyn • Feb 03 '23 08:02

For reference, it also fails with ROCm 5.3. Not sure what to do now. I could test with ROCm 5.4 on AlmaLinux 8.7, but it will still be quite some time until we get there...

davidrohr • Feb 03 '23 18:02

OK, the fix for ROCm is here: https://github.com/AliceO2Group/AliceO2/pull/10692. @ktf, could you rebase this PR?

davidrohr • Feb 03 '23 19:02

I resolved the conflicts in this PR. The O2 fix is merged, and the build now succeeds in my Docker container, so the CI should hopefully pass now.

davidrohr • Feb 04 '23 07:02

OK, it seems we need ROCm 5.3 in addition to my fix. It will be rolled out on the EPNs today, and then we can update the containers.

There were also some errors in O2Physics, for which I just opened a PR.

davidrohr • Feb 06 '23 08:02

@TimoWilken, can you cache the PR so that we can merge it?

ktf • Feb 08 '23 10:02