
Bump LLVM to v17

Open • mofeing opened this issue 5 months ago • 51 comments

mofeing • Jan 26 '24 13:01

We just upgraded to LLVM 16, one or two weeks ago (#51720). That was a long trip, so good luck.

Welcome to the Julia community.

inkydragon • Jan 26 '24 18:01

> We just upgraded to LLVM 16, one or two weeks ago (#51720). That was a long trip, so good luck.

Yes, I know. I was invited by @gbaraldi and @giordano to try this since getting LLVM_full_jll 17 to work was super fast.

> Welcome to the Julia community.

Thanks! :)

mofeing • Jan 28 '24 11:01

@gbaraldi would you mind checking 66c13ace24043e3e88b0e9c5f1cd3f5ea21a5cfb?

mofeing • Jan 28 '24 11:01

I just found out about https://github.com/llvm/llvm-project/blob/23b233c8adad5b81e185e50d04356fab64c2f870/llvm/docs/OpaquePointers.rst#migration-instructions

The following commits might not be completely correct:

  • 328ca1a28a45b26725214d8ac33d6127ee7d6867
  • 439f3f155dc2c6ced42bdf7813226b9f50cac142
  • 8f31fba29b06b91d573771db16882bc5cc1b0a47
  • 43c10291fe7aedca65b8917bde4eab037b489f0d

If we revise these commits and fix the remaining deprecation warnings (which I believe are all related to the opaque pointers transition), then builds for all platforms should succeed.

mofeing • Jan 28 '24 12:01

https://github.com/JuliaLang/julia/commit/66c13ace24043e3e88b0e9c5f1cd3f5ea21a5cfb looks fine. It seems they've changed the order in which these things need to happen. You could maybe even move the FAM (the FunctionAnalysisManager) down: since it's no longer an argument, it can sit together with the other analysis managers.

gbaraldi • Jan 28 '24 12:01

diff --git a/src/intrinsics.cpp b/src/intrinsics.cpp
index c784727c4f5..ff3c55bc072 100644
--- a/src/intrinsics.cpp
+++ b/src/intrinsics.cpp
@@ -4,6 +4,10 @@ namespace JL_I {
 #include "intrinsics.h"
 }
 
+#include <array>
+#include <bitset>
+#include <string>
+
 #include "ccall.cpp"
 
 //Mark our stats as being from intrinsics irgen

This fixes compilation of src/codegen.cpp for me on GNU/Linux.

giordano • Jan 28 '24 17:01

@mofeing thank you for your work! One note from my side: we aim to support the LLVM version used by the latest Julia release, which for 1.10 is LLVM 15. The goal is to be able to build against LLVM 15 without too many issues, so that we can check where regressions came from.

This means that you may need to guard your changes under JL_LLVM_VERSION >= 170000.
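For readers unfamiliar with the guard: as an assumption, JL_LLVM_VERSION is taken here to encode LLVM's version as MAJOR*10000 + MINOR*100 + PATCH (so 17.0.6 becomes 170006), which is why ">= 170000" reads as "LLVM 17 or newer". The sketch hard-codes hypothetical LLVM_VERSION_* values and a hypothetical helper selected_code_path purely to show the pattern; in a real build the version macros come from LLVM's headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: a real build gets these from LLVM's config headers.
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 17
#define LLVM_VERSION_MINOR 0
#define LLVM_VERSION_PATCH 6
#endif

// Assumed encoding: MAJOR * 10000 + MINOR * 100 + PATCH (17.0.6 -> 170006).
#define JL_LLVM_VERSION \
    (LLVM_VERSION_MAJOR * 10000 + LLVM_VERSION_MINOR * 100 + LLVM_VERSION_PATCH)

// Same arithmetic as a constexpr function, handy for checking cutoffs.
constexpr int encode_llvm_version(int major, int minor, int patch) {
    return major * 10000 + minor * 100 + patch;
}

// Pattern for a guarded change: new code path on LLVM 17+, old one otherwise,
// so the tree keeps building against LLVM 15 for regression hunting.
std::string selected_code_path() {
#if JL_LLVM_VERSION >= 170000
    return "llvm17";
#else
    return "pre-llvm17";
#endif
}
```

With the stand-in values above the guard selects the LLVM 17 path; defining the three LLVM_VERSION_* macros as 15, 0 and 7 instead would select the pre-17 path.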

vchuravy • Jan 29 '24 18:01

> @mofeing thank you for your work! One note from my side: we aim to support the LLVM version used by the latest Julia release, which for 1.10 is LLVM 15. The goal is to be able to build against LLVM 15 without too many issues, so that we can check where regressions came from.
>
> This means that you may need to guard your changes under JL_LLVM_VERSION >= 170000.

Understood! I've refactored the code to include some JL_LLVM_VERSION guards.

mofeing • Jan 29 '24 23:01

I managed to fix the build for i686! There are just 2 build errors remaining:

  • FIXED ~~This one appears in assert builds (x86_64-linux-gnuassert and powerpc64le-linux-gnuassert) and llvmpasses:~~
```make
julia: /workspace/srcdir/llvm-project/llvm/lib/IR/Constants.cpp:1484: llvm::Constant* llvm::ConstantExpr::getWithOperands(llvm::ArrayRef<llvm::Constant*>, llvm::Type*, bool, llvm::Type*) const: Assertion `SrcTy || (Ops[0]->getType() == getOperand(0)->getType())' failed.

[11094] signal 6 (-6): Aborted
in expression starting at none:0
Allocations: 23086645 (Pool: 23086341; Big: 304); GC: 25
Aborted
*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***
make[1]: *** [sysimage.mk:96: /cache/build/builder-amdci4-3/julialang/julia-master/usr/lib/julia/sys-o.a] Error 1
make: *** [Makefile:109: julia-sysimg-release] Error 2
```
  • This one appears in asan:
ERROR: Unable to load dependent library /cache/build/default-jelaqua-0/julialang/julia-master/tmp/test-asan/asan/usr/bin/../lib/libjulia-internal-debug.so.1.11
Message:/lib/x86_64-linux-gnu/libgcc_s.so.1: version `GCC_13.0.0' not found (required by /cache/build/default-jelaqua-0/julialang/julia-master/tmp/test-asan/asan/usr/bin/../lib/libjulia-internal-debug.so.1.11)

Also, x86_64-w64-mingw32, aarch64-apple-darwin and x86_64-unknown-freebsd are failing with a segmentation fault in the ccall tests:

Executing tests that run on node 1 only:
ccall                                            (1) |        started at 2024-01-30T06:16:59.068
[95758] signal 11 (2): Segmentation fault: 11
in expression starting at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-master/julia-63db41f1c5/share/julia/test/ccall.jl:785
_ZN4llvm13MemCpyOptPass13processMemCpyEPNS_10MemCpyInstERNS_14ilist_iteratorINS_12ilist_detail12node_optionsINS_11InstructionELb0ELb0EvEELb0ELb0EEE at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-master/julia-63db41f1c5/lib/julia/libLLVM.dylib (unknown line)

These errors are beyond my expertise, so I don't know how to proceed.

mofeing • Jan 30 '24 14:01

The assertion failure is probably the most pressing.

gbaraldi • Jan 30 '24 14:01

Regarding the failure in the ASAN job:

    JULIA tmp/test-asan/asan/usr/lib/julia/corecompiler.ji
ERROR: Unable to load dependent library /cache/build/builder-amdci4-6/julialang/julia-master/tmp/test-asan/asan/usr/bin/../lib/libjulia-internal-debug.so.1.11
Message:/lib/x86_64-linux-gnu/libgcc_s.so.1: version `GCC_13.0.0' not found (required by /cache/build/builder-amdci4-6/julialang/julia-master/tmp/test-asan/asan/usr/bin/../lib/libjulia-internal-debug.so.1.11)
make[1]: *** [/cache/build/builder-amdci4-6/julialang/julia-master/sysimage.mk:64: /cache/build/builder-amdci4-6/julialang/julia-master/tmp/test-asan/asan/usr/lib/julia/corecompiler.ji] Error 1
make[1]: Leaving directory '/cache/build/builder-amdci4-6/julialang/julia-master/tmp/test-asan/asan'
make: *** [/cache/build/builder-amdci4-6/julialang/julia-master/Makefile:103: julia-sysimg-ji] Error 2
make: Leaving directory '/cache/build/builder-amdci4-6/julialang/julia-master/tmp/test-asan/asan'

It looks like something is missing its RPATH settings. For reference, this is how the julia executable is linked in the x86_64-linux-gnu build:

 gcc -m64 -std=gnu11 -pipe -fPIC -fno-strict-aliasing -D_FILE_OFFSET_BITS=64 -fno-gnu-unique -I/cache/build/tester-amdci5-9/julialang/julia-master/src -I/cache/build/tester-amdci5-9/julialang/julia-master/src -I/cache/build/tester-amdci5-9/julialang/julia-master/src/support -I/cache/build/tester-amdci5-9/julialang/julia-master/usr/include -ffreestanding -DGLIBCXX_LEAST_VERSION_SYMBOL=\"GLIBCXX_3.4.33\" -O3 -ggdb2 -falign-functions -momit-leaf-frame-pointer -DDEP_LIBS="\"libgcc_s.so.1:libopenlibm.so:@libstdc++.so.6:@libjulia-internal.so.1.11:@libjulia-codegen.so.1.11:\"" ./loader_exe.o -o /cache/build/tester-amdci5-9/julialang/julia-master/usr/bin/julia -Wl,-Bdynamic -Wl,-no-undefined -ffreestanding -L/cache/build/tester-amdci5-9/julialang/julia-master/usr/lib -L/cache/build/tester-amdci5-9/julialang/julia-master/usr/lib -Wl,--no-as-needed -ldl -lpthread -rdynamic -lc -Wl,--as-needed -Wl,-z,notext -Wl,-rpath,'$ORIGIN/../lib' -Wl,-rpath,'$ORIGIN/../lib/julia' -Wl,-rpath-link,/cache/build/tester-amdci5-9/julialang/julia-master/usr/lib -Wl,-z,origin -Wl,--enable-new-dtags -ljulia

When the julia executable is linked, the rpath is set correctly. I don't know what happens in the ASAN build, though, because it isn't built in verbose mode; that's an issue with the CI setup.
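The link line above passes -Wl,-rpath,'$ORIGIN/../lib' and -Wl,-rpath,'$ORIGIN/../lib/julia', where $ORIGIN stands for the directory of the object that carries the entry. As a toy C++ model of that expansion (expand_origin, expand_rpath and the example paths are hypothetical; the real loader's handling of $LIB, $PLATFORM and secure-execution programs is deliberately left out):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of $ORIGIN expansion in an RPATH/RUNPATH entry: $ORIGIN or
// ${ORIGIN} is replaced by the directory of the object carrying the entry.
// Simplified: assumes object_path is absolute (contains a '/').
std::string expand_origin(const std::string &entry, const std::string &object_path) {
    const std::string origin = object_path.substr(0, object_path.rfind('/'));
    std::string out;
    std::string::size_type i = 0;
    while (i < entry.size()) {
        if (entry.compare(i, 9, "${ORIGIN}") == 0) {
            out += origin;
            i += 9;
        } else if (entry.compare(i, 7, "$ORIGIN") == 0) {
            out += origin;
            i += 7;
        } else {
            out += entry[i];
            ++i;
        }
    }
    return out;
}

// Split a colon-separated rpath into directories, expanding each entry.
std::vector<std::string> expand_rpath(const std::string &rpath, const std::string &object_path) {
    std::vector<std::string> dirs;
    std::string::size_type start = 0;
    while (start <= rpath.size()) {
        std::string::size_type end = rpath.find(':', start);
        if (end == std::string::npos) end = rpath.size();
        dirs.push_back(expand_origin(rpath.substr(start, end - start), object_path));
        start = end + 1;
    }
    return dirs;
}
```

For an executable at /x/usr/bin/julia, the two entries above resolve to /x/usr/bin/../lib and /x/usr/bin/../lib/julia, which is why the relocatable usr/ tree works without absolute paths.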

giordano • Feb 01 '24 12:02

@gbaraldi the segmentation fault in ccall test seems to come from llvm::MemCpyOptPass::processMemCpy. Maybe related to https://github.com/llvm/llvm-project/issues/71183?

EDIT: I've narrowed down which commit in trunk fixes this (llvm/llvm-project#71183), and it seems to be https://github.com/llvm/llvm-project/commit/5c3beb7b1e26d38b0933a28432dfbce4e00cf329. Should we ask the LLVM developers to cherry-pick it into release/17.x, or should we cherry-pick it ourselves into Julia's LLVM fork?

mofeing • Mar 04 '24 10:03

> or should we cherry-pick ourselves in Julia's LLVM fork?

I don't know LLVM's policy for backporting fixes, but we definitely backport them in https://github.com/JuliaLang/llvm-project when they're important for Julia.

giordano • Mar 04 '24 10:03

Oh, that didn't get in? We have that patch in our LLVM 16, so we need to add it to our fork of 17 as well.

gbaraldi • Mar 04 '24 13:03

I just backported https://github.com/llvm/llvm-project/commit/5c3beb7b1e26d38b0933a28432dfbce4e00cf329 and tested it locally: it works, and all tests pass!

Executing tests that run on node 1 only:
ccall                                            (1) |        started at 2024-03-04T17:44:02.738
ccall                                            (1) |    18.43 |   0.16 |  0.8 |     547.49 |   480.39
precompile                                       (1) |        started at 2024-03-04T17:44:21.173

...

Test Summary: |     Pass  Broken     Total      Time
  Overall     | 69374577    3219  69377796  27m43.1s
    SUCCESS

I'll relaunch the build once https://github.com/JuliaLang/llvm-project/pull/25 is merged and LLVM_jll is updated accordingly.

Now the only problem is the ASAN job 😜

mofeing • Mar 04 '24 16:03

Some of the ASAN errors seem to come from this incorrect line: https://github.com/JuliaLang/julia/blob/0311aa4caf461d05fb408de67dafc582e2ada9a0/src/jitlayers.cpp#L1592-L1594

vtjnash • Mar 04 '24 17:03

Now everything builds! Only thing remaining is the failing ASAN job @gbaraldi @giordano

Btw, about Tier 3 platforms:

  • x86_64-linux-musl builds successfully but crashes with an OOM error when generating the sysimage https://buildkite.com/julialang/julia-master/builds/34384#018e1366-9fdd-4b6d-b095-587646daa398/742-2470
  • aarch64-linux-gnu fails in the test job due to an unexpected EOF https://buildkite.com/julialang/julia-master/builds/34384#018e1366-94ca-4991-bf04-1b364572e596/809-973
    • powerpc64le-linux-gnu seems to hit several of these EOF errors

mofeing • Mar 06 '24 15:03

Those are expected:

  • musl is #52707
  • aarch64-linux-gnu is #52434

I have no idea what the problem with the asan build is, and I couldn't get any insights from https://github.com/JuliaCI/julia-buildkite/pull/339. @staticfloat would you be able to have a look at https://github.com/JuliaLang/julia/pull/53070#issuecomment-1921241959?

giordano • Mar 06 '24 15:03

@mofeing at https://github.com/JuliaLang/julia/blob/5023ee21d70b734edf206aab3cac7c202ee0235a/sysimage.mk#L65, can you export LD_DEBUG=libs to get more information about why the julia process is failing to load its dependencies? If that's not enough you could try LD_DEBUG=all; the log will probably be huge, but sometimes it's more useful than libs alone.
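LD_DEBUG has to be in the environment before the process starts, because the dynamic loader reads it at startup. For a quick reproduction outside the build, a minimal C++ sketch can call dlopen directly and return dlerror(), which is presumably the kind of text shown after "Message:" in the errors above. try_load is a hypothetical helper, and on older glibc versions the program must be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical helper: load a shared library (and, transitively, its
// DT_NEEDED dependencies) and return the dynamic loader's error text,
// or an empty string on success.
std::string try_load(const char *name) {
    void *handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        // e.g. a missing file, or an unmet symbol version in a dependency
        const char *err = dlerror();
        return err != nullptr ? std::string(err) : std::string("unknown dlopen error");
    }
    dlclose(handle);
    return std::string();
}
```

Running such a program with LD_DEBUG=libs set shows, for each dependency, which directories were searched, the same kind of trace requested above.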

giordano • Mar 14 '24 12:03

https://buildkite.com/julialang/julia-master/builds/34691#018e3d3a-df06-43e0-8657-e0676f0129d7/723-1505

     17129:	/lib/x86_64-linux-gnu/libstdc++.so.6: error: symbol lookup error: undefined symbol: GLIBCXX_3.4.33 (fatal)
     17128:	find library=libjulia-internal-debug.so.1.12 [0]; searching
     17128:	 search cache=/etc/ld.so.cache
     17128:	 search path=/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/lib:/usr/lib		(system search path)
     17128:	  trying file=/lib/x86_64-linux-gnu/libjulia-internal-debug.so.1.12
     17128:	  trying file=/usr/lib/x86_64-linux-gnu/libjulia-internal-debug.so.1.12
     17128:	  trying file=/lib/libjulia-internal-debug.so.1.12
     17128:	  trying file=/usr/lib/libjulia-internal-debug.so.1.12

So this confirms that libjulia-internal-debug.so.1.12 is being looked up without using the rpath, even though https://github.com/JuliaCI/julia-buildkite/pull/339#issuecomment-1980766664 suggested that the rpath should be set. I still don't understand what's going on. @staticfloat would you be able to have a look at this? This is the last error we're facing here.

giordano • Mar 14 '24 14:03

Is this the bug from https://github.com/JuliaPackaging/Yggdrasil/issues/3703, or was LLVM's rpath eventually set correctly, making that issue stale?

vtjnash • Mar 14 '24 14:03

Locally, for a release build, I get

% for x in usr/lib/*.so; do echo $x; objdump -x $x | grep 'RPATH\|RUNPATH'; done
usr/lib/libamd.so
  RPATH                $ORIGIN
usr/lib/libasan.so
  RUNPATH              $ORIGIN
usr/lib/libatomic.so
usr/lib/libblastrampoline.so
usr/lib/libbtf.so
  RPATH                $ORIGIN
usr/lib/libcamd.so
  RPATH                $ORIGIN
usr/lib/libccalllazybar.so
usr/lib/libccalllazyfoo.so
usr/lib/libccalltest.so
usr/lib/libccolamd.so
  RPATH                $ORIGIN
usr/lib/libcholmod.so
  RPATH                $ORIGIN
usr/lib/libcolamd.so
  RPATH                $ORIGIN
usr/lib/libcurl.so
  RUNPATH              $ORIGIN
usr/lib/libdSFMT.so
usr/lib/libgfortran.so
  RUNPATH              $ORIGIN
usr/lib/libgit2.so
  RPATH                $ORIGIN
usr/lib/libgmp.so
usr/lib/libgmpxx.so
  RUNPATH              $ORIGIN
usr/lib/libgomp.so
usr/lib/libhwasan.so
  RUNPATH              $ORIGIN
usr/lib/libitm.so
usr/lib/libjulia-codegen.so
  RUNPATH              $ORIGIN
usr/lib/libjulia-internal.so
  RUNPATH              $ORIGIN
usr/lib/libjulia.so
usr/lib/libklu_cholmod.so
  RPATH                $ORIGIN
usr/lib/libklu.so
  RPATH                $ORIGIN
usr/lib/libldl.so
  RPATH                $ORIGIN
usr/lib/libLLVM-16.0.6jl.so
  RPATH                $ORIGIN/../lib
usr/lib/libLLVM-16jl.so
  RPATH                $ORIGIN/../lib
usr/lib/libllvmcalltest.so
usr/lib/libLLVM.so
  RPATH                $ORIGIN/../lib
usr/lib/liblsan.so
  RUNPATH              $ORIGIN
usr/lib/libmbedcrypto.so
usr/lib/libmbedtls.so
  RUNPATH              $ORIGIN
usr/lib/libmbedx509.so
  RUNPATH              $ORIGIN
usr/lib/libmpfr.so
  RUNPATH              $ORIGIN
usr/lib/libnghttp2.so
usr/lib/libobjc.so
  RUNPATH              $ORIGIN
usr/lib/libopenblas64_.0.3.26.so
  RPATH                $ORIGIN
usr/lib/libopenblas64_.so
  RPATH                $ORIGIN
usr/lib/libopenlibm.so
usr/lib/libpcre2-16.so
usr/lib/libpcre2-32.so
usr/lib/libpcre2-8.so
usr/lib/libpcre2-posix.so
  RUNPATH              $ORIGIN
usr/lib/libquadmath.so
usr/lib/librbio.so
  RPATH                $ORIGIN
usr/lib/libspqr.so
  RPATH                $ORIGIN
usr/lib/libssh2.so
  RUNPATH              $ORIGIN
usr/lib/libssp.so
usr/lib/libstdc++.so
  RUNPATH              $ORIGIN
usr/lib/libsuitesparseconfig.so
  RPATH                $ORIGIN
usr/lib/libtsan.so
  RUNPATH              $ORIGIN
usr/lib/libubsan.so
  RUNPATH              $ORIGIN
usr/lib/libumfpack.so
  RPATH                $ORIGIN
usr/lib/libunwind-coredump.so
  RUNPATH              $ORIGIN
usr/lib/libunwind-generic.so
  RUNPATH              $ORIGIN
usr/lib/libunwind-ptrace.so
  RUNPATH              $ORIGIN
usr/lib/libunwind-setjmp.so
  RUNPATH              $ORIGIN
usr/lib/libunwind.so
  RUNPATH              $ORIGIN
usr/lib/libunwind-x86_64.so
  RUNPATH              $ORIGIN
usr/lib/libuv.so
usr/lib/libz.so

Perhaps issue https://github.com/JuliaPackaging/Yggdrasil/issues/3703 has been fixed since then? Although mixing RPATH and RUNPATH isn't great.
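To see why mixing RPATH and RUNPATH matters, here is a simplified C++ model of the search order the ld.so(8) man page documents for a dependency name without a slash: DT_RPATH (only when the object has no DT_RUNPATH), then LD_LIBRARY_PATH, then DT_RUNPATH, then /etc/ld.so.cache, then the default directories. It models a single loading object; the inheritance of DT_RPATH along the chain of loaders, and the rule that DT_RUNPATH applies only to the object's direct dependencies, are noted in comments only. LoadingObject and search_order are hypothetical names. Note also that linking with --enable-new-dtags, as in the julia link line earlier in this thread, makes the linker emit DT_RUNPATH rather than DT_RPATH.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of the object whose DT_NEEDED entry is resolved.
struct LoadingObject {
    std::vector<std::string> rpath;    // DT_RPATH entries, already $ORIGIN-expanded
    std::vector<std::string> runpath;  // DT_RUNPATH entries, already $ORIGIN-expanded
};

// Simplified ld.so(8) search order for a dependency name without a slash.
// Not modeled: DT_RPATH inherited from the chain of loaders up to the
// executable, and DT_RUNPATH never applying to dependencies of dependencies.
std::vector<std::string> search_order(const LoadingObject &obj,
                                      const std::vector<std::string> &ld_library_path) {
    std::vector<std::string> order;
    // 1. DT_RPATH, but only if the object has no DT_RUNPATH at all.
    if (obj.runpath.empty())
        order.insert(order.end(), obj.rpath.begin(), obj.rpath.end());
    // 2. LD_LIBRARY_PATH (secure-execution programs ignore it; not modeled).
    order.insert(order.end(), ld_library_path.begin(), ld_library_path.end());
    // 3. DT_RUNPATH.
    order.insert(order.end(), obj.runpath.begin(), obj.runpath.end());
    // 4. /etc/ld.so.cache, then 5. default directories (simplified; multiarch
    //    systems also search e.g. /lib/x86_64-linux-gnu).
    order.push_back("<ld.so.cache>");
    order.push_back("/lib");
    order.push_back("/usr/lib");
    return order;
}
```

For an object with neither entry and no LD_LIBRARY_PATH, the model goes straight to the cache and system directories, which matches the LD_DEBUG trace above for libjulia-internal-debug.so.1.12.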

giordano • Mar 14 '24 14:03

A normal debug build seems to fail on Linux with this change due to a linking issue with __stack_chk_guard. This seems to be caused by https://github.com/llvm/llvm-project/commit/e018cbf7208b3d34f18997ddee84c66cee32fb1b, which no longer takes the relocation model from the target machine but from the module flags instead.

Setting the PIC level on the module, by adding dataM.setPICLevel(PICLevel::BigPIC); here, seems to work.

yuyichao • Mar 18 '24 05:03

@nanosoldier runbenchmarks(vs=":master")

gbaraldi • Apr 15 '24 20:04

@nanosoldier runbenchmarks(ALL, vs=":master")

gbaraldi • Apr 15 '24 20:04

@nanosoldier runtests()

gbaraldi • Apr 15 '24 21:04

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

nanosoldier • Apr 16 '24 05:04

The package evaluation job you requested has completed - possible new issues were detected. The full report is available.

nanosoldier • Apr 16 '24 16:04

GC invariant verifier-related aborts should be fixed by https://github.com/JuliaLang/julia/pull/54113.

maleadt • Apr 17 '24 13:04

diff --git a/src/llvm-alloc-opt.cpp b/src/llvm-alloc-opt.cpp
index b08e18068a..25c686827b 100644
--- a/src/llvm-alloc-opt.cpp
+++ b/src/llvm-alloc-opt.cpp
@@ -1076,7 +1076,7 @@ void Optimizer::splitOnStack(CallInst *orig_inst)
                     store_ty = T_pjlvalue;
                 }
                 else {
-                    store_ty = PointerType::getWithSamePointeeType(T_pjlvalue, store_ty->getPointerAddressSpace());
+                    store_ty = PointerType::get(T_pjlvalue->getContext(), store_ty->getPointerAddressSpace());
                     store_val = builder.CreateBitCast(store_val, store_ty);
                 }
                 if (store_ty->getPointerAddressSpace() != AddressSpace::Tracked)

should fix the build error.
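The replacement works because, with opaque pointers, a pointer type no longer records a pointee type: only the context and the address space identify it, so there is nothing left for getWithSamePointeeType to preserve. A toy model in plain C++ (not LLVM's API; every name here is hypothetical) makes the difference concrete. It also suggests why the remaining CreateBitCast in the diff is harmless: as far as we know, IRBuilder returns the value unchanged when source and destination types are identical, which is always true for two opaque pointers in the same address space.

```cpp
#include <cassert>
#include <string>

// Toy model (plain structs, NOT LLVM's API) of typed vs. opaque pointer types.
struct TypedPointerType {   // typed pointers: pointee + address space
    std::string pointee;
    unsigned address_space;
};

struct OpaquePointerType {  // opaque pointers: address space only
    unsigned address_space;
};

bool same_type(const TypedPointerType &a, const TypedPointerType &b) {
    return a.pointee == b.pointee && a.address_space == b.address_space;
}

bool same_type(const OpaquePointerType &a, const OpaquePointerType &b) {
    return a.address_space == b.address_space;
}

// Analogue of the old "same pointee, other address space" query:
// it has to read the pointee of an existing type.
TypedPointerType with_same_pointee(const TypedPointerType &t, unsigned as) {
    return TypedPointerType{t.pointee, as};
}

// Analogue of PointerType::get(Context, AddressSpace): the address space
// alone names the type.
OpaquePointerType opaque_pointer(unsigned as) {
    return OpaquePointerType{as};
}

// A bitcast from a type to itself changes nothing; with opaque pointers this
// covers every pointer-to-pointer bitcast within one address space.
bool bitcast_is_noop(const OpaquePointerType &from, const OpaquePointerType &to) {
    return same_type(from, to);
}
```

In the toy model, two typed pointers in the same address space can still differ by pointee, while two opaque pointers in the same address space are always the same type.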

Zentrik • Apr 21 '24 21:04