Conflict between MKL_jll and libgomp in CompilerSupportLibraries_jll
Today I was tracking down the test failure on macOS observed in https://github.com/JuliaMath/IntelVectorMath.jl/pull/39 when moving to MKL_jll.
After a while, with some help from @giordano, I arrived at this MWE:
```julia
# dlopen the libgomp inside CompilerSupportLibraries_jll (libgfortran 5).
using Libdl
dlopen("/Users/kristoffercarlsson/.julia/artifacts/411c17f069dd12b3f79b19e4647949050a58223c/lib/libgomp.1.dylib")

using MKL_jll

gamma1(x) = (x = copy(x); ccall((:vdTGamma, libmkl_rt), Cvoid, (Int32, Ptr{Float64}, Ptr{Float64}), length(x), x, x); x);
gamma2(x) = (y = similar(x); ccall((:vdTGamma, libmkl_rt), Cvoid, (Int32, Ptr{Float64}, Ptr{Float64}), length(x), x, y); y);

INPUT_2 = rand(1000);
gamma1(INPUT_2)[1:3]
gamma2(INPUT_2)[1:3]
```
gamma1 and gamma2 should give the same answers, but they don't. For example:
```
julia> gamma1(INPUT_2)[1:3]
3-element Array{Float64,1}:
 Inf
 Inf
 Inf

julia> gamma2(INPUT_2)[1:3]
3-element Array{Float64,1}:
 16.209046961478283
  6.570129954697695
 41.08264424860374
```
(This doesn't seem to happen 100% of the time, but it does the majority of the time.)
It only happens if the input vector is of a certain length, cf.:
```
julia> gamma1(INPUT_2[1:602])[1:3] # ok
3-element Array{Float64,1}:
 1.0650782774656853
 1.4718294693439533
 1.1884483595544906

julia> gamma2(INPUT_2[1:602])[1:3] # ok
3-element Array{Float64,1}:
 1.0650782774656853
 1.4718294693439533
 1.1884483595544906

julia> gamma1(INPUT_2[1:603])[1:3] # bad
3-element Array{Float64,1}:
 0.9663910468243027
 0.8856476139287297
 0.9213177809196278

julia> gamma2(INPUT_2[1:603])[1:3] # ok
3-element Array{Float64,1}:
 1.0650782774656853
 1.4718294693439533
 1.1884483595544906
```
The cutoff seems to be a vector length of 603. My guess is that MKL considers this long enough to enable threading. Also, it is important that libgomp is loaded before MKL_jll for the bug to occur.
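The cutoff can be located programmatically by comparing the in-place and out-of-place calls across lengths. A minimal sketch, assuming MKL_jll is loaded and libgomp was dlopen'ed first as in the MWE above (the bug is not fully deterministic, so a single scan may miss it):

```julia
using MKL_jll

# Compare in-place vs out-of-place vdTGamma for increasing lengths and
# report the first length at which the two results disagree.
function find_cutoff(maxlen = 1024)
    for n in 1:maxlen
        x  = rand(n)
        x1 = copy(x)
        y  = similar(x)
        # In-place: input and output buffers alias each other.
        ccall((:vdTGamma, libmkl_rt), Cvoid,
              (Int32, Ptr{Float64}, Ptr{Float64}), n, x1, x1)
        # Out-of-place: separate output buffer.
        ccall((:vdTGamma, libmkl_rt), Cvoid,
              (Int32, Ptr{Float64}, Ptr{Float64}), n, x, y)
        x1 ≈ y || return n
    end
    return nothing  # no disagreement found
end
```

On the affected macOS setup this reportedly returns 603; without the libgomp/libiomp conflict it returns `nothing`.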
The versions used are:
```
[e66e0078] CompilerSupportLibraries_jll v0.2.0+1
[856f044c] MKL_jll v2020.0.166+0
```
and
```
julia> versioninfo()
Julia Version 1.4.0-rc2.0
Commit b99ed72c95 (2020-02-24 16:51 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.0.0)
  CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```
Updated to reflect that the problem is the dlopen of libgomp in CompilerSupportLibraries_jll
@tlienart this is almost certainly the issue that you brought up to me recently.
The fundamental issue is that MKL has its own OMP implementation (libiomp) that exports the same symbols as libgomp. If you import CompilerSupportLibraries_jll first, you get libgomp hogging those symbols, while if you load MKL_jll first, then libiomp gets its implementations bound to those symbols.
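The shadowing can be observed directly with Libdl: whichever OpenMP runtime is opened first claims the shared entry points for subsequent global lookups. A sketch with hypothetical paths (the artifact hashes and MKL location vary per machine):

```julia
using Libdl

# Hypothetical paths: substitute the real artifact locations on your machine.
handle_gomp = dlopen("/path/to/artifacts/.../lib/libgomp.1.dylib")
handle_iomp = dlopen("/path/to/mkl/lib/libiomp5.dylib")

# Both runtimes export the same OpenMP symbols, e.g. omp_get_max_threads.
# With global symbol resolution, the copy loaded first wins, so code linked
# against libiomp can end up calling into libgomp's state (or vice versa).
p1 = dlsym(handle_gomp, :omp_get_max_threads)
p2 = dlsym(handle_iomp, :omp_get_max_threads)
```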
So one workaround that should work today: call `using MKL_jll` first. Why won't that break other applications? Because very few other packages actually use libgomp. We include it in CSL_jll because there are a few (and it's important to provide all the libraries), but it's fairly rare for a package to actually use it. We automatically open all libraries within these packages so that dependent packages can open those libraries. While we could prevent CSL_jll from loading libgomp automatically (via the `dont_dlopen=true` argument), that would break packages actually trying to use libgomp, as there's no way for those dependent packages to find where libgomp is living (from the dynamic linker's perspective; from Julia's perspective we can of course find it, but communicating that to the dynamic linker is the issue).
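The workaround comes down to the order of the `using` statements; a sketch (SpecialFunctions here stands in for any package that pulls in CSL_jll):

```julia
# Load MKL_jll first, so libiomp claims the shared OpenMP symbols
# before CSL_jll gets a chance to load libgomp.
using MKL_jll
using SpecialFunctions  # or anything else that loads CompilerSupportLibraries_jll
```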
The basic problem is that we have two mutually incompatible libraries: libgomp and libiomp. If you have a single application with dependencies that rely on both, this will never work. However, packages like MLJ probably don't really care about libgomp; it's just coming along for the ride via CSL. To fix this, I see a few possibilities:
1. Split `libgomp` out from `CSL_jll` and force packages that actually need `libgomp` to depend on that rather than the `CSL_jll` package itself. (I'm not really a fan of this; I'd like to keep it all together.)
2. Switch `libgomp` to be `dont_dlopen=true` and tell downstream packages to always manually `dlopen(libgomp)` before trying to load other packages. (I don't like this either; we want to be more automatic, not less.)
3. Teach `BinaryBuilder` to inspect all libraries within a prefix, build a tree of all dependencies, and, if it finds a recursive dependency that was marked as `dont_dlopen`, manually dlopen it before trying to load the top-level dependency.
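Under the second option, every downstream package that links against libgomp would need something like the following in its module `__init__`. This is a hypothetical sketch; `SomeDownstreamPackage` is made up, and it assumes CSL_jll would still export a `libgomp_path` variable even with `dont_dlopen=true`:

```julia
module SomeDownstreamPackage  # hypothetical package whose binary links libgomp

using Libdl, CompilerSupportLibraries_jll

function __init__()
    # With dont_dlopen=true, CSL_jll no longer opens libgomp itself, so we
    # must open it explicitly; RTLD_GLOBAL makes its symbols visible to our
    # own shared library when the dynamic linker loads it next.
    dlopen(CompilerSupportLibraries_jll.libgomp_path, RTLD_GLOBAL)
end

end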
My favorite solution is the third, but of course it's also the most work. :P @giordano I'm open to suggestions.
@staticfloat as a heads up, this is still bugging us, and `using MKL_jll` first does not fix the issue (see e.g. https://travis-ci.com/github/alan-turing-institute/MLJModels.jl/jobs/320357986). I'm still at a loss as to why MLJ users would see this issue while ScikitLearn.jl users would not.
The only way I've found so far is to tell our users to use my fork of 0.5.3+1 (instead of 0.5.3+3): https://github.com/tlienart/OpenSpecFun_jll.jl. Of course this is a shit workaround (and won't work on CI), but I couldn't use a compat bound on the package, as the version string does not meet the standards (the +1 is throwing Pkg off).
Maybe a 4th solution for your list would be to add a way to cap the version on such packages, so that I can enforce 0.5.3+1, with which the problem does not happen.
I wonder if https://github.com/JuliaPackaging/Yggdrasil/issues/915 is related. It also has to do with libiomp on macOS. Do things work OK for you if you use a system installation of MKL, or copy libmkl_intel_thread.dylib from the system installation over to the artifact one?
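Concretely, that experiment could look something like the following; the system MKL path is a hypothetical example and depends on your installation:

```julia
using MKL_jll

# Hypothetical source path: adjust to wherever your system MKL lives.
src = "/opt/intel/mkl/lib/libmkl_intel_thread.dylib"
dst = joinpath(MKL_jll.artifact_dir, "lib", "libmkl_intel_thread.dylib")
cp(src, dst; force = true)  # overwrite the artifact's copy, then restart Julia
```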
Hey @KristofferC, thanks for this. The problem is now mostly with Travis (is the Travis log linked above useful in narrowing things down?); @staticfloat tried to reproduce this locally and couldn't (thanks for trying!). However, the fact that it fails on Travis is still bad, as some users may have setups that look like it. We currently have a workaround with a fork of OpenSpecFun_jll, which should work for users in the meantime, but it's not ideal, of course.
I'd be happy to try telling Travis to use this libmkl_intel_thread.dylib but not sure how that would work?
IMO this causes silent bugs. I would much rather favor correctness and go for the second option as the solution right away. Since so few packages use libgomp, it is OK to ask them to manually dlopen it.
@tlienart can you point me to the fork of openspecfun_jll? I would like to see if we can remove the libgomp dependency in that package.
> Since so few packages use libgomp

I don't think this assessment is very accurate (and I don't think it was even accurate last year):
```
% find . -name '*.jl' | xargs grep -l CompilerSupportLibraries_jll | xargs grep -L expand_gfortran
./R/raptor/build_tarballs.jl
./R/rr/build_tarballs.jl
./R/RadeonProRender/build_tarballs.jl
./U/UCX/build_tarballs.jl
./I/IpoptMKL/build_tarballs.jl
./I/IOAPI/build_tarballs.jl
./I/ITSOL_2/build_tarballs.jl
./N/NL2sol/build_tarballs.jl
./N/NetCDFF/build_tarballs.jl
./N/NOMAD/build_tarballs.jl
./N/nlminb/build_tarballs.jl
./N/normaliz/build_tarballs.jl
./G/glmnet/build_tarballs.jl
./G/Gettext/build_tarballs.jl
./G/gb/build_tarballs.jl
./G/Gingko/build_tarballs.jl
./Z/zfp/build_tarballs.jl
./Z/ZITSOL_1/build_tarballs.jl
./Z/ZPares/build_tarballs.jl
./T/tblis/build_tarballs.jl
./T/Torch/build_tarballs.jl
./T/Tesseract/build_tarballs.jl
./T/TMatrix/build_tarballs.jl
./T/Trilinos/build_tarballs.jl
./S/SCALAPACK/build_tarballs.jl
./S/SCIP/build_tarballs.jl
./S/SDPA/build_tarballs.jl
./S/SuperLU_MT/build_tarballs.jl
./S/SoXResampler/build_tarballs.jl
./S/Sundials/Sundials@5/build_tarballs.jl
./S/Sundials/Sundials32@5/build_tarballs.jl
./S/spglib/build_tarballs.jl
./S/SoapyRTLSDR/build_tarballs.jl
./S/StarPU/build_tarballs.jl
./S/SHTOOLS/build_tarballs.jl
./S/SCIP_PaPILO/build_tarballs.jl
./S/scopehal/build_tarballs.jl
./S/SPRAL/build_tarballs.jl
./S/SLATEC/build_tarballs.jl
./S/SuiteSparse/SSGraphBLAS/build_tarballs.jl
./A/Arpack/build_tarballs.jl
./A/AMReX/build_tarballs.jl
./A/ADIOS2/build_tarballs.jl
./A/AzStorage/build_tarballs.jl
./F/FastTransforms/build_tarballs.jl
./F/FLANN/build_tarballs.jl
./F/FGlT/build_tarballs.jl
./F/FMM3D/build_tarballs.jl
./F/finufft/build_tarballs.jl
./O/openPMD_api/build_tarballs.jl
./O/OpenAL/build_tarballs.jl
./O/OpenSpecFun/build_tarballs.jl
./O/OpenBLAS/common.jl
./O/ODEInterface/build_tarballs.jl
./O/OpenMPI/build_tarballs.jl
./O/OpenLSTO/build_tarballs.jl
./H/HOHQMesh/build_tarballs.jl
./H/HHsuite/build_tarballs.jl
./H/HiGHS/build_tarballs.jl
./M/MPICH/build_tarballs.jl
./M/Modflow6/build_tarballs.jl
./M/MPItrampoline/build_tarballs.jl
./M/MMseqs2/build_tarballs.jl
./M/muparser/build_tarballs.jl
./M/msolve/build_tarballs.jl
./M/MUMPS/MUMPS_seq_MKL/build_tarballs.jl
./M/MUMPS/MUMPS_seq@5/build_tarballs.jl
./M/MUMPS/MUMPS@5/build_tarballs.jl
./M/MUMPS/MUMPS_seq@4/build_tarballs.jl
./C/CovidSim/build_tarballs.jl
./C/CUTENSOR/build_tarballs.jl
./C/Coin-OR/BiCePS/build_tarballs.jl
./C/Coin-OR/Cgl/build_tarballs.jl
./C/Coin-OR/SYMPHONY/build_tarballs.jl
./C/Coin-OR/CSDP/build_tarballs.jl
./C/Coin-OR/Ipopt/build_tarballs.jl
./C/Coin-OR/CoinUtils/build_tarballs.jl
./C/Coin-OR/Cbc/build_tarballs.jl
./C/Coin-OR/Osi/build_tarballs.jl
./C/Coin-OR/CHiPPS_BLIS/build_tarballs.jl
./C/Coin-OR/MibS/build_tarballs.jl
./C/Coin-OR/Bonmin/build_tarballs.jl
./C/Coin-OR/SHOT/build_tarballs.jl
./C/Coin-OR/ALPS/build_tarballs.jl
./C/Coin-OR/Clp/build_tarballs.jl
./C/CvxCompress/build_tarballs.jl
./C/Chuffed/build_tarballs.jl
./C/CeresSolver/build_tarballs.jl
./C/CUTEst/build_tarballs.jl
./C/cilantro/build_tarballs.jl
./C/COSMA/build_tarballs.jl
./D/DASKR/build_tarballs.jl
./D/difmap/build_tarballs.jl
./D/Darknet/common.jl
./D/Dierckx/build_tarballs.jl
./V/ViennaRNA/build_tarballs.jl
./V/VMEC/build_tarballs.jl
./Q/qr_mumps/build_tarballs.jl
./Q/Qt/build_tarballs.jl
./Q/QuantReg/build_tarballs.jl
./Q/Qt5Base/build_tarballs.jl
./Q/QCDNUM/build_tarballs.jl
./Q/Qt6Base/build_tarballs.jl
./Q/qwtw/build_tarballs.jl
./Q/QuantumEspresso/build_tarballs.jl
./X/xfoil_light/build_tarballs.jl
./X/XGBoost/build_tarballs.jl
./X/Xyce/build_tarballs.jl
./B/Birch_Standard/build_tarballs.jl
./B/beefalo/build_tarballs.jl
./B/basiclu/build_tarballs.jl
./B/blis/build_tarballs.jl
./K/Kokkos/build_tarballs.jl
./L/libpolymake_julia/build_tarballs.jl
./L/libnabo/build_tarballs.jl
./L/LAPACK/build_tarballs.jl
./L/Libxc/build_tarballs.jl
./L/libgraphicsmagic/build_tarballs.jl
./L/libsharp2/build_tarballs.jl
./L/LibBirch/build_tarballs.jl
./L/LibRaw/build_tarballs.jl
./L/LAMMPS/build_tarballs.jl
./L/libsvm/build_tarballs.jl
./L/LoopTools/build_tarballs.jl
./L/LibAMVW/build_tarballs.jl
./P/polymake/build_tarballs.jl
./P/PCL/build_tarballs.jl
./P/PETSc/build_tarballs.jl
./P/primecount/build_tarballs.jl
./P/PGPLOT/build_tarballs.jl
./W/wannier90/build_tarballs.jl
./W/WaveFD/build_tarballs.jl
./W/Wigxjpf/build_tarballs.jl
```
I could find at least 133 recipes that reference CompilerSupportLibraries_jll without expanding the libgfortran version. The main libraries for which we have to pull in CSL are libgfortran and libgomp: if it isn't one, it's the other. 133 isn't that few packages, and having to manually dlopen libraries isn't easy to do.
But do they all need to dlopen libgomp? It is definitely not a good option, but loading MKL.jl after SpecialFunctions.jl does not just error; it silently gives wrong answers, non-deterministically.
They probably all dynamically link to libgomp, so yes, they do need it. As I said, it's either libgfortran or libgomp.
I added a note to MKL.jl for now about this: https://github.com/JuliaLinearAlgebra/MKL.jl/blob/master/README.md#usage
> openspecfun_jll

Not sure if this is still relevant: https://github.com/tlienart/OpenSpecFun_jll.jl