Segfault with Julia MKL build (library conflict)

Open EthanAnderes opened this issue 6 years ago • 18 comments

With a recent upgrade of anaconda I'm getting seg faults with PyCall (and PyPlot). Reading the other recent issues it appears some are having the same problem but my output is a bit different and the workarounds for those don't seem to be helping any. Hope I'm not adding noise to something that is already known.

If I run the code from 6423 I just get a straight seg fault.

julia> using PyCall

julia> pyimport("numpy.linalg")["inv"]([2 1; 1 2])
Segmentation fault: 11

Calling NumPy directly from Python works fine.

Python 3.6.2 |Anaconda, Inc.| (default, Sep 21 2017, 18:29:43)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.linalg.inv(np.matrix('2 1; 1 2'))
matrix([[ 0.66666667, -0.33333333],
        [-0.33333333,  0.66666667]])

I've tried to work around it with what appears to work for some people. However I'm still getting seg faults. In particular, I tried the following without success.

julia> Libdl.dlopen("/Users/ethananderes/Software/anaconda3/lib/libiomp5.dylib")
julia> Libdl.dlopen("/Users/ethananderes/Software/anaconda3/lib/libmkl_intel_thread.dylib")
install_name_tool -change @rpath/libiomp5.dylib @loader_path/libiomp5.dylib /Users/ethananderes/Software/anaconda3/lib/libmkl_intel_thread.dylib

install_name_tool -change @rpath/libiomp5.dylib @loader_path/libiomp5.dylib /Users/ethananderes/Software/anaconda3/lib/libiomp5.dylib

Any ideas what is going on here?

Some possibly relevant info:

julia> using PyCall

julia> PyCall.libpython
"/Users/ethananderes/Software/anaconda3/lib/libpython3.6m"

julia> versioninfo()
Julia Version 0.6.1-pre.92
Commit 389b23cf6e* (2017-10-07 01:18 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.0.0)
  CPU: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

EthanAnderes avatar Oct 11 '17 23:10 EthanAnderes

... reference for anyone having a similar problem. I finally found a workaround that might point to where the problem is.

If I downgrade just conda's mkl (using conda install mkl=2017.0.4) then PyCall and PyPlot no longer segfault.

Remark 1: I was only seeing the segfault with Julia compiled with MKL for BLAS and FFT (and with PyCall set to my local anaconda3 install). The downgrade of conda's mkl fixes things, so it makes me wonder whether there is some MKL interaction between Julia and conda (??)

Remark 2: running conda install mkl=2017.0.4 downgrades a few other packages as well. In particular, mkl-service, numpy, scikit-learn and scipy. Just a heads up to others.

Remark 3: The only reason I took a stab at downgrading conda's mkl is that I happened to read this... just for reference to those who know more about this stuff.

EthanAnderes avatar Dec 09 '17 00:12 EthanAnderes

Probably Julia and NumPy are linked to incompatible versions of MKL. There's not much to do about this other than (a) make sure they use the same MKL version or (b) don't use MKL in one or both of them (e.g. switch one to use OpenBLAS).

stevengj avatar Jan 17 '18 18:01 stevengj

Yeah, although I'm not sure I understand why NumPy is reaching out to the MKL library Julia is linked to, e.g. when I compile Julia with OpenBLAS, NumPy has no problem calling its own MKL.

Anyway, I've got a working solution now and hopefully the MKL incompatibility will work itself out as NumPy updates its MKL ... so I'm fine with closing this. Thanks!

EthanAnderes avatar Jan 17 '18 18:01 EthanAnderes

When Julia is compiled with OpenBLAS, all of the BLAS symbols have a special suffix so that they don't conflict with other BLAS libraries (JuliaLang/julia#8734). We can't (easily) do this with MKL because we don't compile MKL ourselves.

stevengj avatar Jan 17 '18 22:01 stevengj

See also #65.

stevengj avatar Jan 17 '18 22:01 stevengj

I'm getting the same sort of segfault problem in MKL as soon as I try to use PyPlot:

julia> include("calsim.jl")

signal (11): Segmentation fault
while loading /data/projects/Maestro/calsim.jl, in expression starting on line 59
mkl_blas_avx2_dgemm_kernel_nocopy_NN_b0 at /home/matt/programs/juliapro/JuliaPro-0.6.2.2/Julia/bin/../lib/libmkl_avx2.so (unknown line)
Allocations: 14076406 (Pool: 14074321; Big: 2085); GC: 32

Line 59 is the first call to figure() within PyPlot. Unfortunately, downgrading mkl using conda did not work for me. I noticed that the Anaconda environment shipped with JuliaPro 0.6.2.2 didn't have mkl installed either, so I tried installing the recommended mkl to see if that would fix it.

As a result I have a broken PyCall. I'm going to try the non-MKL version, since the MKL version of JuliaPro 0.6.2.2 is not working for me.

mattcbro avatar May 26 '18 04:05 mattcbro

@stevengj Has it been verified that linking NumPy and Julia to the same MKL libraries works smoothly? I just tried exactly this and still get segfaults.

(I built Julia and linked it against my manually installed MKL. I built numpy and linked it against the same MKL installation (specified the path in site.cfg and verified it afterwards in python shell).)

carstenbauer avatar Jul 16 '18 20:07 carstenbauer

@crstnbr, doesn't Julia link the ILP64 MKL interface by default, whereas numpy uses the LP64 interface (numpy/numpy#5906)?

I think you probably need to compile Julia with USE_BLAS64=0 in order to use the LP64 MKL if you want it to be compatible with Numpy.

stevengj avatar Jul 17 '18 15:07 stevengj

@crstnbr I had your exact problem - same MKL lib used in numpy as was used to compile julia, but segfault with:

using PyCall
py"""
import numpy as np

A1 = np.random.random((3,3))
A2 = np.random.random((3,3))
np.matmul(A1,A2)
"""

even though the same code worked in Python (i.e. the version of Python that the PyCall build found, as set via ENV["PYTHON"]).

It now seems to be fixed by adding USE_BLAS64=0 to my Make.user file (as per the comment immediately above) and recompiling Julia, i.e. just a make -j8 julia. I didn't even have to make clean.

Thanks @stevengj !

JobJob avatar Dec 18 '18 14:12 JobJob

Oh, and the other thing I had to do earlier was add the directory that contains libiomp5.so to my LD_LIBRARY_PATH (the path was different from the MKL path), with something like

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/compilers_and_libraries_2018.2.199/linux/compiler/lib/intel64_lin

in my ~/.profile
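
Before editing ~/.profile, it can help to check whether libiomp5.so is already reachable through LD_LIBRARY_PATH. The following is only a minimal sketch: the helper name find_on_library_path is hypothetical, and it looks only at the directories listed in the environment variable, not at rpaths or the system linker cache.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_on_library_path(name: str, search_path: Optional[str] = None) -> List[Path]:
    """Return every file called `name` found in the directories of `search_path`.

    `search_path` defaults to the current LD_LIBRARY_PATH. This is a hypothetical
    helper for illustration; the real dynamic loader also consults rpaths and
    the system cache, which are not checked here.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a leading or trailing separator
        candidate = Path(entry) / name
        if candidate.is_file():
            hits.append(candidate)
    return hits
```

Calling find_on_library_path("libiomp5.so") in the shell that launches Julia returns an empty list if none of the listed directories contains the library.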

JobJob avatar Dec 19 '18 02:12 JobJob

Can the ABI incompatibility be detected before the segfault? Does Julia have runtime access to the build option USE_BLAS64? What about NumPy? It would be nice if we could print an informative error.

I guess we can at least run a Julia subprocess at build time and have it run a NumPy operation that is known to segfault under an incompatible ABI. But it's not a great option, as NumPy can be updated at any moment. Also, NumPy in a virtualenv (#578) cannot be supported this way.
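
The subprocess idea could be sketched roughly as follows. This is an illustration under stated assumptions, not PyCall's build logic: died_from_signal is a hypothetical helper, the Julia command is only an example of what a probe might run, and the negative return code convention applies to POSIX systems only.

```python
import signal
import subprocess
import sys
from typing import List


def died_from_signal(cmd: List[str], timeout: float = 60.0) -> bool:
    """Run `cmd` in a child process and report whether a signal killed it.

    On POSIX, subprocess reports a return code of -N when the child was
    terminated by signal N (for example -11 for SIGSEGV).
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode < 0


# Hypothetical probe command (assumes `julia` is on PATH with PyCall installed).
JULIA_PROBE_CMD = [
    "julia", "-e",
    'using PyCall; pyimport("numpy.linalg")["inv"]([2 1; 1 2])',
]

# Stand-in child that deliberately kills itself with SIGSEGV, so the helper
# can be exercised without Julia or MKL.
CRASHING_CHILD_CMD = [
    sys.executable, "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]
```

A build script could call died_from_signal(JULIA_PROBE_CMD) and print a readable warning instead of letting the first user call crash. The caveats raised above still apply: the result is only valid for the NumPy version present when the probe runs.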

tkf avatar Dec 19 '18 06:12 tkf

@JobJob: Thanks for your post! I can confirm that USE_BLAS64 = 0 solved this issue! It even worked (as far as I checked) with standard numpy binaries.

However, I can't find much information on what USE_BLAS64 = 0 actually does and what side effects, apart from solving this issue, it has. Does anyone know more or can point me somewhere?

carstenbauer avatar Dec 19 '18 09:12 carstenbauer

@crstnbr, USE_BLAS64=0 means that it assumes that the BLAS (matrix-multiplication etc.) library is compiled to use 32-bit integers for its interfaces. This means that you can't do linear-algebra operations on matrices or vectors with more than 2³¹–1 (≈2×10⁹) elements, even on a 64-bit machine.
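
To put a number on that limit, here is a small arithmetic sketch. It assumes, as stated above, that the bound applies to the total element count; fits_lp64 is a hypothetical helper name.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def fits_lp64(rows: int, cols: int) -> bool:
    """Return True if a dense rows x cols matrix has at most INT32_MAX elements."""
    return rows * cols <= INT32_MAX


# Side length of the largest square matrix that still fits under the limit.
largest_square_side = math.isqrt(INT32_MAX)
```

With these definitions, largest_square_side is 46340, so a dense 46341 x 46341 matrix would already exceed the limit.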

(The reason for this mess is that the BLAS interface was defined to use integer sizes in Fortran in the days when everyone thought that the default integer size would match the address size on the machine. In the upgrade to 64-bit architectures, however, integer stayed 32 bits, and so did the default BLAS interface. Most BLAS libraries give the option of compiling with 64-bit integers instead, but since the symbols are not renamed the 64-bit and 32-bit libraries conflict. With OpenBLAS, we solved this problem by renaming the symbols, but this was not an option with MKL.)
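
A simplified model of why the two interfaces conflict: BLAS integer arguments are passed by reference, so the library reads either 4 or 8 bytes from the caller's memory. The sketch below mimics this with raw bytes on a little-endian layout. The neighbouring value 5 is made up for illustration, and a real crash depends on which library was loaded first and on what happens to sit next to the integer in memory.

```python
import struct

# An LP64 caller (32-bit integers) stores the dimension n = 3; the next 4 bytes
# in memory happen to hold some unrelated value, here 5 (made up).
lp64_memory = struct.pack("<ii", 3, 5)

# An ILP64 library (64-bit integers) reads 8 bytes starting at the same address,
# so it sees both values fused into one huge dimension.
(seen_by_ilp64,) = struct.unpack("<q", lp64_memory)

# The opposite direction: an ILP64 caller stores n = 3 as a 64-bit integer ...
ilp64_memory = struct.pack("<q", 3)

# ... and an LP64 library reads only the first 4 bytes. On little-endian
# machines this happens to give the right answer for small values.
(seen_by_lp64,) = struct.unpack("<i", ilp64_memory[:4])
```

Here seen_by_ilp64 is 3 + 5·2³² = 21474836483, a nonsensical matrix dimension, while seen_by_lp64 is still 3. That asymmetry is one reason such mismatches can go unnoticed in one direction and crash in the other.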

stevengj avatar Dec 19 '18 13:12 stevengj

And NumPy uses 32-bit integers (LP64), whereas Julia without USE_BLAS64=0 uses 64-bit integers (ILP64), right?

I saw you linked this issue somewhere https://github.com/numpy/numpy/issues/5906 - oh just above :D, got a bit lost in the nest of related issues I waded through while trying to solve this.

JobJob avatar Dec 19 '18 13:12 JobJob

@stevengj, regarding your statement:

@crstnbr, USE_BLAS64=0 means that it assumes that the BLAS (matrix-multiplication etc.) library is compiled to use 32-bit integers for its interfaces. This means that you can't do linear-algebra operations on matrices or vectors with more than 2³¹–1 (≈2×10⁹) elements, even on a 64-bit machine.

What about sparse matrices? In that case, how do we count the elements: as the number of non-zero elements or as the size of the matrix?

RoyiAvital avatar Apr 22 '20 21:04 RoyiAvital

Sparse matrices aren't handled by LAPACK/BLAS but by SuiteSparse, so they shouldn't be affected by this. In the case of a regular matrix, it's the size of the matrix that matters.

(Please correct me if I'm wrong.)

carstenbauer avatar Apr 23 '20 06:04 carstenbauer

Hi guys. I also face this problem. Can anyone tell me how to set USE_BLAS64 = 0? I just have very limited programming knowledge and I am new to julia. Thanks.

ghost avatar Aug 28 '21 02:08 ghost

The easy solution should be to use Julia 1.7 (beta) with MKL.jl.

carstenbauer avatar Aug 28 '21 12:08 carstenbauer