virtualgl/vglrun + 1.9rc2 => fatal error could not load library libopenblas64_.so
When I try to run Julia with vglrun (via VirtualGL on a headless server, virtual displays with NoMachine), I get a crash in Julia 1.9-rc1 and 1.9-rc2, but not in Julia 1.8.3. This is on an Ubuntu 22 installation. Without VirtualGL everything works as intended.
It throws: could not load library "libopenblas64_.so". I don't know how to diagnose this further.
vglrun ./julia
fatal: error thrown and no exception handler available.
InitError(mod=:OpenBLAS_jll, error=ErrorException("could not load library "libopenblas64_.so"
libopenblas64_.so: cannot open shared object file: No such file or directory"))
ijl_errorf at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/rtutils.c:77
ijl_load_dynamic_library at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/dlload.c:369
#dlopen#3 at ./libdl.jl:117
dlopen at ./libdl.jl:116 [inlined]
dlopen at ./libdl.jl:116 [inlined]
__init__ at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/OpenBLAS_jll/src/OpenBLAS_jll.jl:53
jfptr___init___57050.clone_1 at /home/ehinger/Downloads/julia-1.9.0-rc2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_module_run_initializer at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:75
_finish_julia_init at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/init.c:850
julia_init at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/init.c:799
jl_repl_entrypoint at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:711
main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
unknown function (ip: 0x7f73e2401d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
- The output of versioninfo()
Julia Version 1.9.0-rc2
Commit 72aec423c2a (2023-04-01 10:41 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 128 × AMD EPYC 7452 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
Threads: 1 on 128 virtual cores
Environment:
JULIA_DEPOT_PATH = ~/.julia
LD_PRELOAD =
- How you installed Julia
wget + tar
- A minimal working example (MWE), also known as a minimum reproducible example
~/julia-1.9.0-rc2/bin ❯ /vglrun ./julia
Try running with the LD_DEBUG environment variable set and see if that helps
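For reference, a minimal sketch of such a run, assuming a glibc-based Linux system where the dynamic loader honors `LD_DEBUG`. The helper name `ld_debug_run` and the log file name are hypothetical; `libs` is the glibc category that traces library search paths.

```shell
# Hypothetical helper: run a command with glibc's loader tracing enabled.
# LD_DEBUG=libs makes the loader print its library search paths to stderr,
# which is redirected to the log file given as the first argument.
ld_debug_run() {
    log_file=$1
    shift
    LD_DEBUG=libs "$@" 2> "$log_file"
}

# Usage (assumed invocation, matching the command in this report):
#   ld_debug_run ld_debug.log vglrun ./julia
#   grep -n 'libopenblas64_' ld_debug.log
```

Because vglrun is itself a wrapper, the trace also covers the processes it starts, so the log can get long; searching it for the library name narrows it down quickly.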
312180 1145047:
312181 1145047: file=libopenblas64_.so [0]; dynamically loaded by /lib/libvglfaker.so [0]
312182 1145047: find library=libopenblas64_.so [0]; searching
312183 1145047: search cache=/etc/ld.so.cache
312184 1145047: search path=/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/lib:/usr/lib (system search path)
312185 1145047: trying file=/lib/x86_64-linux-gnu/libopenblas64_.so
312186 1145047: trying file=/usr/lib/x86_64-linux-gnu/libopenblas64_.so
312187 1145047: trying file=/lib/libopenblas64_.so
312188 1145047: trying file=/usr/lib/libopenblas64_.so
312189 1145047:
312190 fatal: error thrown and no exception handler available.
312191 InitError(mod=:OpenBLAS_jll, error=ErrorException("could not load library "libopenblas64_.so"
312192 libopenblas64_.so: cannot open shared object file: No such file or directory"))
The above is what I get immediately before the crash, whereas this is what happens in Julia 1.8.3:
278925 1148742: file=/opt/julia-1.8.3/bin/../lib/julia/libopenblas64_.so [0]; dynamically loaded by /lib/libvglfaker.so [0]
278926 1148742: file=/opt/julia-1.8.3/bin/../lib/julia/libopenblas64_.so [0]; generating link map
278927 1148742: dynamic: 0x00007f33c795da80 base: 0x00007f33c5b85000 size: 0x0000000001e7a2a8
278928 1148742: entry: 0x00007f33c5cb5000 phdr: 0x00007f33c5b85040 phnum: 11
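To make the difference between the two traces easier to see, here is a small sketch that pulls only the loader lines for one library out of a saved trace. The function name `lib_trace` and the log file name are hypothetical. In the 1.9 trace the request is the bare name `libopenblas64_.so` and only the system search path is tried, whereas the 1.8.3 trace shows a request with a full path.

```shell
# Hypothetical helper: show what the loader was asked for and where it looked.
# $1 = saved LD_DEBUG trace, $2 = library name (matched as a fixed string).
lib_trace() {
    # Keep lines that mention the library, then only the loader's
    # "file=", "trying file=" and "find library=" records.
    grep -F -- "$2" "$1" | grep -E 'file=|find library='
}

# Usage (log file name is an assumption):
#   lib_trace ld_debug.log libopenblas64_.so
```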
It looks like libvglfaker.so may be dynamically replacing dlopen with a broken version. I am not sure we can do much about that. You might be able to get something mostly working by setting LD_LOAD_PATH.
OK, I added the lib/julia folder to LD_LIBRARY_PATH (not LD_LOAD_PATH; probably a mix-up with JULIA_LOAD_PATH?), which fixed this, and I can start Julia 1.9 :)
But I still wonder why this is necessary in Julia 1.9 but not in Julia 1.8.3.
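A minimal sketch of that workaround, assuming Julia was unpacked so that the bundled libraries live in `lib/julia` next to `bin` (as in the tarball used here). The variable `JULIA_DIR` and the helper `with_lib_dir` are hypothetical names, and putting the directory first is a choice made for this sketch. The helper avoids leaving an empty entry in `LD_LIBRARY_PATH`, which the loader would treat as the current directory.

```shell
# Hypothetical helper: print LD_LIBRARY_PATH with directory $1 put first.
with_lib_dir() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        printf '%s:%s\n' "$1" "$LD_LIBRARY_PATH"
    else
        printf '%s\n' "$1"
    fi
}

# Usage (paths are assumptions; adjust to your install location):
#   JULIA_DIR="$HOME/julia-1.9.0-rc2"
#   LD_LIBRARY_PATH="$(with_lib_dir "$JULIA_DIR/lib/julia")" vglrun "$JULIA_DIR/bin/julia"
```

Setting the variable only for the one command keeps the change from leaking into the rest of the shell session.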
I experienced the same issue with PyTorch when using vglrun. Everything worked fine if I didn't use vglrun. I was running in a conda environment and saw a similar issue during debugging. I initially got something like this:
return torch.linalg.cholesky_ex(value).info.eq(0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Error in dlopen: libtorch_cuda_linalg.so: cannot open shared object file: No such file or directory
I fixed it like this:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/my/virtual/env/lib/python3.11/site-packages/torch/lib/
I couldn't find any other way around this.
Please reopen if this is still an issue with a recent Julia version.
Still an issue with 1.10. If of interest, I can try with 1.11rc next week or so.