Torch.jl
Error upon installing
Using Julia 1.5.3 on a computer with a GPU:
julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/home/viralbshah/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcublas.so.10: cannot open shared object file: No such file or directory
Stacktrace:
[1] dlopen(::String, ::UInt32; throw_error::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
[2] dlopen at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)
Perhaps the same as #32?
I am having the same issue both locally and on a cluster. Both have Julia 1.5.3 and CUDA 11.0.
julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcufft.so.10: cannot open shared object file: No such file or directory
Stacktrace:
[1] dlopen(::String, ::UInt32; throw_error::Bool) at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
[2] dlopen at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)
Trying on CUDA 10.1 yields a similar error:
julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
/lib64/libm.so.6: version `GLIBC_2.23' not found (required by /tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libtorch.so)
Stacktrace:
[1] dlopen(::String, ::UInt32; throw_error::Bool) at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
[2] dlopen at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)
What is the output of versioninfo()?
Locally:
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-10.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = atom -a
JULIA_NUM_THREADS = 4
On the cluster:
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)
Environment:
JULIA_DEPOT_PATH = /tmpdir/maile/.julia
My first errors were produced with the latest release. On master, locally, I get:
libcublas.so.10: cannot open shared object file: No such file or directory
On master on the cluster, the errors are the same.
That looks like an issue with the local CUDA setup. We should really just set up lazy artifacts to make these errors go away entirely.
Hit the same issue (I think) on a GPU machine with Julia 1.6.1 and a fresh environment:
julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
(@v1.6) pkg> st
Status `~/.julia/environments/v1.6/Project.toml`
[052768ef] CUDA v3.3.3
[587475ba] Flux v0.12.4
[7073ff75] IJulia v1.23.2
[6a2ea274] Torch v0.1.2
julia> using CUDA; CUDA.versioninfo()
CUDA toolkit 11.3.1, artifact installation
CUDA driver 11.2.0
NVIDIA driver 460.73.1
Libraries:
- CUBLAS: 11.5.1
- CURAND: 10.2.4
- CUFFT: 10.4.2
- CUSOLVER: 11.1.2
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+460.73.1
- CUDNN: 8.20.0 (for CUDA 11.3.0)
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)
Toolchain:
- Julia: 1.6.1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
1 device:
0: Tesla T4 (sm_75, 14.414 GiB / 14.756 GiB available)
julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/home/jupyter/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcublas.so.10: cannot open shared object file: No such file or directory
Stacktrace:
[1] dlopen(s::String, flags::UInt32; throw_error::Bool)
@ Base.Libc.Libdl ./libdl.jl:114
[2] dlopen (repeats 2 times)
@ ./libdl.jl:114 [inlined]
[3] __init__()
@ Torch_jll ~/.julia/packages/Torch_jll/sFQc0/src/wrappers/x86_64-linux-gnu-cxx11.jl:57
[4] _include_from_serialized(path::String, depmods::Vector{Any})
@ Base ./loading.jl:674
[5] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
@ Base ./loading.jl:760
[6] _require(pkg::Base.PkgId)
@ Base ./loading.jl:998
[7] require(uuidkey::Base.PkgId)
@ Base ./loading.jl:914
[8] require(into::Module, mod::Symbol)
@ Base ./loading.jl:901
[9] include
@ ./Base.jl:386 [inlined]
[10] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
@ Base ./loading.jl:1213
[11] top-level scope
@ none:1
[12] eval
@ ./boot.jl:360 [inlined]
[13] eval(x::Expr)
@ Base.MainInclude ./client.jl:446
[14] top-level scope
@ none:1
during initialization of module Torch_jll
in expression starting at /home/jupyter/.julia/packages/Torch/fIKJf/src/Torch.jl:1
ERROR: Failed to precompile Torch [6a2ea274-3061-11ea-0d63-ff850051a295] to /home/jupyter/.julia/compiled/v1.6/Torch/jl_Yw2dNx.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::Base.TTY, internal_stdout::Base.TTY)
@ Base ./loading.jl:1360
[3] compilecache(pkg::Base.PkgId, path::String)
@ Base ./loading.jl:1306
[4] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1021
[5] require(uuidkey::Base.PkgId)
@ Base ./loading.jl:914
[6] require(into::Module, mod::Symbol)
@ Base ./loading.jl:901
[7] top-level scope
@ ~/.julia/packages/CUDA/02Kjq/src/initialization.jl:52
Is there any recommended workaround?
For this issue, one workaround you could try is to symlink the missing CUDA library into the artifact's lib directory:
ln -s ~/path/to/libcublas.so.10 /home/jupyter/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libcublas.so.10
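Before symlinking, it may help to list every dependency the dynamic loader cannot resolve, since different machines in this thread report different missing libraries (libcublas.so.10, libcufft.so.10). The following is a minimal sketch, assuming ldd is available; missing_from_ldd and missing_libs are hypothetical helper names, not part of Torch.jl. It only reports sonames that ldd marks as "not found"; a glibc version mismatch such as the GLIBC_2.23 error on the CUDA 10.1 machine is printed in a different form and is not matched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print each soname
# reported as "name => not found" (one per line).
missing_from_ldd() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Hypothetical helper: run ldd on the given shared library and list its
# unresolved dependencies (ldd also resolves transitive dependencies).
missing_libs() {
  ldd "$1" | missing_from_ldd
}
```

For example, missing_libs ~/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so would print each soname the loader could not find. Each one is a candidate for the symlink workaround, provided a matching library actually exists somewhere on the system.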