Torch.jl

Error upon installing

Open ViralBShah opened this issue 4 years ago • 8 comments

Using Julia 1.5.3 on a computer with GPU:

julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/home/viralbshah/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcublas.so.10: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String, ::UInt32; throw_error::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
 [2] dlopen at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)

ViralBShah • Nov 29 '20 17:11

Perhaps the same as #32?

ViralBShah • Nov 29 '20 17:11

I am having the same issue both locally and on a cluster. Both have Julia 1.5.3 and CUDA 11.0.

julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcufft.so.10: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String, ::UInt32; throw_error::Bool) at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
 [2] dlopen at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)

Trying on CUDA 10.1 yields a similar error:

julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
/lib64/libm.so.6: version `GLIBC_2.23' not found (required by /tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libtorch.so)
Stacktrace:
 [1] dlopen(::String, ::UInt32; throw_error::Bool) at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
 [2] dlopen at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)
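The `GLIBC_2.23' not found message means the bundled libtorch.so was linked against a newer glibc than the one installed under /lib64 on that machine. A minimal sketch for checking this, assuming a glibc-based Linux system and a `sort` that supports `-V` (GNU coreutils does); the function name `glibc_at_least` is made up for illustration:

```shell
#!/bin/sh
# glibc_at_least REQUIRED [CURRENT]
# Succeeds if CURRENT is at least REQUIRED. When CURRENT is omitted it is
# read from `getconf GNU_LIBC_VERSION`, which prints e.g. "glibc 2.17".
glibc_at_least() {
    required=$1
    current=${2:-$(getconf GNU_LIBC_VERSION | awk '{ print $2 }')}
    # sort -V orders version strings; if the smaller of the two is the
    # required version, the current one is new enough.
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}
```

Running `glibc_at_least 2.23 && echo ok || echo "glibc too old"` on the affected machine would show whether the system glibc is the limiting factor.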

k8lion • Feb 03 '21 13:02

What is the versioninfo()?

DhairyaLGandhi • Feb 03 '21 16:02

Locally

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = atom  -a
  JULIA_NUM_THREADS = 4

On cluster

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)
Environment:
  JULIA_DEPOT_PATH = /tmpdir/maile/.julia

k8lion • Feb 04 '21 09:02

My first errors were produced with the latest release. On master locally, I get

libcublas.so.10: cannot open shared object file: No such file or directory

On master on the cluster, the errors are the same.

k8lion • Feb 04 '21 13:02

That looks like an issue with the local CUDA setup. We should really just set up lazy artifacts to make these errors go away entirely.
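To see which libraries the local setup fails to provide, one option is to filter the output of `ldd` for unresolved entries. This is only a sketch: the helper name `missing_libs` and the `ARTIFACT_LIB` variable are placeholders, not part of Torch.jl.

```shell
#!/bin/sh
# missing_libs: read `ldd` output on stdin and print the name of every
# dependency the dynamic loader could not resolve, one per line.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical usage; ARTIFACT_LIB stands for the artifact's lib directory
# reported in the error message:
# ldd "$ARTIFACT_LIB/libdoeye_caml.so" | missing_libs
```

If the list contains entries such as libcublas.so.10 or libcufft.so.10, the loader cannot find CUDA 10 libraries on its search path, which matches the errors reported above.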

DhairyaLGandhi • Feb 04 '21 13:02

Hit the same issue (I think) on a GPU machine with Julia 1.6.1 and a fresh environment:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)

(@v1.6) pkg> st
      Status `~/.julia/environments/v1.6/Project.toml`
  [052768ef] CUDA v3.3.3
  [587475ba] Flux v0.12.4
  [7073ff75] IJulia v1.23.2
  [6a2ea274] Torch v0.1.2

julia> using CUDA; CUDA.versioninfo()
CUDA toolkit 11.3.1, artifact installation
CUDA driver 11.2.0
NVIDIA driver 460.73.1

Libraries: 
- CUBLAS: 11.5.1
- CURAND: 10.2.4
- CUFFT: 10.4.2
- CUSOLVER: 11.1.2
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+460.73.1
- CUDNN: 8.20.0 (for CUDA 11.3.0)
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)

Toolchain:
- Julia: 1.6.1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: Tesla T4 (sm_75, 14.414 GiB / 14.756 GiB available)

julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/home/jupyter/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcublas.so.10: cannot open shared object file: No such file or directory
Stacktrace:
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base.Libc.Libdl ./libdl.jl:114
  [2] dlopen (repeats 2 times)
    @ ./libdl.jl:114 [inlined]
  [3] __init__()
    @ Torch_jll ~/.julia/packages/Torch_jll/sFQc0/src/wrappers/x86_64-linux-gnu-cxx11.jl:57
  [4] _include_from_serialized(path::String, depmods::Vector{Any})
    @ Base ./loading.jl:674
  [5] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
    @ Base ./loading.jl:760
  [6] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:998
  [7] require(uuidkey::Base.PkgId)
    @ Base ./loading.jl:914
  [8] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:901
  [9] include
    @ ./Base.jl:386 [inlined]
 [10] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
    @ Base ./loading.jl:1213
 [11] top-level scope
    @ none:1
 [12] eval
    @ ./boot.jl:360 [inlined]
 [13] eval(x::Expr)
    @ Base.MainInclude ./client.jl:446
 [14] top-level scope
    @ none:1
during initialization of module Torch_jll
in expression starting at /home/jupyter/.julia/packages/Torch/fIKJf/src/Torch.jl:1
ERROR: Failed to precompile Torch [6a2ea274-3061-11ea-0d63-ff850051a295] to /home/jupyter/.julia/compiled/v1.6/Torch/jl_Yw2dNx.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::Base.TTY, internal_stdout::Base.TTY)
   @ Base ./loading.jl:1360
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1306
 [4] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1021
 [5] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:914
 [6] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:901
 [7] top-level scope
   @ ~/.julia/packages/CUDA/02Kjq/src/initialization.jl:52

Is there any recommended workaround?

PerezHz • Jul 12 '21 02:07

For this issue, one workaround you could try is to symlink the CUDA library into the artifact's lib directory:

ln -s ~/path/to/libcublas.so.10 /home/jupyter/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libcublas.so.10
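A slightly more defensive version of the same idea, as a sketch: it checks that the source library and the destination directory exist and does not overwrite anything already in place. The function name `link_cuda_lib` is made up for illustration, and the source should be given as an absolute path so the link resolves from any working directory.

```shell
#!/bin/sh
# link_cuda_lib SRC DEST_DIR
# Symlink the shared library SRC into DEST_DIR under the same file name.
# Fails if SRC is missing or DEST_DIR is not a directory, and leaves any
# file or link already present at the destination untouched.
link_cuda_lib() {
    src=$1
    dest_dir=$2
    if [ ! -e "$src" ]; then
        echo "source library not found: $src" >&2
        return 1
    fi
    if [ ! -d "$dest_dir" ]; then
        echo "destination directory not found: $dest_dir" >&2
        return 1
    fi
    target="$dest_dir/$(basename "$src")"
    if [ -e "$target" ] || [ -L "$target" ]; then
        echo "already present, leaving as is: $target" >&2
        return 0
    fi
    ln -s "$src" "$target"
}
```

It would be called with the real library location as the first argument and the artifact's lib directory as the second. The same call can be repeated for each library the loader reports as missing, such as libcufft.so.10.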

LeeLizuoLiu • Sep 18 '21 04:09