AMDGPU.jl

Segfault in libamdhip

Open • Keno opened this issue 3 years ago • 6 comments

julia> a_d = ROCArray(a)
32-element ROCVector{Float64}:
free(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
__libc_message at /usr/lib/libc.so.6 (unknown line)
malloc_printerr at /usr/lib/libc.so.6 (unknown line)
_int_free at /usr/lib/libc.so.6 (unknown line)
cfree at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7fb9316ce317)
unknown function (ip: 0x7fb9316cf7e7)
unknown function (ip: 0x7fb93167e30e)
unknown function (ip: 0x7fb93169426d)
unknown function (ip: 0x7fb93157c834)
__pthread_once_slow at /usr/lib/libpthread.so.0 (unknown line)
hipStreamSynchronize at /home/deck/.julia/artifacts/b5a35fe56035e3d95e3203689c38aafec324a861/hip/lib/libamdhip64.so (unknown line)
macro expansion at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/hip/error.jl:149 [inlined]
hipStreamSynchronize at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/hip/libhip.jl:2
wait! at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/sync.jl:20
wait! at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/array.jl:86 [inlined]
copyto! at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/array.jl:182
copyto! at /home/deck/.julia/packages/GPUArrays/VNhDf/src/host/abstractarray.jl:95 [inlined]
copyto_axcheck! at ./abstractarray.jl:1104 [inlined]
Array at ./array.jl:563 [inlined]
Array at ./boot.jl:481 [inlined]
convert at ./array.jl:554 [inlined]
adapt_storage at /home/deck/.julia/packages/GPUArrays/VNhDf/src/host/abstractarray.jl:45 [inlined]
adapt_structure at /home/deck/.julia/packages/Adapt/wASZA/src/Adapt.jl:42 [inlined]
adapt at /home/deck/.julia/packages/Adapt/wASZA/src/Adapt.jl:40 [inlined]
print_array at /home/deck/.julia/packages/GPUArrays/VNhDf/src/host/abstractarray.jl:48 [inlined]
show at ./arrayshow.jl:396
unknown function (ip: 0x7fb9326da581)

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Custom APU 0405
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, znver2)

Keno, Mar 28 '22 03:03

AMD Custom APU 0405

Is this a special/experimental APU? In the past, we've had bugs and segfaults with APUs (including on my own).

Are you using AMDGPU-provided ROCm artifacts, or system libraries?

jpsamaroo, Mar 28 '22 14:03

Is this a special/experimental APU?

No, this is an AMD Van Gogh APU

Are you using AMDGPU-provided ROCm artifacts, or system libraries?

AMDGPU-provided

Keno, Mar 28 '22 14:03

Can you try disabling artifacts with JULIA_AMDGPU_DISABLE_ARTIFACTS=1 and re-building AMDGPU? Assuming you have a system-provided ROCm available.
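
A minimal shell sketch of that suggestion, assuming `julia` is on the PATH and a system ROCm is installed. The variable name comes from the comment above; the guard around the `julia` call is only there so the snippet does nothing harmful on a machine without Julia.

```shell
# Sketch: rebuild AMDGPU.jl against a system ROCm instead of the bundled artifacts.
# Assumption: `julia` is on PATH and ROCm is installed system-wide.
export JULIA_AMDGPU_DISABLE_ARTIFACTS=1

if command -v julia >/dev/null 2>&1; then
    # Re-run the package build step so the new setting is picked up.
    julia -e 'using Pkg; Pkg.build("AMDGPU")'
else
    echo "julia not found on PATH; skipping rebuild" >&2
fi
```

After the rebuild, start a fresh Julia session and retry the failing call. If the system libraries are in use, the backtrace should reference a system path for `libamdhip64.so` rather than one under `~/.julia/artifacts`.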

jpsamaroo, Mar 28 '22 14:03

Can you try disabling artifacts with JULIA_AMDGPU_DISABLE_ARTIFACTS=1 and re-building AMDGPU? Assuming you have a system-provided ROCm available.

Segfaults also, similar backtrace:

signal (11): Segmentation fault
in expression starting at none:0
unknown function (ip: 0x7f09e0f1e0fd)
unknown function (ip: 0x7f09e0f1e3b7)
hipStreamSynchronize at /opt/rocm/lib/libamdhip64.so (unknown line)
macro expansion at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/hip/error.jl:149 [inlined]
hipStreamSynchronize at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/hip/libhip.jl:2

Keno, Mar 28 '22 23:03

So, if you just want to hide libamdhip64.so from AMDGPU (rename it to .bak or similar), we can load without it. You may also need to do the same for rocBLAS, rocFFT, et al.
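
As a rough sketch of the rename, assuming the library lives at the system path shown in the second backtrace (`/opt/rocm/lib/libamdhip64.so`) and that you have write permission there (likely root). `hide_library` is a hypothetical helper name, not part of AMDGPU.jl or ROCm.

```shell
# Hypothetical helper: move a shared library aside so it cannot be found and loaded.
hide_library() {
    lib="$1"
    if [ -e "$lib" ]; then
        # Keep the original next to it so the change is easy to undo.
        mv -- "$lib" "$lib.bak"
    else
        echo "not found: $lib" >&2
    fi
}

# Example call (commented out; path taken from the backtrace, probably needs sudo):
# hide_library /opt/rocm/lib/libamdhip64.so
```

To undo, rename the `.bak` file back to its original name. The same helper could be pointed at the rocBLAS and rocFFT libraries if they also need hiding; their exact paths are not given in this thread.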

If you actually want full functionality, then building glibc with debug symbols would be very helpful.

jpsamaroo, Mar 29 '22 00:03

Is it actually in glibc though? Presumably __pthread_once_slow calls back into whatever callback HIP passes it. I tried building HIP with debug symbols, but ran into https://github.com/JuliaPackaging/Yggdrasil/pull/4689#issuecomment-1081262980

Keno, Mar 29 '22 00:03