AMDGPU.jl
Segfault in libamdhip
julia> a_d = ROCArray(a)
32-element ROCVector{Float64}:
free(): invalid pointer
signal (6): Aborted
in expression starting at none:0
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
__libc_message at /usr/lib/libc.so.6 (unknown line)
malloc_printerr at /usr/lib/libc.so.6 (unknown line)
_int_free at /usr/lib/libc.so.6 (unknown line)
cfree at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7fb9316ce317)
unknown function (ip: 0x7fb9316cf7e7)
unknown function (ip: 0x7fb93167e30e)
unknown function (ip: 0x7fb93169426d)
unknown function (ip: 0x7fb93157c834)
__pthread_once_slow at /usr/lib/libpthread.so.0 (unknown line)
hipStreamSynchronize at /home/deck/.julia/artifacts/b5a35fe56035e3d95e3203689c38aafec324a861/hip/lib/libamdhip64.so (unknown line)
macro expansion at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/hip/error.jl:149 [inlined]
hipStreamSynchronize at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/hip/libhip.jl:2
wait! at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/sync.jl:20
wait! at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/array.jl:86 [inlined]
copyto! at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/array.jl:182
copyto! at /home/deck/.julia/packages/GPUArrays/VNhDf/src/host/abstractarray.jl:95 [inlined]
copyto_axcheck! at ./abstractarray.jl:1104 [inlined]
Array at ./array.jl:563 [inlined]
Array at ./boot.jl:481 [inlined]
convert at ./array.jl:554 [inlined]
adapt_storage at /home/deck/.julia/packages/GPUArrays/VNhDf/src/host/abstractarray.jl:45 [inlined]
adapt_structure at /home/deck/.julia/packages/Adapt/wASZA/src/Adapt.jl:42 [inlined]
adapt at /home/deck/.julia/packages/Adapt/wASZA/src/Adapt.jl:40 [inlined]
print_array at /home/deck/.julia/packages/GPUArrays/VNhDf/src/host/abstractarray.jl:48 [inlined]
show at ./arrayshow.jl:396
unknown function (ip: 0x7fb9326da581)
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD Custom APU 0405
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, znver2)
AMD Custom APU 0405
Is this a special/experimental APU? In the past, we've had bugs and segfaults with APUs (including on my own APU).
Are you using AMDGPU-provided ROCm artifacts, or system libraries?
Is this a special/experimental APU?
No, this is an AMD Van Gogh APU
Are you using AMDGPU-provided ROCm artifacts, or system libraries?
AMDGPU-provided
Can you try disabling artifacts with JULIA_AMDGPU_DISABLE_ARTIFACTS=1 and re-building AMDGPU? Assuming you have a system-provided ROCm available.
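For reference, a minimal sketch of that suggestion from the Julia REPL. It assumes AMDGPU's build step reads JULIA_AMDGPU_DISABLE_ARTIFACTS from the process environment. Exporting the variable in the shell before starting Julia should work just as well, and is probably the safer choice if the package also checks it at load time.

```julia
# Ask AMDGPU.jl to skip its bundled ROCm artifacts and use a system ROCm
# (e.g. /opt/rocm) instead. Setting ENV changes the real process environment,
# so child processes such as the package build step inherit it.
ENV["JULIA_AMDGPU_DISABLE_ARTIFACTS"] = "1"

using Pkg
Pkg.build("AMDGPU")  # re-run AMDGPU's build step so it re-detects the libraries
```

Restarting Julia before the next `using AMDGPU` is likely the simplest way to make sure the rebuilt configuration is the one that gets loaded.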
Can you try disabling artifacts with JULIA_AMDGPU_DISABLE_ARTIFACTS=1 and re-building AMDGPU? Assuming you have a system-provided ROCm available.
It segfaults too, with a similar backtrace:
signal (11): Segmentation fault
in expression starting at none:0
unknown function (ip: 0x7f09e0f1e0fd)
unknown function (ip: 0x7f09e0f1e3b7)
hipStreamSynchronize at /opt/rocm/lib/libamdhip64.so (unknown line)
macro expansion at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/hip/error.jl:149 [inlined]
hipStreamSynchronize at /home/deck/.julia/packages/AMDGPU/PtNLZ/src/hip/libhip.jl:2
So, if you just hide libamdhip64.so from AMDGPU (rename it to .bak or similar), AMDGPU can still load without it. You may also need to do the same for rocBLAS, rocFFT, et al.
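A minimal sketch of the rename trick in Julia. The helpers hide_library and unhide_library are hypothetical (not part of AMDGPU.jl), and the rocBLAS/rocFFT file names librocblas.so and librocfft.so are assumptions about a typical ROCm install; check the actual names in your ROCm lib directory, or in the artifact directory shown in the first backtrace.

```julia
# Hypothetical helpers (not part of AMDGPU.jl): "hide" a shared library from the
# library lookup by renaming it to `<name>.bak`, and restore it again later.

"""
    hide_library(path) -> Union{String,Nothing}

Rename `path` to `path * ".bak"` and return the new path, or return `nothing`
if `path` is not an existing file. Errors if the `.bak` file already exists.
"""
function hide_library(path::AbstractString)
    isfile(path) || return nothing
    backup = path * ".bak"
    mv(path, backup)  # without force=true, mv refuses to overwrite `backup`
    return backup
end

"""
    unhide_library(backup) -> String

Undo `hide_library`: rename `<name>.bak` back to `<name>` and return that path.
"""
function unhide_library(backup::AbstractString)
    endswith(backup, ".bak") || throw(ArgumentError("not a .bak file: $backup"))
    original = String(chop(backup; tail = 4))  # drop the ".bak" suffix
    mv(backup, original)
    return original
end

# Example (assumed file names; renaming under /opt/rocm usually needs root):
# for lib in ("libamdhip64.so", "librocblas.so", "librocfft.so")
#     hide_library(joinpath("/opt/rocm/lib", lib))
# end
```

Renaming rather than deleting keeps the change trivially reversible once the crash in hipStreamSynchronize is understood.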
If you actually want full functionality, though, a glibc build with debug symbols would be very helpful for tracking this down.
Is the bug actually in glibc, though? Presumably __pthread_once_slow just calls whatever init callback HIP passes to it. I tried building HIP with debug symbols, but ran into https://github.com/JuliaPackaging/Yggdrasil/pull/4689#issuecomment-1081262980