ROCclr segfault when running Julia with threads
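Nothing works when I start Julia with multiple threads. In a single-threaded session AMDGPU.zeros(10) works, but with julia -t 2 the same call aborts before the array contents are printed: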
[leios@noema Fable.jl]$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.3 (2025-01-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> using AMDGPU
julia> AMDGPU.zeros(10)
10-element ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
julia>
[leios@noema Fable.jl]$ julia -t 2
julia> using AMDGPU
julia> AMDGPU.zeros(10)
10-element ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
julia: /usr/src/debug/hip-runtime/clr-rocm-6.2.4/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
[7772] signal 6 (-6): Aborted
in expression starting at none:0
unknown function (ip: 0x7087c8970624)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7087c88fe4ea)
unknown function (ip: 0x70874990c108)
unknown function (ip: 0x708749918dc7)
unknown function (ip: 0x708749706285)
macro expansion at /home/leios/.julia/packages/GPUToolbox/cZlg7/src/ccalls.jl:143 [inlined]
macro expansion at /home/leios/.julia/packages/AMDGPU/STpZC/src/utils.jl:122 [inlined]
hipGetDeviceCount at /home/leios/.julia/packages/AMDGPU/STpZC/src/hip/libhip.jl:42 [inlined]
ndevices at /home/leios/.julia/packages/AMDGPU/STpZC/src/hip/device.jl:103
TaskLocalState at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:11 [inlined]
TaskLocalState at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:11
TaskLocalState at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:11 [inlined]
#25 at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:27 [inlined]
get! at ./iddict.jl:171
task_local_state! at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:26
prepare_state at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:193 [inlined]
hipStreamQuery at /home/leios/.julia/packages/AMDGPU/STpZC/src/hip/libhip.jl:113 [inlined]
#11 at /home/leios/.julia/packages/AMDGPU/STpZC/src/hip/stream.jl:114
unknown function (ip: 0x7087bc5f29ff)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 24176043 (Pool: 24175407; Big: 636); GC: 17
Aborted (core dumped)
The assertion that fails is in ROCclr's os_posix.cpp: https://github.com/ROCm/clr/blob/204d35d16ef5c2c1ea1a4bb25442908a306c857a/rocclr/os/os_posix.cpp#L301-L323
What do versioninfo() and AMDGPU.versioninfo() print?
Right. That segfaults even when I am using Julia single threaded.
julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
julia: /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_code_object.cpp:1152: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.
[10123] signal 6 (-6): Aborted
in expression starting at REPL[2]:1
unknown function (ip: 0x728308c69624)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x728308bf74ea)
unknown function (ip: 0x728290250954)
unknown function (ip: 0x727ff5b6b99c)
unknown function (ip: 0x728308e1e1d6)
unknown function (ip: 0x728308e1e2ac)
_dl_catch_exception at /lib64/ld-linux-x86-64.so.2 (unknown line)
unknown function (ip: 0x728308e24e78)
_dl_catch_exception at /lib64/ld-linux-x86-64.so.2 (unknown line)
unknown function (ip: 0x728308e25283)
unknown function (ip: 0x728308c639d3)
_dl_catch_exception at /lib64/ld-linux-x86-64.so.2 (unknown line)
unknown function (ip: 0x728308e1b558)
unknown function (ip: 0x728308c634b2)
dlopen at /usr/lib/libc.so.6 (unknown line)
ijl_load_dynamic_library at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/dlload.c:365
jl_get_library_ at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/runtime_ccall.cpp:45 [inlined]
jl_get_library_ at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/runtime_ccall.cpp:29
ijl_lazy_load_and_lookup at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/runtime_ccall.cpp:73
macro expansion at /home/leios/.julia/packages/AMDGPU/STpZC/src/utils.jl:122 [inlined]
miopenGetVersion at /home/leios/.julia/packages/AMDGPU/STpZC/src/dnn/libMIOpen.jl:29
version at /home/leios/.julia/packages/AMDGPU/STpZC/src/dnn/MIOpen.jl:62 [inlined]
_ver at /home/leios/.julia/packages/AMDGPU/STpZC/src/utils.jl:5 [inlined]
versioninfo at /home/leios/.julia/packages/AMDGPU/STpZC/src/utils.jl:6
unknown function (ip: 0x7283019193af)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_call at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
repl_backend_loop at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:342
#start_repl_backend#59 at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:327
start_repl_backend at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:324
#run_repl#72 at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:483
run_repl at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:469
jfptr_run_repl_10097.1 at /home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/REPL/u0gqU_XvZAg.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14693.1 at /home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/REPL/u0gqU_XvZAg.so (unknown line)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73609.1 at /home/leios/builds/julia-1.11.3/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x728308bf9487)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 6746836 (Pool: 6746608; Big: 228); GC: 7
Aborted (core dumped)
rocm info:
[leios@noema Fable.jl]$ rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 7700X 8-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 7700X 8-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5573
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 31989652(0x1e81f94) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 31989652(0x1e81f94) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 31989652(0x1e81f94) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1031
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6700 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 3072(0xc00) KB
L3: 98304(0x18000) KB
Chip ID: 29663(0x73df)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2725
BDFID: 768
Internal Node ID: 1
Compute Unit: 40
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 122
SDMA engine uCode:: 80
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1031
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 3
*******
Name: gfx1036
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 256(0x100) KB
Chip ID: 5710(0x164e)
ASIC Revision: 1(0x1)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2200
BDFID: 4608
Internal Node ID: 2
Compute Unit: 2
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 21
SDMA engine uCode:: 9
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 15994824(0xf40fc8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 15994824(0xf40fc8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1036
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
> Right. That segfaults even when I am using Julia single threaded.
Yeah that's #767 :/
Ah, oops. Duplicate then
What is the output of:
using AMDGPU
import Libdl
foreach(println, Libdl.dllist())
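The dllist() call above prints every library in the process. As a sketch (the helper names and the file-name pattern are assumptions, not part of AMDGPU.jl), the list can be narrowed to the ROCm/HIP libraries and the directories they were loaded from, which makes it easier to compare two machines or to spot libraries coming from more than one ROCm installation:

```julia
import Libdl

# Hypothetical name pattern for ROCm/HIP libraries, based on the ones that show
# up in this thread (libamdhip64, libhsa-runtime64, libamd_comgr, ...).
# A bare "amd" is avoided on purpose: SuiteSparse ships an unrelated libamd.so.
const ROCM_LIB_PATTERN = r"amdhip|hsa|amd_comgr|rocprofiler|rocblas|rocfft|rocrand|rocsparse|rocsolver|miopen"i

# Paths of all currently loaded shared libraries whose file name matches the pattern.
rocm_libraries() = filter(p -> occursin(ROCM_LIB_PATTERN, basename(p)), Libdl.dllist())

# Distinct directories those libraries came from; more than one entry may mean
# that libraries from different ROCm installs are mixed in one process.
rocm_dirs() = unique(dirname.(rocm_libraries()))

# Usage (after `using AMDGPU`, so that the HIP runtime is actually loaded):
# foreach(println, rocm_libraries())
# rocm_dirs()
```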
> Ah, oops. Duplicate then
No, on my Arch Linux system versioninfo crashes, but other things work.
julia> using AMDGPU
julia> import Libdl
julia> foreach(println, Libdl.dllist())
linux-vdso.so.1
/usr/lib/libdl.so.2
/usr/lib/libpthread.so.0
/usr/lib/libc.so.6
/home/leios/builds/julia-1.11.3/bin/../lib/libjulia.so.1.11
/lib64/ld-linux-x86-64.so.2
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libgcc_s.so.1
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libopenlibm.so
/usr/lib/libstdc++.so.6
/usr/lib/libm.so.6
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libjulia-internal.so.1.11
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libunwind.so.8
/usr/lib/librt.so.1
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libz.so.1
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libatomic.so.1
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libjulia-codegen.so.1.11
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libLLVM-16jl.so
/home/leios/builds/julia-1.11.3/lib/julia/sys.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libpcre2-8.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libgmp.so.10
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libmpfr.so.6
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libgfortran.so.5
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libquadmath.so.0
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libopenblas64_.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libblastrampoline.so.5
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Base64/D7K0n_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Markdown/AREjX_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/InteractiveUtils/0TrXF_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/StyledStrings/UcVoM_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Unicode/E4Hzs_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/REPL/u0gqU_XvZAg.so
/home/leios/.julia/compiled/v1.11/Adapt/rUIgN_2wbLs.so
/home/leios/.julia/compiled/v1.11/CEnum/0gyUJ_4Hzk0.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Printf/3FQLY_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Dates/p8See_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/TOML/mjrwE_XvZAg.so
/home/leios/.julia/compiled/v1.11/Preferences/pWSk8_4Hzk0.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/NetworkOptions/J8H6s_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/MbedTLS_jll/u5NEn_XvZAg.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libmbedcrypto.so.7
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libmbedtls.so.14
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libmbedx509.so.1
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/LibSSH2_jll/K6mup_XvZAg.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libssh2.so.1
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/LibGit2_jll/nfCpg_XvZAg.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libgit2.so.1.7
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/LibGit2/xrYJZ_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/ArgTools/aGHFV_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/nghttp2_jll/KTGSA_XvZAg.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libnghttp2.so.14
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/LibCURL_jll/9JWaY_XvZAg.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libcurl.so.4
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/MozillaCACerts_jll/XKIUi_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/LibCURL/ht49g_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Downloads/eiA4B_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Tar/G9ZYP_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/p7zip_jll/dfuGM_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/UUIDs/SIw1t_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Logging/PWFjL_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Pkg/tUTdb_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/LazyArtifacts/MRP8l_XvZAg.so
/home/leios/.julia/compiled/v1.11/JLLWrappers/7Zgw7_4Hzk0.so
/home/leios/.julia/compiled/v1.11/LLVMExtra_jll/R9OeX_rURdT.so
/home/leios/.julia/compiled/v1.11/LLVM/e8NBy_rURdT.so
/home/leios/.julia/compiled/v1.11/LibTracyClient_jll/mti1A_rURdT.so
/home/leios/.julia/artifacts/8a696873fc2d7c6d28ccd099ccaaa8960691a0a0/lib/libTracyClient.so
/home/leios/.julia/compiled/v1.11/ExprTools/eM8wu_4Hzk0.so
/home/leios/.julia/compiled/v1.11/Tracy/QvZG9_rURdT.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Serialization/zGad9_XvZAg.so
/home/leios/.julia/compiled/v1.11/Scratch/ICI1U_4Hzk0.so
/home/leios/.julia/compiled/v1.11/PrecompileTools/AQ9Mk_4Hzk0.so
/home/leios/.julia/compiled/v1.11/GPUCompiler/yPwef_rURdT.so
/home/leios/.julia/compiled/v1.11/UnsafeAtomics/OuhNJ_4Hzk0.so
/home/leios/.julia/compiled/v1.11/Atomix/3LdQ4_ZduEO.so
/home/leios/.julia/compiled/v1.11/MacroTools/38lnR_NwWdq.so
/home/leios/.julia/compiled/v1.11/StaticArraysCore/Tzw28_4Hzk0.so
/home/leios/.julia/compiled/v1.11/StaticArrays/yY9vm_2wbLs.so
/home/leios/.julia/compiled/v1.11/AdaptStaticArraysExt/9bCdf_2wbLs.so
/home/leios/.julia/compiled/v1.11/KernelAbstractions/aywHT_NwWdq.so
/home/leios/.julia/compiled/v1.11/LinearAlgebraExt/1TyTB_NwWdq.so
/home/leios/.julia/compiled/v1.11/UnsafeAtomicsLLVM/yk2PZ_rURdT.so
/home/leios/.julia/compiled/v1.11/Reexport/bTpYr_4Hzk0.so
/home/leios/.julia/compiled/v1.11/GPUArraysCore/qiYUe_PQ5ag.so
/home/leios/.julia/compiled/v1.11/Statistics/ERcPL_4Hzk0.so
/home/leios/.julia/compiled/v1.11/StaticArraysStatisticsExt/EfhbW_2wbLs.so
/home/leios/.julia/compiled/v1.11/GPUArrays/v5u0T_rURdT.so
/home/leios/.julia/compiled/v1.11/ArgCheck/P66Js_rURdT.so
/home/leios/.julia/compiled/v1.11/ManualMemory/rywzg_4Hzk0.so
/home/leios/.julia/compiled/v1.11/ThreadingUtilities/FwPmW_rURdT.so
/home/leios/.julia/compiled/v1.11/ArrayInterface/7bROb_rURdT.so
/home/leios/.julia/compiled/v1.11/IfElse/YSZB7_4Hzk0.so
/home/leios/.julia/compiled/v1.11/CommonWorldInvalidations/rQYNM_4Hzk0.so
/home/leios/.julia/compiled/v1.11/Static/4nGFz_2wbLs.so
/home/leios/.julia/compiled/v1.11/Compat/GSFWK_4Hzk0.so
/home/leios/.julia/compiled/v1.11/CompatLinearAlgebraExt/Zxpzq_4Hzk0.so
/home/leios/.julia/compiled/v1.11/StaticArrayInterface/1FInX_rURdT.so
/home/leios/.julia/compiled/v1.11/SIMDTypes/NiIYy_4Hzk0.so
/home/leios/.julia/compiled/v1.11/LayoutPointers/SicMc_rURdT.so
/home/leios/.julia/compiled/v1.11/CloseOpenIntervals/eAH4s_rURdT.so
/home/leios/.julia/compiled/v1.11/StrideArraysCore/kWbGj_rURdT.so
/home/leios/.julia/compiled/v1.11/BitTwiddlingConvenienceFunctions/fzQ1O_2wbLs.so
/home/leios/.julia/compiled/v1.11/CpuId/vMZBF_4Hzk0.so
/home/leios/.julia/compiled/v1.11/CPUSummary/3IE2Z_2wbLs.so
/home/leios/.julia/compiled/v1.11/PolyesterWeave/XwY71_rURdT.so
/home/leios/.julia/compiled/v1.11/Polyester/V16F5_rURdT.so
/home/leios/.julia/compiled/v1.11/ArrayInterfaceGPUArraysCoreExt/sVb8r_rURdT.so
/home/leios/.julia/compiled/v1.11/ArrayInterfaceStaticArraysCoreExt/gqwrP_rURdT.so
/home/leios/.julia/compiled/v1.11/StaticArrayInterfaceStaticArraysExt/Nww1z_rURdT.so
/home/leios/.julia/compiled/v1.11/StableTasks/OT4TS_rURdT.so
/home/leios/.julia/compiled/v1.11/ChunkSplitters/DaaT8_rURdT.so
/home/leios/.julia/compiled/v1.11/TaskLocalValues/oQVnM_4Hzk0.so
/home/leios/.julia/compiled/v1.11/ScopedValues/fN9Bp_4Hzk0.so
/home/leios/.julia/compiled/v1.11/ConstructionBase/sBbW6_4Hzk0.so
/home/leios/.julia/compiled/v1.11/ConstructionBaseLinearAlgebraExt/mc2IG_4Hzk0.so
/home/leios/.julia/compiled/v1.11/InitialValues/djxcV_4Hzk0.so
/home/leios/.julia/compiled/v1.11/InverseFunctions/PkVmn_4Hzk0.so
/home/leios/.julia/compiled/v1.11/CompositionsBase/KqDTx_4Hzk0.so
/home/leios/.julia/compiled/v1.11/CompositionsBaseInverseFunctionsExt/WTKja_4Hzk0.so
/home/leios/.julia/compiled/v1.11/InverseFunctionsDatesExt/gjNlb_4Hzk0.so
/home/leios/.julia/compiled/v1.11/Accessors/XelUh_rURdT.so
/home/leios/.julia/compiled/v1.11/LinearAlgebraExt/Pm3AL_rURdT.so
/home/leios/.julia/compiled/v1.11/BangBang/Ovsha_rURdT.so
/home/leios/.julia/compiled/v1.11/OhMyThreads/2oy0C_rURdT.so
/home/leios/.julia/compiled/v1.11/ConstructionBaseStaticArraysExt/MmdaU_2wbLs.so
/home/leios/.julia/compiled/v1.11/StaticArraysExt/sIz4V_rURdT.so
/home/leios/.julia/compiled/v1.11/BangBangStaticArraysExt/I8ZlX_rURdT.so
/home/leios/.julia/compiled/v1.11/MarkdownExt/xNmCG_rURdT.so
/home/leios/.julia/compiled/v1.11/AcceleratedKernels/M6fRl_rURdT.so
/home/leios/.julia/compiled/v1.11/DataValueInterfaces/9Lpkp_4Hzk0.so
/home/leios/.julia/compiled/v1.11/DataAPI/3a8mN_4Hzk0.so
/home/leios/.julia/compiled/v1.11/IteratorInterfaceExtensions/N0h8q_4Hzk0.so
/home/leios/.julia/compiled/v1.11/TableTraits/I6SaN_4Hzk0.so
/home/leios/.julia/compiled/v1.11/OrderedCollections/LtT3J_Hzd3d.so
/home/leios/.julia/compiled/v1.11/Tables/Z804B_Hzd3d.so
/home/leios/.julia/compiled/v1.11/StringManipulation/4nJQd_rURdT.so
/home/leios/.julia/compiled/v1.11/Crayons/TXPcU_4Hzk0.so
/home/leios/.julia/compiled/v1.11/LaTeXStrings/H4HGh_4Hzk0.so
/home/leios/.julia/compiled/v1.11/PrettyTables/kRdcL_rURdT.so
/home/leios/.julia/compiled/v1.11/BangBangTablesExt/h92XF_rURdT.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/LLD_jll/ZHBMJ_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/Zlib_jll/xjq3Q_XvZAg.so
/home/leios/.julia/compiled/v1.11/ROCmDeviceLibs_jll/JXG1e_4Hzk0.so
/home/leios/.julia/compiled/v1.11/GPUToolbox/VNkSP_rURdT.so
/home/leios/.julia/compiled/v1.11/IrrationalConstants/ukdUG_4Hzk0.so
/home/leios/.julia/compiled/v1.11/DocStringExtensions/KRdZs_2wbLs.so
/home/leios/.julia/compiled/v1.11/LogExpFunctions/cmCYR_2wbLs.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/OpenLibm_jll/ToVO1_XvZAg.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/CompilerSupportLibraries_jll/iCwSB_XvZAg.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libgomp.so.1
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libssp.so.0
/home/leios/.julia/compiled/v1.11/OpenSpecFun_jll/TDl1L_4Hzk0.so
/home/leios/.julia/artifacts/09b351e89a85e07e957194a647765403d4ee1bcb/lib/libopenspecfun.so
/home/leios/.julia/compiled/v1.11/SpecialFunctions/78gOt_rURdT.so
/home/leios/.julia/compiled/v1.11/LogExpFunctionsInverseFunctionsExt/IXKft_PQ5ag.so
/home/leios/.julia/compiled/v1.11/RandomNumbers/pgCpR_4Hzk0.so
/home/leios/.julia/compiled/v1.11/Random123/1imiM_rURdT.so
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/SuiteSparse_jll/ME9At_XvZAg.so
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libamd.so.3
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libsuitesparseconfig.so.7
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libbtf.so.2
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libcamd.so.3
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libccolamd.so.3
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libcholmod.so.5
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libcolamd.so.3
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libklu.so.2
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libldl.so.3
/home/leios/builds/julia-1.11.3/bin/../lib/julia/librbio.so.4
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libspqr.so.4
/home/leios/builds/julia-1.11.3/bin/../lib/julia/libumfpack.so.6
/home/leios/builds/julia-1.11.3/share/julia/compiled/v1.11/SparseArrays/P9ieR_XvZAg.so
/home/leios/.julia/compiled/v1.11/AdaptSparseArraysExt/7qfxl_2wbLs.so
/home/leios/.julia/compiled/v1.11/SparseArraysExt/TR6ym_NwWdq.so
/home/leios/.julia/compiled/v1.11/SparseArraysExt/k0lPI_4Hzk0.so
/home/leios/.julia/compiled/v1.11/ArrayInterfaceSparseArraysExt/3nhwj_rURdT.so
/home/leios/.julia/compiled/v1.11/AbstractFFTs/Di3HZ_4Hzk0.so
/home/leios/.julia/compiled/v1.11/AMDGPU/arpZD_rURdT.so
/opt/rocm/lib/libamdhip64.so
/opt/rocm/lib/librocprofiler-register.so.0
/opt/rocm/lib/libamd_comgr.so.2
/opt/rocm/lib/libhsa-runtime64.so.1
/usr/lib/libnuma.so.1
/usr/lib/libfmt.so.11
/usr/lib/libglog.so.2
/usr/lib/libzstd.so.1
/usr/lib/libncursesw.so.6
/usr/lib/libelf.so.1
/opt/rocm/lib/libhsakmt.so.1
/usr/lib/libdrm.so.2
/usr/lib/libgflags.so.2.2
/usr/lib/libdrm_amdgpu.so.1
/opt/rocm/lib/libhsa-amd-aqlprofile64.so
vchuravy@odin ~> pacman -Qe | grep amd
amd-ucode 20250408.c1a774f3-1
amdmemorytweak-git 40.9a64ff1-1
amdvlk 2025.Q1.3-1
hip-runtime-amd 6.3.3-1
xf86-video-amdgpu 23.0.0-2
vchuravy@odin ~> pacman -Qe | grep roc
python-zeroconf 0.146.1-1
rocm-hip-sdk 6.3.3-1
roctracer 6.3.3-1
So it seems I am on 6.3 and you are likely on 6.0?
6.2.4
I'll update?
[leios@noema ~]$ pacman -Qe | grep amd
amd-ucode 20250210.5bc5868b-1
amdvlk 2024.Q4.3-1
hsa-amd-aqlprofile-bin 6.2.4-1
xf86-video-amdgpu 23.0.0-2
[leios@noema ~]$ pacman -Qe | grep roc
rocblas 6.2.4-1
rocfft 6.2.4-1
rocm-hip-runtime 6.2.2-1
rocm-hip-sdk 6.2.2-1
rocm-opencl-sdk 6.2.2-1
rocm-smi-lib 6.2.4-1
rocrand 6.2.4-1
rocsolver 6.2.4-1
rocsparse 6.2.4-1
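In the listing above, rocm-hip-runtime, rocm-hip-sdk and rocm-opencl-sdk are at 6.2.2 while the other ROCm packages are at 6.2.4. Whether that matters for this crash is not established in this thread, but such mixes are easy to miss by eye. A hypothetical helper (not an existing tool) that groups pacman -Qe lines by upstream version could look like this; treating every package whose name starts with roc, hip or hsa as part of ROCm is an assumption:

```julia
# Hypothetical helper: parse `pacman -Qe`-style lines ("name version-pkgrel")
# and group the ROCm-related packages by upstream version.
function rocm_versions(lines)
    groups = Dict{String,Vector{String}}()
    for line in lines
        parts = split(strip(line))
        length(parts) == 2 || continue                  # skip anything that isn't "name version"
        name, ver = parts
        occursin(r"^(roc|hip|hsa)", name) || continue   # assumed ROCm naming convention
        upstream = String(first(split(ver, '-')))       # "6.2.4-1" -> "6.2.4"
        push!(get!(groups, upstream, String[]), String(name))
    end
    return groups
end

# Usage on an Arch system:
# rocm_versions(eachline(`pacman -Qe`))
```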
Good news, everything's still broken on 6.4 for me
[leios@noema ~]$ pacman -Qe | grep amd
amd-ucode 20250508.788aadc8-2
amdvlk 2025.Q2.1-1
hsa-amd-aqlprofile-bin 6.4.0-1
xf86-video-amdgpu 23.0.0-2
[leios@noema ~]$ pacman -Qe | grep roc
rocblas 6.4.0-1
rocfft 6.4.0-1
rocm-hip-runtime 6.4.0-1
rocm-hip-sdk 6.4.0-1
rocm-opencl-sdk 6.4.0-1
rocm-smi-lib 6.4.0-1
rocrand 6.4.0-1
rocsolver 6.4.0-1
rocsparse 6.4.0-1
[leios@noema ~]$ julia -t 2
julia> using AMDGPU
julia> AMDGPU.zeros(10)
10-element ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
julia: /usr/src/debug/hip-runtime/hip-runtime-clr/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
[1364] signal 6 (-6): Aborted
in expression starting at none:0
unknown function (ip: 0x7fa23f6b374c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7fa23f6414e2)
unknown function (ip: 0x7fa1c0e660bb)
unknown function (ip: 0x7fa1c11437d3)
unknown function (ip: 0x7fa1c0ea5feb)
macro expansion at /home/leios/.julia/packages/GPUToolbox/cZlg7/src/ccalls.jl:143 [inlined]
macro expansion at /home/leios/.julia/packages/AMDGPU/STpZC/src/utils.jl:122 [inlined]
hipGetDeviceCount at /home/leios/.julia/packages/AMDGPU/STpZC/src/hip/libhip.jl:42 [inlined]
ndevices at /home/leios/.julia/packages/AMDGPU/STpZC/src/hip/device.jl:103
TaskLocalState at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:11 [inlined]
TaskLocalState at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:11
TaskLocalState at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:11 [inlined]
#25 at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:27 [inlined]
get! at ./iddict.jl:171
task_local_state! at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:26
prepare_state at /home/leios/.julia/packages/AMDGPU/STpZC/src/tls.jl:193 [inlined]
hipStreamQuery at /home/leios/.julia/packages/AMDGPU/STpZC/src/hip/libhip.jl:113 [inlined]
#11 at /home/leios/.julia/packages/AMDGPU/STpZC/src/hip/stream.jl:114
unknown function (ip: 0x7fa2375f646f)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 23885857 (Pool: 23885222; Big: 635); GC: 16
Aborted (core dumped)
Uhhh.
Works now.
I didn't change anything on my end, but I guess I am closing this for now?
Reopening because it's still happening, but now seemingly at random. I can't really figure out how to create an MWE because it works sometimes and not at other times.
An example: https://youtube.com/clip/UgkxRc957OTXTE5V5cFjtL4to-w02-kP2vmf?feature=shared
This one is actually closer to #690, but I already had export UCX_ERROR_SIGNALS="SIGILL,SIGBUS,SIGFPE" set, which had solved the issue previously.
(also, yes, the video's unlisted precisely because I got a little animated)
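Because the crash now only shows up some of the time, one way to get numbers instead of anecdotes is to run the reproducer many times in fresh processes and count the failing runs. The helper below is a hypothetical sketch, not part of AMDGPU.jl; the reproducer command in the usage comment is an assumption based on the session at the top of this issue:

```julia
"""
    count_failures(cmd::Cmd, n::Integer) -> Int

Run `cmd` `n` times, each in a fresh process with its output discarded, and
return how many runs did not succeed (nonzero exit code, or killed by a signal
such as the SIGABRT raised by a failed assertion).
"""
function count_failures(cmd::Cmd, n::Integer)
    # success() runs the command and returns false instead of throwing on failure
    return count(_ -> !success(pipeline(cmd; stdout=devnull, stderr=devnull)), 1:n)
end

# Example (requires AMDGPU.jl and a ROCm GPU; hypothetical reproducer):
# count_failures(`julia -t 2 -e 'using AMDGPU; display(AMDGPU.zeros(10))'`, 20)
```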
UCX_ERROR_SIGNALS only applies to MPI-based libraries that use the UCX library.
I agree that the error looks similar to that one. I assume this is Julia 1.11?
The backtrace is kind of weird. It looks like you are hitting a segmentation fault at https://github.com/JuliaLang/julia/blob/760b2e5b7396f9cc0da5efce0cadd5d1974c4069/src/jlapi.c#L740, so dereferencing ct might be the problem here. But that would mean something corrupted Julia's task-local state, since we had just loaded jl_current_task from it.
Should I create another issue somewhere else? I am happy to do so next time I get this error so we can get more info.
Again, like in #690, I was not explicitly loading MPI libraries
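Since UCX_ERROR_SIGNALS only matters for libraries built on UCX, a quick way to check whether any UCX library was pulled into the Julia process at all (for instance indirectly, through some dependency) is to scan the loaded libraries. This is a sketch; the helper names are hypothetical and the file-name pattern assumes the usual UCX library names (libucp, libucs, libuct, libucm):

```julia
import Libdl

# Assumed file-name pattern for UCX component libraries.
const UCX_LIB_PATTERN = r"^libuc[pstm][.-]"

# Hypothetical helper: does this path look like a UCX library?
is_ucx_lib(path::AbstractString) = occursin(UCX_LIB_PATTERN, basename(path))

# True if any UCX library is mapped into this process at the time of the call;
# libraries loaded later (e.g. when MPI initializes) are not covered.
ucx_loaded() = any(is_ucx_lib, Libdl.dllist())
```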
Ah, for the record, the UCX issue is only one of the segfaults. I still regularly get the one associated with this issue, as well as the one from versioninfo(). I still don't know how to diagnose this locally, as it seems to appear more or less at random.
Whoops. I didn't mean to close it, and I don't have permission to reopen it.
The original error looks like: https://github.com/ROCm/clr/issues/36
I've seen this with a debug ROCm build (one with assertions enabled).
I had this issue and solved it by installing ROCm from the AUR instead of from the official repositories with pacman.
First, I removed every ROCm package I had with pacman -Rns (list every ROCm/HIP/MIOpen package you have installed).
Then I installed ROCm via yay -S opencl-amd-dev, which installs everything back; I didn't know that.
Now everything works like a charm.