Yggdrasil
[New Package] Add rocBLAS 4.2.0
I'm guessing the glibc dlopen failure might be due to usage of AVX2?
amdci7 should support AVX2, and an ISA issue should probably throw a SIGILL error, not a segmentation fault
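One quick way to rule the ISA theory in or out on a given Linux machine is to look at the CPU feature flags the kernel reports. The sketch below is only illustrative: the helper names `cpu_flags` and `supports_avx2` are made up for this example, and it assumes a Linux host that exposes `/proc/cpuinfo` with a `flags` line.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; every core normally reports the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx2(cpuinfo_text):
    """True if the 'avx2' flag is listed (hypothetical helper, not part of any tool)."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX2 supported:", supports_avx2(f.read()))
    except OSError as err:
        print("could not read /proc/cpuinfo:", err)
```

If this reports AVX2 as supported on the machine where the crash happens, an unsupported instruction becomes an unlikely explanation for the segfault.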
Ok will try to track this down on amdci2
Ah shoot, the issue was that the library could not be dlopen'ed
gdb points to a crash during library initialization, so I guess I should be on the lookout for "fancy" things that rocBLAS is trying to do during init.
In case this is familiar to anyone:
#0 0xffffffffffffffff in ?? ()
#1 0x00007ffff7de38f3 in call_init (env=0x9fb0c0, argv=0x7fffffffde88, argc=4, l=<optimized out>) at dl-init.c:72
#2 _dl_init (main_map=main_map@entry=0xef7a60, argc=4, argv=0x7fffffffde88, env=0x9fb0c0) at dl-init.c:119
#3 0x00007ffff7de83bf in dl_open_worker (a=a@entry=0x7fffffff97a0) at dl-open.c:522
#4 0x00007ffff77261ef in __GI__dl_catch_exception (exception=0x7fffffff9780, operate=0x7ffff7de7f80 <dl_open_worker>, args=0x7fffffff97a0) at dl-error-skeleton.c:196
#5 0x00007ffff7de798a in _dl_open (file=0x7fffffff9ae0 "/home/jsamaroo/.julia/artifacts/00c0f592384c05d60db60eeba737077341004203/rocblas/lib/librocblas.so",
mode=-2147483639, caller_dlopen=0x7ffff6a36cc9 <jl_load_dynamic_library+601>, nsid=<optimized out>, argc=4, argv=<optimized out>, env=0x9fb0c0) at dl-open.c:605
#6 0x00007ffff7bcff96 in dlopen_doit (a=a@entry=0x7fffffff99d0) at dlopen.c:66
#7 0x00007ffff77261ef in __GI__dl_catch_exception (exception=exception@entry=0x7fffffff9970, operate=0x7ffff7bcff40 <dlopen_doit>, args=0x7fffffff99d0)
at dl-error-skeleton.c:196
#8 0x00007ffff772627f in __GI__dl_catch_error (objname=0x602270, errstring=0x602278, mallocedp=0x602268, operate=<optimized out>, args=<optimized out>)
at dl-error-skeleton.c:215
#9 0x00007ffff7bd0745 in _dlerror_run (operate=operate@entry=0x7ffff7bcff40 <dlopen_doit>, args=args@entry=0x7fffffff99d0) at dlerror.c:162
#10 0x00007ffff7bd0051 in __dlopen (file=file@entry=0x7fffffff9ae0 "/home/jsamaroo/.julia/artifacts/00c0f592384c05d60db60eeba737077341004203/rocblas/lib/librocblas.so",
mode=<optimized out>) at dlopen.c:87
#11 0x00007ffff6a36a69 in jl_dlopen (
filename=filename@entry=0x7fffffff9ae0 "/home/jsamaroo/.julia/artifacts/00c0f592384c05d60db60eeba737077341004203/rocblas/lib/librocblas.so", flags=flags@entry=68)
at /buildworker/worker/package_linux64/build/src/dlload.c:123
#12 0x00007ffff6a36cc9 in jl_load_dynamic_library (
modname=0x7fffed006598 "/home/jsamaroo/.julia/artifacts/00c0f592384c05d60db60eeba737077341004203/rocblas/lib/librocblas.so", flags=<optimized out>, throw_err=1)
at /buildworker/worker/package_linux64/build/src/dlload.c:267
#13 0x00007fffe2b2a69d in julia_#dlopen#3_21642 () at libdl.jl:117
#14 0x00007fffe2c0d3bf in dlopen () at libdl.jl:117
The latest commit enables building with Tensile for two reasons:
- We'll need it for generating competitive BLAS kernels
- It might fix the segfault we're seeing (I would bet that AMD doesn't test rocBLAS builds without Tensile)
@haampie if you get the chance, I would appreciate it if you could give some insight into why this build is failing. The ASM being compiled looks valid to me, even though the compiler disagrees.
I have never dlopen'ed rocblas.so, so I'm afraid I can't help out :( isn't Tensile required to actually get blas 3 kernels at all?
By the way, hipcc inlines everything by default, but that can be disabled: https://github.com/ROCm-Developer-Tools/HIP/blob/37cb3a34938af39303b73aceb2d7803f5c7ca7ca/bin/hipcc#L522-L525 maybe worth trying?
Somehow, this PR has processes that are still running on the Yggdrasil workers. They all look like:
python3 /workspace/srcdir/rocBLAS-rocm-4.2.0/build/virtualenv/lib/python3.8/site-packages/Tensile/bin/TensileCreateLibrary --merge-files --no-short-file-names --no-library-print-debug --architecture=gfx900 --code-object-version=V3 --cxx-compiler=hipcc --library-format=msgpack /workspace/srcdir/rocBLAS-rocm-4.2.0/library/src/blas3/Tensile/Logic/asm_full /workspace/srcdir/rocBLAS-rocm-4.2.0/build/Tensile HIP
They aren't dying properly. I've restarted the agents, but you should be aware that this is causing problems on the workers.
Running LD_DEBUG=all julia -e "Libc.Libdl.dlopen(\"./librocblas.so\")" gives a little bit more info: https://drive.google.com/file/d/1qqOaUzqtnPjNcAitHX9nU7ajJwtys-D7/view?usp=sharing
There are a couple of errors like this, although I'm not sure how important they are:
212944: /home/asmirnov/julia-1.7.3/bin/../lib/julia/libopenblas64_.so: error: symbol lookup error: undefined symbol: isamax_ (fatal)
But the whole process ends shortly after the line 212944: calling init: ./librocblas.so:
212944: calling init: ./librocblas.so
212944:
212944: symbol=__cxa_guard_acquire; lookup in file=./librocblas.so [0]
212944: symbol=__cxa_guard_acquire; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libgcc_s.so.1 [0]
212944: symbol=__cxa_guard_acquire; lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
212944: symbol=__cxa_guard_acquire; lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
212944: symbol=__cxa_guard_acquire; lookup in file=/lib/x86_64-linux-gnu/librt.so.1 [0]
212944: symbol=__cxa_guard_acquire; lookup in file=/home/asmirnov/code/rocm-bb/build/x86_64-linux-gnu-cxx11/5LJl6hOi/x86_64-linux-gnu-libgfortran5-cxx11/destdir/rocblas/lib/./../../hip/lib/libamdhip64.so.4 [0]
212944: symbol=__cxa_guard_acquire; lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
212944: symbol=__cxa_guard_acquire; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libz.so.1 [0]
212944: symbol=__cxa_guard_acquire; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libstdc++.so.6 [0]
212944: binding file /home/asmirnov/code/rocm-bb/build/x86_64-linux-gnu-cxx11/5LJl6hOi/x86_64-linux-gnu-libgfortran5-cxx11/destdir/rocblas/lib/./../../hip/lib/libamdhip64.so.4 [0] to /home/asmirnov/julia-1.7.3/bin/../lib/julia/libstdc++.so.6 [0]: normal symbol `__cxa_guard_acquire' [CXXABI_1.3]
212944: symbol=getenv; lookup in file=./librocblas.so [0]
212944: symbol=getenv; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libgcc_s.so.1 [0]
212944: symbol=getenv; lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
212944: symbol=getenv; lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
212944: symbol=getenv; lookup in file=/lib/x86_64-linux-gnu/librt.so.1 [0]
212944: symbol=getenv; lookup in file=/home/asmirnov/code/rocm-bb/build/x86_64-linux-gnu-cxx11/5LJl6hOi/x86_64-linux-gnu-libgfortran5-cxx11/destdir/rocblas/lib/./../../hip/lib/libamdhip64.so.4 [0]
212944: symbol=getenv; lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
212944: symbol=getenv; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libz.so.1 [0]
212944: symbol=getenv; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libstdc++.so.6 [0]
212944: symbol=getenv; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
212944: binding file /home/asmirnov/code/rocm-bb/build/x86_64-linux-gnu-cxx11/5LJl6hOi/x86_64-linux-gnu-libgfortran5-cxx11/destdir/rocblas/lib/./../../hip/lib/libamdhip64.so.4 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `getenv' [GLIBC_2.2.5]
212944: symbol=__cxa_guard_release; lookup in file=./librocblas.so [0]
212944: symbol=__cxa_guard_release; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libgcc_s.so.1 [0]
212944: symbol=__cxa_guard_release; lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
212944: symbol=__cxa_guard_release; lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
212944: symbol=__cxa_guard_release; lookup in file=/lib/x86_64-linux-gnu/librt.so.1 [0]
212944: symbol=__cxa_guard_release; lookup in file=/home/asmirnov/code/rocm-bb/build/x86_64-linux-gnu-cxx11/5LJl6hOi/x86_64-linux-gnu-libgfortran5-cxx11/destdir/rocblas/lib/./../../hip/lib/libamdhip64.so.4 [0]
212944: symbol=__cxa_guard_release; lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
212944: symbol=__cxa_guard_release; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libz.so.1 [0]
212944: symbol=__cxa_guard_release; lookup in file=/home/asmirnov/julia-1.7.3/bin/../lib/julia/libstdc++.so.6 [0]
212944: binding file /home/asmirnov/code/rocm-bb/build/x86_64-linux-gnu-cxx11/5LJl6hOi/x86_64-linux-gnu-libgfortran5-cxx11/destdir/rocblas/lib/./../../hip/lib/libamdhip64.so.4 [0] to /home/asmirnov/julia-1.7.3/bin/../lib/julia/libstdc++.so.6 [0]: normal symbol `__cxa_guard_release' [CXXABI_1.3]
signal (11): Segmentation fault
in expression starting at none:1
unknown function (ip: (nil))
Allocations: 2721 (Pool: 2711; Big: 10); GC: 0
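Since the full LD_DEBUG=all log is large, it can help to extract just the last library whose initializers were started before the crash. This is a rough sketch that assumes the log lines look like the excerpt above (a PID, a colon, then `calling init: <path>`); the function name `last_init_called` is hypothetical.

```python
import re

# Matches lines such as "212944: calling init: ./librocblas.so"
INIT_RE = re.compile(r"^\s*(\d+):\s+calling init:\s+(\S+)")


def last_init_called(log_text):
    """Return (pid, library) for the last 'calling init' line, or None if absent."""
    last = None
    for line in log_text.splitlines():
        match = INIT_RE.match(line)
        if match:
            last = (int(match.group(1)), match.group(2))
    return last


if __name__ == "__main__":
    excerpt = (
        "212944: calling init: ./librocblas.so\n"
        "212944:\n"
        "212944: symbol=__cxa_guard_acquire; lookup in file=./librocblas.so [0]\n"
    )
    print(last_init_called(excerpt))
```

The last library reported this way is where initialization was in progress when the process died, which here points at librocblas.so itself rather than one of its dependencies.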
Here's the readelf output as well.
$ readelf -d librocblas.so
Dynamic section at offset 0x1aae680 contains 37 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libamdhip64.so.4]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2]
0x000000000000000e (SONAME) Library soname: [librocblas.so.0]
0x000000000000001d (RUNPATH) Library runpath: [$ORIGIN/../../lib:$ORIGIN/../../hip/lib]
0x000000000000000c (INIT) 0x41000
0x000000000000000d (FINI) 0x6a9c80
0x0000000000000019 (INIT_ARRAY) 0x1aa3668
0x000000000000001b (INIT_ARRAYSZ) 3176 (bytes)
0x000000000000001a (FINI_ARRAY) 0x1aa42d0
0x000000000000001c (FINI_ARRAYSZ) 16 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x1ad5158
0x0000000000000005 (STRTAB) 0x1ac3000
0x0000000000000006 (SYMTAB) 0x1e70
0x000000000000000a (STRSZ) 74070 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000003 (PLTGOT) 0x1aaf910
0x0000000000000002 (PLTRELSZ) 15336 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x3cee0
0x0000000000000007 (RELA) 0x1cd98
0x0000000000000008 (RELASZ) 131400 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000000000001e (FLAGS) BIND_NOW
0x000000006ffffffb (FLAGS_1) Flags: NOW
0x000000006ffffffe (VERNEED) 0x1cb48
0x000000006fffffff (VERNEEDNUM) 7
0x000000006ffffff0 (VERSYM) 0x1c086
0x000000006ffffff9 (RELACOUNT) 4599
0x0000000000000000 (NULL) 0x0
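The two entries that matter for the init crash are INIT_ARRAY and INIT_ARRAYSZ. As a minimal sketch, assuming 8-byte pointers (x86_64) and `readelf -d` lines shaped like the ones above, the number of initializer slots can be computed as follows. The helper names `parse_dynamic` and `init_array_info` are invented for this example.

```python
import re

# Matches lines such as "0x0000000000000019 (INIT_ARRAY) 0x1aa3668"
DYN_RE = re.compile(r"^\s*0x[0-9a-fA-F]+\s+\((\w+)\)\s+(.*)$")


def parse_dynamic(readelf_text):
    """Map each dynamic tag name to the list of values printed by 'readelf -d'."""
    tags = {}
    for line in readelf_text.splitlines():
        match = DYN_RE.match(line)
        if match:
            tags.setdefault(match.group(1), []).append(match.group(2).strip())
    return tags


def init_array_info(readelf_text, pointer_size=8):
    """Return (address, size in bytes, number of entries) for INIT_ARRAY."""
    tags = parse_dynamic(readelf_text)
    address = int(tags["INIT_ARRAY"][0], 16)
    size = int(tags["INIT_ARRAYSZ"][0].split()[0])  # "3176 (bytes)" -> 3176
    return address, size, size // pointer_size


if __name__ == "__main__":
    sample = (
        "0x0000000000000019 (INIT_ARRAY) 0x1aa3668\n"
        "0x000000000000001b (INIT_ARRAYSZ) 3176 (bytes)\n"
    )
    print(init_array_info(sample))
```

For the values above this gives 3176 / 8 = 397 slots starting at 0x1aa3668, so every one of those slots is a candidate for the bad pointer.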
@pxl-th dumping the INIT_ARRAY contents may also be interesting, because the thing dlopen trips on is a null first entry in that array (determined via gdb).
.init section for rocblas 4.2 (binarybuilder): download
asmirnov@amdjl:~/code/rocm-bb/build/x86_64-linux-gnu-cxx11/5LJl6hOi/destdir/rocblas/lib$ readelf -x .init librocblas.so
Hex dump of section '.init':
0x00041000 4883ec08 e89b6100 00e8ad61 00004883 H.....a....a..H.
0x00041010 c408c3 ...
.init_array section for rocblas 4.2 (binarybuilder): download
~/code/rocm-bb/build/x86_64-linux-gnu-cxx11/5LJl6hOi/destdir/rocblas/lib$ readelf -x .init_array librocblas.so
Hex dump of section '.init_array':
0x01aa3668 ffffffff ffffffff d0380400 00000000 .........8......
.init section for rocblas 5.0 (system-wide installation): download
asmirnov@amdjl:/opt/rocm/rocblas/lib$ readelf -x .init librocblas.so
Hex dump of section '.init':
0x114e059c 4883ec08 488b0549 23010048 85c07402 H...H..I#..H..t.
0x114e05ac ffd04883 c408c3 ..H....
.init_array section for rocblas 5.0 (system-wide installation): download
asmirnov@amdjl:/opt/rocm/rocblas/lib$ readelf -x .init_array librocblas.so
Hex dump of section '.init_array':
0x114e3bb0 00000000 00000000 00000000 00000000 ................
For some reason, dumping the .init_array section via objdump gives empty results.
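To read the `readelf -x .init_array` dumps as pointers rather than raw bytes, each line can be decoded into little-endian 64-bit words. This is a sketch under assumptions: it expects the layout shown above (an address, up to four groups of 8 hex digits, then an ASCII column), it assumes a 64-bit little-endian target, and `decode_hexdump_line` is a hypothetical helper. The values are the on-disk contents, before the dynamic loader applies any relocations.

```python
import string
import struct


def decode_hexdump_line(line):
    """Decode one 'readelf -x' line into (address, [64-bit little-endian words])."""
    fields = line.split()
    address = int(fields[0], 16)
    groups = []
    for field in fields[1:]:
        # Stop at the ASCII column: accept at most four 8-digit hex groups.
        is_hex = len(field) == 8 and all(c in string.hexdigits for c in field)
        if len(groups) < 4 and is_hex:
            groups.append(field)
        else:
            break
    data = bytes.fromhex("".join(groups))
    count = len(data) // 8  # ignore a trailing partial word, if any
    words = list(struct.unpack("<%dQ" % count, data[: count * 8]))
    return address, words


if __name__ == "__main__":
    line = "0x01aa3668 ffffffff ffffffff d0380400 00000000 .........8......"
    address, words = decode_hexdump_line(line)
    print(hex(address), [hex(w) for w in words])
```

Applied to the BinaryBuilder dump, the first slot decodes to 0xffffffffffffffff (that is, -1) and the second to 0x438d0, while the system-wide 5.0 dump decodes to zeros in the file.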
Backtrace of gdb --args julia -e "Libc.Libdl.dlopen(\"./librocblas.so\")" from the binary builder:
(gdb) bt full
#0 0xffffffffffffffff in ?? ()
No symbol table info available.
#1 0x00007ffff7fc947e in call_init (l=<optimized out>, argc=argc@entry=3, argv=argv@entry=0x7fffffffdd48, env=env@entry=0x8468c0) at ./elf/dl-init.c:70
j = 0
jm = <optimized out>
addrs = <optimized out>
init_array = <optimized out>
__PRETTY_FUNCTION__ = "call_init"
#2 0x00007ffff7fc9568 in call_init (env=0x8468c0, argv=0x7fffffffdd48, argc=3, l=<optimized out>) at ./elf/dl-init.c:33
init_array = <optimized out>
__PRETTY_FUNCTION__ = "call_init"
j = <optimized out>
jm = <optimized out>
addrs = <optimized out>
#3 _dl_init (main_map=0xa53b20, argc=3, argv=0x7fffffffdd48, env=0x8468c0) at ./elf/dl-init.c:117
preinit_array = <optimized out>
preinit_array_size = <optimized out>
i = <optimized out>
#4 0x00007ffff7eeac85 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:182
old = <optimized out>
errcode = 0
c = {exception = 0x7fffffff9530, errcode = 0x7fffffff943c, env = {{__jmpbuf = {140737488328400, -548524226938750696, -16, 140737488327984, 3, 2147483657,
-548524226984888040, -548506505144125160}, __mask_was_saved = 0, __saved_mask = {__val = {0 <repeats 16 times>}}}}}
old = <optimized out>
#5 0x00007ffff7fd0ff6 in dl_open_worker (a=0x7fffffff96d0) at ./elf/dl-open.c:808
init_args = {new = 0xa53b20, argc = 3, argv = 0x7fffffffdd48, env = 0x8468c0}
args = <optimized out>
mode = -2147483639
new = 0xa53b20
args = <optimized out>
mode = <optimized out>
new = <optimized out>
ex = <optimized out>
err = <optimized out>
init_args = <optimized out>
#6 dl_open_worker (a=a@entry=0x7fffffff96d0) at ./elf/dl-open.c:771
args = 0x7fffffff96d0
mode = <optimized out>
new = <optimized out>
init_args = <optimized out>
#7 0x00007ffff7eeac28 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
errcode = 0
c = {exception = 0x7fffffff96b0, errcode = 0x7fffffff95ac, env = {{__jmpbuf = {-2, -548524226938750696, -16, 140737354127944, 3, 2147483657, -548524227033122536,
-548506505144125160}, __mask_was_saved = 0, __saved_mask = {__val = {0 <repeats 16 times>}}}}}
old = 0x7fffffff97b0
#8 0x00007ffff7fd134e in _dl_open (file=<optimized out>, mode=-2147483639, caller_dlopen=0x7ffff6ce5cc9 <jl_load_dynamic_library+601>, nsid=-2, argc=3, argv=<optimized out>,
env=0x8468c0) at ./elf/dl-open.c:883
args = {file = 0x7fffffff9a50 "./librocblas.so", mode = -2147483639, caller_dlopen = 0x7ffff6ce5cc9 <jl_load_dynamic_library+601>, map = 0xa53b20, nsid = 0,
original_global_scope_pending_adds = 0, libc_already_loaded = true, worker_continue = true, argc = 3, argv = 0x7fffffffdd48, env = 0x8468c0}
exception = {objname = 0x0, errstring = 0x7fffffff9a50 "./librocblas.so", message_buffer = 0x7fffffffaa4f ""}
errcode = <optimized out>
__PRETTY_FUNCTION__ = "_dl_open"
#9 0x00007ffff7e066bc in dlopen_doit (a=a@entry=0x7fffffff9940) at ./dlfcn/dlopen.c:56
args = 0x7fffffff9940
#10 0x00007ffff7eeac28 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffff98a0, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
errcode = 0
c = {exception = 0x7fffffff98a0, errcode = 0x7fffffff97ac, env = {{__jmpbuf = {140737488328951, -548524226882127592, -16, 140737154187376, 68, 140737155548440, -548524226966013672, -548506505144125160}, __mask_was_saved = 0, __saved_mask = {__val = {140737351494352, 4287062190, 140737351489484, 10647808, 140737353934825, 1808, 140737351566928, 140737353858448, 140737488328936, 140737488328932, 12234695464575742208, 140737351566928, 140737488329296, 140737488337536, 1, 140737154187376}}}}}
old = 0x0
#11 0x00007ffff7eeacf3 in __GI__dl_catch_error (objname=0x7fffffff98f8, errstring=0x7fffffff9900, mallocedp=0x7fffffff98f7, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:227
exception = {objname = 0x3000000018 <error: Cannot access memory at address 0x3000000018>, errstring = 0x7fffffff9980 "", message_buffer = 0x7fffffff98c0 "\250\377\377\377\377\377\377\377"}
errorcode = <optimized out>
#12 0x00007ffff7e061ae in _dlerror_run (operate=operate@entry=0x7ffff7e06660 <dlopen_doit>, args=args@entry=0x7fffffff9940) at ./dlfcn/dlerror.c:138
result = <optimized out>
objname = 0x7ffff6de2f28 <ptrhash_get+56> "H\215{\377\061\311L\215E\377H!\370H\215<"
errstring = 0x7fffec150070 "\220\272\377\377\377\177"
malloced = false
errcode = <optimized out>
#13 0x00007ffff7e06748 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=0x7fffffff9a50 "./librocblas.so") at ./dlfcn/dlopen.c:71
args = {file = 0x7fffffff9a50 "./librocblas.so", mode = 9, new = 0x1, caller = 0x7ffff6ce5cc9 <jl_load_dynamic_library+601>}
#14 ___dlopen (file=file@entry=0x7fffffff9a50 "./librocblas.so", mode=<optimized out>) at ./dlfcn/dlopen.c:81
No locals.
#15 0x00007ffff6ce5a69 in jl_dlopen (filename=filename@entry=0x7fffffff9a50 "./librocblas.so", flags=flags@entry=68) at /buildworker/worker/package_linux64/build/src/dlload.c:123
No locals.
#16 0x00007ffff6ce5cc9 in jl_load_dynamic_library (modname=0x7fffec29c518 "./librocblas.so", flags=<optimized out>, throw_err=1) at /buildworker/worker/package_linux64/build/src/dlload.c:267
ext = 0x7ffff6e3e159 ""
path = "./librocblas.so\000\220\256\377\377\377\177\000\000H\335\377\377\377\177\000\000\300h\204\000\000\000\000\000\256\215\375\367\377\177", '\000' <repeats 19 times>, "Bi\000\000\000\000\000\000\277\227\242\377\177\000\000\316\302n\242\377\177\000\000@y\242", '\000' <repeats 45 times>, "\240\037\000\000\377\377\002", '\000' <repeats 105 times>...
relocated = "\001\000\000\000\377\177\000\000\060$\245\000\000\000\000\000\060\252\377\377\377\177\000\000|l\374\367\377\177\000\000\001\000\000\000\377\177\000\000\320\036\245\000\000\000\000\000P\252\377\377\377\177\000\000|l\374\367\377\177\000\000\001\000\000\000\377\177\000\000\060\022\245\000\000\000\000\000p\252\377\377\377\177\000\000|l\374\367\377\177\000\000\001\000\000\000\000\000\000\000\020\031\245\000\000\000\000\000\220\252\377\377\377\177\000\000\003\000\000\000\000\000\000\000\000\373(\000\000\000\000\000\240\266\377\377\377\177\000\000\300\030\245\000\000\000\000\000\006\000\000\000\000\000\000\000@\000\240\233\377\177\000\000\260\252\377\377\377\177\000\000\000\000\000\000\000\000\000\000\300\253\377\377\377\177\000\000\220\251\377\377\377\177\000\000\000\000\000\000"...
i = 0
stbuf = {st_dev = 140737351537744, st_mode = 140737353858448, st_nlink = 0, st_uid = 0, st_gid = 0, st_rdev = 0, st_ino = 0, st_size = 0, st_blksize = 0, st_blocks = 19, st_flags = 10646368, st_gen = 140735921081752, st_atim = {tv_sec = 8677568, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 140737353965425}, st_ctim = {tv_sec = 5, tv_nsec = 0}, st_birthtim = {tv_sec = 140737351537744, tv_nsec = 140737351756656}}
handle = <optimized out>
abspath = <optimized out>
is_atpath = 0
n_extensions = <optimized out>
#17 0x00007fffe22490dd in julia_#dlopen#3_30656 () at libdl.jl:117
No locals.
#18 0x00007fffe2248e7f in dlopen () at libdl.jl:117
No locals.
#19 julia_dlopen_30644 () at libdl.jl:117
No locals.
#20 0x00007fffe2248ef8 in jfptr_dlopen_30645.clone_1 () from /home/pxl-th/bin/julia-1.7.2/lib/julia/sys.so
No symbol table info available.
#21 0x00007ffff6cc4e0a in _jl_invoke (world=31320, mfunc=<optimized out>, nargs=1, args=0x7fffffffbc38, F=0x7fffe5778ed0 <jl_system_image_data+41475088>) at /buildworker/worker/package_linux64/build/src/gf.c:2247
last_alloc = <optimized out>
invoke = <optimized out>
codeinst = <optimized out>
last_errno = <optimized out>
res = 0x7fff9a9cc668 <__CTOR_LIST__>
codeinst = <optimized out>
last_alloc = <optimized out>
last_errno = <optimized out>
invoke = <optimized out>
res = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
invoke = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
res = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
#22 jl_apply_generic (F=<optimized out>, args=0x7fffffffbc38, nargs=<optimized out>) at /buildworker/worker/package_linux64/build/src/gf.c:2429
world = 31320
mfunc = <optimized out>
#23 0x00007ffff6ce3e96 in jl_apply (nargs=2, args=0x7fffffffbc30) at /buildworker/worker/package_linux64/build/src/julia.h:1788
No locals.
#24 do_call (args=args@entry=0x7fffec2361b8, nargs=nargs@entry=2, s=s@entry=0x7fffffffbec0) at /buildworker/worker/package_linux64/build/src/interpreter.c:126
argv = 0x7fffffffbc30
i = <optimized out>
result = <optimized out>
#25 0x00007ffff6ce390e in eval_value (e=e@entry=0x7fffec29c750, s=s@entry=0x7fffffffbec0) at /buildworker/worker/package_linux64/build/src/interpreter.c:215
src = <optimized out>
ex = <optimized out>
args = 0x7fffec2361b8
nargs = 2
head = <optimized out>
#26 0x00007ffff6ce46d2 in eval_stmt_value (s=0x7fffffffbec0, stmt=<optimized out>) at /buildworker/worker/package_linux64/build/src/interpreter.c:166
res = <optimized out>
res = <optimized out>
#27 eval_body (stmts=<optimized out>, s=s@entry=0x7fffffffbec0, ip=2, ip@entry=0, toplevel=toplevel@entry=1) at /buildworker/worker/package_linux64/build/src/interpreter.c:587
head = 0x7ffff022a740
stmt = <optimized out>
next_ip = 3
__eh = {eh_ctx = {{__jmpbuf = {140737154187376, 0, 140737002174304, 6933174, -548524225676265192, -548504045595870952, -548524227720904704, -548504046106003176}, __mask_was_saved = 0, __saved_mask = {__val = {140737338629216, 5, 4294967291, 0, 140737002174304, 6933174, 140737335086248, 140737155089008, 140737338629216, 140737488338496, 140737335086608, 4, 140733193388080, 140737488338576, 140737488338256, 140737488339328}}}}, gcstack = 0x0, prev = 0x0, gc_state = 1 '\001', locks_len = 6933174, defer_signal = 31320, timing_stack = 0x7ffff7135c60 <jl_ast_main_ctx>, world_age = 140737334009073}
ns = <optimized out>
ct = <optimized out>
#28 0x00007ffff6ce52f8 in jl_interpret_toplevel_thunk (m=m@entry=0x7fffe3057760 <jl_system_image_data+443552>, src=0x7fffec294190) at /buildworker/worker/package_linux64/build/src/interpreter.c:731
s = 0x7fffffffbec0
nroots = <optimized out>
stmts = <optimized out>
ct = 0x7fffec150010
last_age = 31320
r = <optimized out>
#29 0x00007ffff6d027a4 in jl_toplevel_eval_flex (m=m@entry=0x7fffe3057760 <jl_system_image_data+443552>, e=<optimized out>, fast=fast@entry=1, expanded=expanded@entry=0) at /buildworker/worker/package_linux64/build/src/toplevel.c:885
ct = 0x7fffec150010
ex = 0x7fffec29c610
mfunc = 0x0
thk = 0x7fffec294190
__gc_stkf = {0xd, 0x7fffffffc0f0, 0x7fffffffbfb8, 0x7fffffffbfc0, 0x7fffffffbfb0}
last_age = <optimized out>
head = <optimized out>
has_intrinsics = 0
has_defs = 0
has_loops = <optimized out>
has_opaque = 0
result = <optimized out>
#30 0x00007ffff6d029e5 in jl_toplevel_eval_flex (m=m@entry=0x7fffe3057760 <jl_system_image_data+443552>, e=e@entry=0x7fffec29c470, fast=fast@entry=1, expanded=expanded@entry=0) at /buildworker/worker/package_linux64/build/src/toplevel.c:830
res = <optimized out>
i = <optimized out>
ct = 0x7fffec150010
ex = 0x7fffec29c470
mfunc = 0x0
thk = 0x0
__gc_stkf = {0xd, 0x7fffffffca80, 0x7fffffffc0a8, 0x7fffffffc0b0, 0x7fffffffc0a0}
last_age = <optimized out>
head = <optimized out>
has_intrinsics = -15896
has_defs = 0
has_loops = <optimized out>
has_opaque = -267013920
result = <optimized out>
#31 0x00007ffff6d0450c in jl_toplevel_eval (m=m@entry=0x7fffe3057760 <jl_system_image_data+443552>, v=v@entry=0x7fffec29c470) at /buildworker/worker/package_linux64/build/src/toplevel.c:894
No locals.
#32 0x00007ffff6d0462a in jl_toplevel_eval_in (m=0x7fffe3057760 <jl_system_image_data+443552>, ex=0x7fffec29c470) at /buildworker/worker/package_linux64/build/src/toplevel.c:944
ct = <optimized out>
v = <optimized out>
last_lineno = 0
last_filename = 0x7ffff6e0e0aa "none"
i__tr = 1
i__ca = <optimized out>
__eh = {eh_ctx = {{__jmpbuf = {140737488339328, -548524225458161384, 2, 140737221308424, 140737155127504, 2, -548524225552533224, -548504292363814632}, __mask_was_saved = 0, __saved_mask = {__val = {140737347929461, 140737488339600, 12234695464575742208, 140737155548272, 140737488339504, 140736997135696, 140737334107794, 12, 140736997135696, 2, 140737221308424, 140737155548272, 2, 140737488339568, 140737333441778, 140737488339536}}}}, gcstack = 0x7fffffffca80, prev = 0x7fffffffd740, gc_state = 0 '\000', locks_len = 0, defer_signal = 0, timing_stack = 0x7fffffffd640, world_age = 31320}
__excstack_state = <optimized out>
#33 0x00007fffe2acf7e8 in eval () at boot.jl:373
No locals.
#34 julia_exec_options_33549 () at client.jl:268
No locals.
#35 0x00007fffe258a0f8 in julia__start_38731 () at client.jl:495
No locals.
#36 0x00007fffe258a269 in jfptr.start_38732.clone_1 () from /home/pxl-th/bin/julia-1.7.2/lib/julia/sys.so
No symbol table info available.
#37 0x00007ffff6cc4e0a in _jl_invoke (world=31320, mfunc=<optimized out>, nargs=0, args=0x7fffffffd990, F=0x7fffe3c527c0 <jl_system_image_data+13006080>) at /buildworker/worker/package_linux64/build/src/gf.c:2247
last_alloc = <optimized out>
invoke = <optimized out>
codeinst = <optimized out>
last_errno = <optimized out>
res = 0x7fff9a9cc668 <__CTOR_LIST__>
codeinst = <optimized out>
last_alloc = <optimized out>
last_errno = <optimized out>
invoke = <optimized out>
res = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
invoke = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
res = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
__atomic_load_ptr = <optimized out>
__atomic_load_tmp = <optimized out>
#38 jl_apply_generic (F=<optimized out>, args=0x7fffffffd990, nargs=<optimized out>) at /buildworker/worker/package_linux64/build/src/gf.c:2429
world = 31320
mfunc = <optimized out>
#39 0x00007ffff6d282d6 in jl_apply (nargs=1, args=0x7fffffffd988) at /buildworker/worker/package_linux64/build/src/julia.h:1788
No locals.
#40 true_main (argc=<optimized out>, argv=<optimized out>) at /buildworker/worker/package_linux64/build/src/jlapi.c:559
ct = 0x7fffec150010
last_age = 1
i__tr = 1
i__ca = 1
__eh = {eh_ctx = {{__jmpbuf = {140737488345536, -548524224696895208, 0, 140737488346464, 0, 140737354125376, -548524224747226856, -548504273189816040}, __mask_was_saved = 0, __saved_mask = {__val = {17898239780003166488, 0, 140737333972490, 140737334387600, 0 <repeats 12 times>}}}}, gcstack = 0x0, prev = 0x0, gc_state = 0 '\000', locks_len = 0, defer_signal = 0, timing_stack = 0x0, world_age = 1}
__excstack_state = <optimized out>
start_client = 0x7fffe3c527c0 <jl_system_image_data+13006080>
#41 0x00007ffff6d28c7d in jl_repl_entrypoint (argc=<optimized out>, argv=<optimized out>) at /buildworker/worker/package_linux64/build/src/jlapi.c:701
lisp_prompt = <optimized out>
orig_argv = <optimized out>
ret = <optimized out>
#42 0x00000000004007d9 in main (argc=<optimized out>, argv=<optimized out>) at /buildworker/worker/package_linux64/build/cli/loader_exe.c:42
ret = <optimized out>
@jpsamaroo does #0 0xffffffffffffffff in ?? () at the top mean what you thought, that it tries to execute -1 while it should ignore it?
@pxl-th I believe that is the case, it tries to jump to the -1 address and segfaults.
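A rough model of why a -1 in the first slot is fatal, assuming the loader walks INIT_ARRAY by size (INIT_ARRAYSZ divided by the pointer size) and calls every slot in order without skipping sentinel-looking values. That assumption is consistent with `j = 0` in the `call_init` frame of the backtrace above, but the code below is only a simulation in Python, not the loader's actual source; `first_faulting_entry` is a hypothetical name.

```python
NULL = 0
MINUS_ONE = 0xFFFFFFFFFFFFFFFF  # -1 as an unsigned 64-bit value


def first_faulting_entry(entries):
    """Return the index of the first slot that cannot be a valid function address.

    Models a size-bounded loop that calls entries[j] for every j: a slot holding
    0 or -1 would be jumped to like any other pointer and crash the process.
    """
    for j, address in enumerate(entries):
        if address in (NULL, MINUS_ONE):
            return j
    return None


if __name__ == "__main__":
    # First two slots as decoded from the BinaryBuilder .init_array dump.
    print(first_faulting_entry([0xFFFFFFFFFFFFFFFF, 0x438D0]))
```

With the decoded BinaryBuilder values the very first slot is flagged, which matches frame #0 sitting at address 0xffffffffffffffff.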
Ok, I think I have an idea of what the issue is. It appears that we're mixing up some conventions for how musl vs. glibc do constructors, where musl appears to use -1 as a sentinel for "end of ctors list", while glibc uses 0 for the same purpose. I have no idea why a -1 got inserted when there are ctors to run, but it must be related to link ordering, where somehow the -1 (which should be at the end to signal completion) ended up at the front. I would guess that we accidentally linked both the ctor implementation for musl and glibc (in that order probably). This is probably an issue with how I patched hipcc in HIP_jll.
What's odd is that I still see this behavior in the musl build, where I wouldn't expect to see the terminator be 0 (I would expect -1).
Superseded by #5441