libcxxwrap-julia

Crash in Julia 1.5 and later

Open fingolfin opened this issue 4 years ago • 5 comments

(This comes from comments on https://github.com/JuliaPackaging/Yggdrasil/issues/2160 but I think it deserves its own full issue):

There are at least two issues with libcxxwrap_julia_jll v0.8.5 in Julia 1.5 and later:

  1. It fails to load on macOS with the error Symbol not found: __Unwind_Resume; this is addressed in https://github.com/JuliaPackaging/Yggdrasil/pull/2190; see also https://github.com/JuliaPackaging/Yggdrasil/pull/2199 and

  2. On Linux there is a segfault. The same segfault can be seen in this repo's nightly CI tests, which fail with the same error (the CI tests here only cover Julia 1.4 and nightly; it might be useful to also test 1.5 there?). Here is a backtrace:

...
Running tests from containers.jl...

signal (11): Segmentation fault
in expression starting at /home/mhorn/.julia/packages/CxxWrap/ZOkSN/test/containers.jl:21
_ZN5jlcxx6detail11CallFunctorINS_10ConstArrayIdLl1EEEJEE5applyEPKv at /home/mhorn/.julia/artifacts/860a8b2216bd059600ed7c44cdaa3bb81b23ff1c/lib/libjlcxx_containers.so (unknown line)
const_vector at /home/mhorn/.julia/packages/CxxWrap/ZOkSN/src/CxxWrap.jl:590
macro expansion at /home/mhorn/.julia/packages/CxxWrap/ZOkSN/test/containers.jl:29 [inlined]
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115 [inlined]
top-level scope at /home/mhorn/.julia/packages/CxxWrap/ZOkSN/test/containers.jl:23
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:834
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:913
jl_load_rewrite at /buildworker/worker/package_linux64/build/src/toplevel.c:914
include at ./client.jl:457
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:117
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:206
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:157 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:566
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:492
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:492
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:660
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:840
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:913
jl_load_rewrite at /buildworker/worker/package_linux64/build/src/toplevel.c:914
include at ./client.jl:457
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2231 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:117
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:206
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:157 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:566
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:660
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:840
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:790
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:883
eval at ./boot.jl:331
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
exec_options at ./client.jl:272
_start at ./client.jl:506
jfptr__start_53898.clone_1 at /home/mhorn/julia-1.5.3/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/ui/../src/julia.h:1690 [inlined]
true_main at /buildworker/worker/package_linux64/build/ui/repl.c:106
main at /buildworker/worker/package_linux64/build/ui/repl.c:227
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/mhorn/julia-1.5.3/bin/julia (unknown line)
Allocations: 16112006 (Pool: 16107521; Big: 4485); GC: 15

Applying c++filt to _ZN5jlcxx6detail11CallFunctorINS_10ConstArrayIdLl1EEEJEE5applyEPKv gives jlcxx::detail::CallFunctor<jlcxx::ConstArray<double, 1l>>::apply(void const*).

It might be helpful to attack this with rr, but my early attempts failed due to a lack of instructions, and then I had to work on other stuff.

fingolfin, Dec 07 '20 11:12

I did some debugging. I still don't know why the crash happens, but the results do suggest a workaround: setting -DCMAKE_CXX_FLAGS_RELEASE="-O2".

I tried several configurations to get a better backtrace and could reproduce that crash only in some rather specific cases:

  • using the existing 0.8.5+0 binaries: crashes
  • building libcxxwrap_julia manually (with my own gcc 9): works
  • building libcxxwrap_julia with binarybuilder (uses gcc 7): crashes
  • setting preferred_gcc_version = 8: works
  • setting preferred_gcc_version = 9: works
  • setting preferred_gcc_version = 7 + cmake target Debug: works
  • setting preferred_gcc_version = 7 + cmake target Debug + -O2: works
  • setting preferred_gcc_version = 7 + cmake target Debug + -O3: crashes
  • setting preferred_gcc_version = 7 + -O2: works

And with that -O3 Debug option I got a slightly better backtrace:

#0  0x00007fffda2768cf in jlcxx::ConvertToJulia<jlcxx::ConstArray<double, 1l>, jlcxx::ConstArrayTrait>::operator() (arr=..., this=<optimized out>)
    at /workspace/srcdir/libcxxwrap-julia/include/jlcxx/const_array.hpp:95
#1  jlcxx::convert_to_julia<jlcxx::ConstArray<double, 1l> > (cpp_val=...) at /workspace/srcdir/libcxxwrap-julia/include/jlcxx/type_conversion.hpp:745
#2  jlcxx::detail::ReturnTypeAdapter<jlcxx::ConstArray<double, 1l>>::operator()(void const*) (this=<optimized out>, functor=<optimized out>)
    at /workspace/srcdir/libcxxwrap-julia/include/jlcxx/module.hpp:47
#3  jlcxx::detail::CallFunctor<jlcxx::ConstArray<double, 1l>>::apply(void const*) (functor=<optimized out>) at /workspace/srcdir/libcxxwrap-julia/include/jlcxx/module.hpp:72

const_array.hpp:95 is a JL_GC_POP() call, but the fact that the crash only happens with -O3 points to something rather annoying to debug.

That's all for today; maybe someone else wants to take another look.

benlorenz, Dec 08 '20 12:12

@benlorenz thanks for that, that helps a lot. So perhaps we can just rebuild this with GCC 8 or 9. It may "just" be a compiler bug, after all.

fingolfin, Dec 09 '20 13:12

I made https://github.com/JuliaPackaging/Yggdrasil/pull/2236; let's see if that helps.

fingolfin, Dec 09 '20 13:12

That indeed seems to have fixed it, great! Now I guess the CI in this repository should be switched to use GCC 8+, too?

fingolfin, Dec 09 '20 16:12

Alright, great news that it works with the newer GCC, but I'm still quite nervous that this is a bug in libcxxwrap-julia. We'll see if it resurfaces elsewhere somehow; for now this seems like a good solution. Thanks for the help!

barche, Dec 09 '20 21:12