JuliaInterpreter.jl icon indicating copy to clipboard operation
JuliaInterpreter.jl copied to clipboard

Segfault with IO locking

Open timholy opened this issue 6 years ago • 10 comments

On recent Julia versions (I'm using 1.4.0-DEV.77), I fear that anything calling iolock_begin/iolock_end may be problematic. Demo:

julia> using JuliaInterpreter

julia> repl = Base.active_repl;

julia> @interpret displaysize(repl.t.out_stream)

signal (11): Segmentation fault
in expression starting at REPL[3]:1
jl_eh_restore_state at /home/tim/src/julia-master/src/rtutils.c:254
step_expr! at /home/tim/.julia/dev/JuliaInterpreter/src/interpret.jl:55
step_expr! at /home/tim/.julia/dev/JuliaInterpreter/src/interpret.jl:556
unknown function (ip: 0x7fd83fa88771)
finish! at /home/tim/.julia/dev/JuliaInterpreter/src/commands.jl:14
finish_and_return! at /home/tim/.julia/dev/JuliaInterpreter/src/commands.jl:29 [inlined]
#evaluate_call_recurse!#46 at /home/tim/.julia/dev/JuliaInterpreter/src/interpret.jl:239
evaluate_call_recurse! at /home/tim/.julia/dev/JuliaInterpreter/src/interpret.jl:202 [inlined]
eval_rhs at /home/tim/.julia/dev/JuliaInterpreter/src/interpret.jl:378
step_expr! at /home/tim/.julia/dev/JuliaInterpreter/src/interpret.jl:517
step_expr! at /home/tim/.julia/dev/JuliaInterpreter/src/interpret.jl:556
unknown function (ip: 0x7fd83fa88771)
finish! at /home/tim/.julia/dev/JuliaInterpreter/src/commands.jl:14
finish_and_return! at /home/tim/.julia/dev/JuliaInterpreter/src/commands.jl:29 [inlined]
finish_and_return! at /home/tim/.julia/dev/JuliaInterpreter/src/commands.jl:33 [inlined]
finish_and_return! at /home/tim/.julia/dev/JuliaInterpreter/src/commands.jl:33
_jl_invoke at /home/tim/src/julia-master/src/gf.c:2136 [inlined]
jl_apply_generic at /home/tim/src/julia-master/src/gf.c:2300
jl_apply at /home/tim/src/julia-master/src/julia.h:1632 [inlined]
do_call at /home/tim/src/julia-master/src/interpreter.c:328
eval_value at /home/tim/src/julia-master/src/interpreter.c:417
eval_body at /home/tim/src/julia-master/src/interpreter.c:641
jl_interpret_toplevel_thunk_callback at /home/tim/src/julia-master/src/interpreter.c:888
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7fd84d39ef0f)
unknown function (ip: 0x15)
jl_interpret_toplevel_thunk at /home/tim/src/julia-master/src/interpreter.c:897
jl_toplevel_eval_flex at /home/tim/src/julia-master/src/toplevel.c:814
jl_toplevel_eval_flex at /home/tim/src/julia-master/src/toplevel.c:764
jl_toplevel_eval_in at /home/tim/src/julia-master/src/toplevel.c:843
eval at ./boot.jl:330
_jl_invoke at /home/tim/src/julia-master/src/gf.c:2130 [inlined]
jl_apply_generic at /home/tim/src/julia-master/src/gf.c:2300
eval_user_input at /home/tim/src/julia-master/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
macro expansion at /home/tim/src/julia-master/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:118 [inlined]
#26 at ./task.jl:333
_jl_invoke at /home/tim/src/julia-master/src/gf.c:2130 [inlined]
jl_apply_generic at /home/tim/src/julia-master/src/gf.c:2300
jl_apply at /home/tim/src/julia-master/src/julia.h:1632 [inlined]
start_task at /home/tim/src/julia-master/src/task.c:655
unknown function (ip: 0xffffffffffffffff)
Allocations: 5311781 (Pool: 5310655; Big: 1126); GC: 6
/home/tim/bin/julia-master: line 2:  9357 Segmentation fault      (core dumped) ~/src/julia-master/julia "$@"

I find 40 calls to iolock_begin in base/ and 48 in stdlib/, which means this would be a lot of methods to add to compiled_methods. Any ideas about alternatives?

timholy avatar Aug 31 '19 09:08 timholy

First picked up in https://github.com/timholy/Revise.jl/issues/344

timholy avatar Aug 31 '19 09:08 timholy

Smells like a task-switching problem. The relevant lines of rtutils.c are

    arraylist_t *locks = &current_task->locks;
    if (locks->len > eh->locks_len) {
        for (size_t i = locks->len;i > eh->locks_len;i--)
            jl_mutex_unlock_nogc((jl_mutex_t*)locks->items[i - 1]);
        locks->len = eh->locks_len;
    }

So if the current_task is the interpreter, and that's not the same task as that used for IO, then perhaps locks and eh are not the pair that's expected from these lines.

Any ideas, @JeffBezanson or @vtjnash? I don't really know much about IO or the scheduler, can we somehow grab an IO task and switch to it before calling iolock_begin/iolock_end?

timholy avatar Aug 31 '19 10:08 timholy

I find 40 calls to iolock_begin in base/ and 48 in stdlib/, which means this would be a lot of methods to add to compiled_methods. Any ideas about alternatives?

True, but I guess it is better than segfaulting... If we don't want to add all these explicitly to compiled_methods we could have it as a step to optimize and make it Compiled(). Not satisfactory, but again, better than having peoples segfault when trying to debug their package code.

KristofferC avatar Sep 03 '19 16:09 KristofferC

Hm, eh is a task-local variable there. But this lock (and any of the others in that list—but you'll only see the others from C and in Core.Compiler) doesn't really support task switches and is expected to be syntactically ordered with respect to any try/catch blocks.

vtjnash avatar Sep 03 '19 16:09 vtjnash

Do we know why iolock_begin doesn't work? It's just a ccall.

JeffBezanson avatar Sep 03 '19 16:09 JeffBezanson

Hmm, the C interpreter seems to handle it:

julia> Base.iolock_begin()
       println("hello")
       Base.iolock_end()
hello

timholy avatar Sep 03 '19 16:09 timholy

Ah:

julia> try Base.iolock_begin() catch end
       try println("hello") catch end
       try Base.iolock_end() catch end
hello

signal (11): Segmentation fault
in expression starting at REPL[2]:3
jl_eh_restore_state at /home/tim/src/julia-master/src/rtutils.c:254
eval_body at /home/tim/src/julia-master/src/interpreter.c:725
eval_body at /home/tim/src/julia-master/src/interpreter.c:705
jl_interpret_toplevel_thunk_callback at /home/tim/src/julia-master/src/interpreter.c:888
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7f3a768e300f)
unknown function (ip: 0x2)
jl_interpret_toplevel_thunk at /home/tim/src/julia-master/src/interpreter.c:897
jl_toplevel_eval_flex at /home/tim/src/julia-master/src/toplevel.c:814
jl_toplevel_eval_flex at /home/tim/src/julia-master/src/toplevel.c:764
jl_toplevel_eval_in at /home/tim/src/julia-master/src/toplevel.c:843
eval at ./boot.jl:330
_jl_invoke at /home/tim/src/julia-master/src/gf.c:2130 [inlined]
jl_apply_generic at /home/tim/src/julia-master/src/gf.c:2300
eval_user_input at /home/tim/src/julia-master/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
run_backend at /home/tim/.julia/dev/Revise/src/Revise.jl:972
#79 at ./task.jl:333
_jl_invoke at /home/tim/src/julia-master/src/gf.c:2136 [inlined]
jl_apply_generic at /home/tim/src/julia-master/src/gf.c:2300
jl_apply at /home/tim/src/julia-master/src/julia.h:1632 [inlined]
start_task at /home/tim/src/julia-master/src/task.c:655
unknown function (ip: 0xffffffffffffffff)
Allocations: 3383828 (Pool: 3382975; Big: 853); GC: 2
/home/tim/bin/julia-master: line 2:  1246 Segmentation fault      (core dumped) ~/src/julia-master/julia "$@"

Good "catch" @vtjnash! We put each line inside a try/catch here.

timholy avatar Sep 03 '19 16:09 timholy

Maybe we should only auto-release locks when an error is thrown past a frame?

JeffBezanson avatar Sep 03 '19 16:09 JeffBezanson

In the meantime I've submitted #335. Anyone wanting to test this after that merges can just replace the hasarg(isequal(:iolock_begin), code.code) with false.

timholy avatar Sep 08 '19 11:09 timholy

Easier reproducer:

@interpret Base.PipeEndpoint()

timholy avatar Sep 08 '19 11:09 timholy