IOCapture.jl icon indicating copy to clipboard operation
IOCapture.jl copied to clipboard

Hang when captureing output from MathLink

Open musoke opened this issue 1 year ago • 7 comments

I have run into an issue similar to that fixed in https://github.com/JuliaDocs/IOCapture.jl/pull/15 and reported in https://github.com/JuliaDocs/Documenter.jl/issues/1947

IOCapture hangs the first time it encounters a W_cmd macro as defined in MathLink.

Here is a minimal Project.toml. IOCapture 0.2.3 includes #15.

[deps]
IOCapture = "b5f81e59-6552-4d32-b1f0-c071b021bf89"
MathLink = "18c93696-a329-5786-9845-8443133fa0b4"

[compat]
IOCapture = "0.2.3"
MathLink = "0.5.1"
julia = "1.9.2"

Here is a demonstration in the REPL. I used ctrl-C to kill the second command after it got stuck. You can see the backtrace. Running the same command a second time works.

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.2 (2023-07-05)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using IOCapture, MathLink

julia> c = IOCapture.capture() do
           W`Sin`
       end
^CERROR: InterruptException:
Stacktrace:
  [1] poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
    @ Base ./task.jl:974
  [2] wait()
    @ Base ./task.jl:983
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock}; first::Bool)
    @ Base ./condition.jl:130
  [4] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
    @ Base ./condition.jl:125
  [5] _wait(t::Task)
    @ Base ./task.jl:308
  [6] wait
    @ ./task.jl:347 [inlined]
  [7] (::IOCapture.var"#3#5"{DataType, var"#3#4", Task, IOContext{Base.PipeEndpoint}, IOContext{Base.PipeEndpoint}, Base.TTY, Base.TTY})()
    @ IOCapture ~/.julia/packages/IOCapture/8Uj7o/src/IOCapture.jl:130
  [8] with_logstate(f::Function, logstate::Any)
    @ Base.CoreLogging ./logging.jl:514
  [9] with_logger
    @ ./logging.jl:626 [inlined]
 [10] capture(f::var"#3#4"; rethrow::Type, color::Bool)
    @ IOCapture ~/.julia/packages/IOCapture/8Uj7o/src/IOCapture.jl:116
 [11] capture(f::Function)
    @ IOCapture ~/.julia/packages/IOCapture/8Uj7o/src/IOCapture.jl:72
 [12] top-level scope
    @ REPL[2]:1

julia> c = IOCapture.capture() do
           W`Sin`
       end
(value = W"Sin", output = "", error = false, backtrace = Ptr{Nothing}[])

julia>

Using the W command once before using within a IOCapture context works as expected.

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.2 (2023-07-05)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using IOCapture, MathLink
[ Info: Precompiling IOCapture [b5f81e59-6552-4d32-b1f0-c071b021bf89]

julia> W`Sin`
W"Sin"

julia> c = IOCapture.capture() do
           W`Sin`
       end
(value = W"Sin", output = "", error = false, backtrace = Ptr{Nothing}[])

julia>

This may be because MathLink needs to establish a connection with Mathematica during initialisation?

This issue may be tricky to reproduce without a Mathematica or Wolfram Engine installation.

musoke avatar Jul 12 '23 16:07 musoke

A couple of quick debugging thoughts:

  • What happens if you try to capture() a _defaultlink() call? As you say, it seems to try to establish a connection first, so maybe this helps to reduce this.
  • Does it also hang if you manually do redirect_stdout / redirect_stderr? Basically, it would be good to try to reduce the capture() implementation down, to see which part there might be causing the hang.
  • Are there any Julia version differences? Does it also hang on nightly or 1.6?

mortenpi avatar Jul 12 '23 23:07 mortenpi

  • Does it also hang if you manually do redirect_stdout / redirect_stderr? Basically, it would be good to try to reduce the capture() implementation down, to see which part there might be causing the hang.

I have made some progress with this line of investigation. Here is a script that reduces IOCapture.capture's implementation to approximately the minimum required to reproduce the hang.

using IOCapture
using Logging
using MathLink

function hangs()
    @info "Before capture"
    c = IOCapture.capture() do
        W`Sin`
    end
    @info "After capture" c
end

function hangs_mwe()
    @debug "Before capture"
    c = iocapture_mwe() do
        W`Sin`
    end
    @debug "After capture" c
end

function iocapture_mwe(f)
    @debug "Entered"
    default_stdout = stdout
    # default_stderr = stderr

    pipe = Pipe()
    Base.link_pipe!(pipe; reader_supports_async = true, writer_supports_async = true)

    pe_stdout = pipe.in
    # pe_stderr = pipe.in

    redirect_stdout(pe_stdout)
    # redirect_stderr(pe_stderr)

    output = IOBuffer()
    buffer_redirect_task = @async write(output, pipe)

    yield()
    @show value = f()

    output = String(take!(output)),
    @info "captured output" output


    redirect_stdout(default_stdout)
    # redirect_stderr(default_stderr)

    close(pe_stdout)
    # close(pe_stderr)

    wait(buffer_redirect_task)  # No hang if this line is commented

    (value, output)
end

ENV["JULIA_DEBUG"] = Main

hangs_mwe()

The problem seems to be connected to wait(buffer_redirect_task) - the hang goes away if that line is commented.

  • What happens if you try to capture() a _defaultlink() call? As you say, it seems to try to establish a connection first, so maybe this helps to reduce this.

What is _defaultlink()?

  • Are there any Julia version differences? Does it also hang on nightly or 1.6?

I tried a few different juliaup channels and see the same behaviour each of 1.6, 1.7, 1.8, 1.9, and alpha (currently 1.10.0-alpha+x0.64.linux.gnu).

musoke avatar Jul 18 '23 18:07 musoke

_defaultlink: https://github.com/JuliaInterop/MathLink.jl/blob/305115610de2d6ea1f4a64e0df458b9282c03b08/src/link.jl#L88-L103

It seems like this gets sometimes called to establish the default connection as you said?

https://github.com/JuliaInterop/MathLink.jl/blob/305115610de2d6ea1f4a64e0df458b9282c03b08/src/eval.jl#L52-L59

Although I didn't immediately see how it relates to the W... macros right now.

mortenpi avatar Jul 19 '23 00:07 mortenpi

I think you're right.

The W... string literals are defined by W_str and W_cmd. The latter contains a call to weval, which in turn has the _defaultlink call you pointed out. https://github.com/JuliaInterop/MathLink.jl/blob/305115610de2d6ea1f4a64e0df458b9282c03b08/src/eval.jl#L87

musoke avatar Jul 19 '23 11:07 musoke

If you just do capture() around a _defaultlink call, does that hang too though?

mortenpi avatar Jul 20 '23 04:07 mortenpi

Yes, that hangs too.

I have modified capture and _defaultlink to capture only stdout and added some debug statements. Here's a repo with those versions pinned and the following script: https://github.com/musoke/mathlink-iocpture-hang

using IOCapture
using Logging
using MathLink

function hangs()
    @debug "Before capture"
    c = IOCapture.capture() do
        MathLink._defaultlink()
    end
    @debug "After capture" c
end

ENV["JULIA_DEBUG"] = "IOCapture,MathLink"

# weval("Sin[0]")  # no hang if uncommented
hangs()

The result of running it is

┌ Debug: Redirected outputs, about to call f
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:114
┌ Debug: yielded, about to call f
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:120
┌ Debug: defaultlink uninitialised
│   defaultlink = MathLink.Link(Ptr{Nothing} @0x0000000000000000)
└ @ MathLink ~/.julia/packages/MathLink/oqu5t/src/link.jl:90
┌ Debug: _defaultlink return value
│   defaultlink = MathLink.Link(Ptr{Nothing} @0x0000000001579db0)
└ @ MathLink ~/.julia/packages/MathLink/oqu5t/src/link.jl:107
┌ Debug: result of calling f
│   f = #3 (generic function with 1 method)
│   f_val = MathLink.Link(Ptr{Nothing} @0x0000000001579db0)
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:122
┌ Debug: ended redirects, closed pipes
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:134

At which point it halts indefinitely.

If seems that the call to f (_defaultlink in this case) at https://github.com/JuliaDocs/IOCapture.jl/blob/3fc23ea5ba2f82e35df0bf9d4766968c13aa5e39/src/IOCapture.jl#L121-L123 returns.

However, capture gets stuck at https://github.com/JuliaDocs/IOCapture.jl/blob/3fc23ea5ba2f82e35df0bf9d4766968c13aa5e39/src/IOCapture.jl#L136

Adding a weval("Sin[0]") before the capture prevents it from hanging and gives this output:

┌ Debug: defaultlink uninitialised
│   defaultlink = MathLink.Link(Ptr{Nothing} @0x0000000000000000)
└ @ MathLink ~/.julia/packages/MathLink/oqu5t/src/link.jl:90
┌ Debug: _defaultlink return value
│   defaultlink = MathLink.Link(Ptr{Nothing} @0x00000000015eee70)
└ @ MathLink ~/.julia/packages/MathLink/oqu5t/src/link.jl:107
┌ Debug: Redirected outputs, about to call f
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:114
┌ Debug: yielded, about to call f
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:120
┌ Debug: defaultlink already initialised
│   defaultlink = MathLink.Link(Ptr{Nothing} @0x00000000015eee70)
└ @ MathLink ~/.julia/packages/MathLink/oqu5t/src/link.jl:104
┌ Debug: _defaultlink return value
│   defaultlink = MathLink.Link(Ptr{Nothing} @0x00000000015eee70)
└ @ MathLink ~/.julia/packages/MathLink/oqu5t/src/link.jl:107
┌ Debug: result of calling f
│   f = #3 (generic function with 1 method)
│   f_val = MathLink.Link(Ptr{Nothing} @0x00000000015eee70)
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:122
┌ Debug: ended redirects, closed pipes
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:134
┌ Debug: waited on buffer, end finally
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:137
┌ Debug: return values
│   value = MathLink.Link(Ptr{Nothing} @0x00000000015eee70)
└ @ IOCapture ~/.julia/packages/IOCapture/dYgMT/src/IOCapture.jl:140

musoke avatar Jul 20 '23 15:07 musoke

TLDR: MathLink doesn't let pipe.out close after initialising so https://github.com/JuliaDocs/IOCapture.jl/blob/a23f2a4e0b48a7170c040a5f2a82abc5ba0757bb/src/IOCapture.jl#L130 hangs. I don't know if this is a bug here or MathLink. I can submit a PR here to add close(pipe.out) but am not sure if that is correct.


Here is a script that demonstrates. It passes two functions through a very pared down version of IOCapture.capture: one is a vanilla function and the other uses MathLink.

using MathLink

function boring()
    1 + 2
end

function mathlink()
    weval("Sin[0]")
end

function hang(f)
    pipe = Pipe()
    @debug "Pipe created" f = string(f) pipe pipe.in pipe.out

    default_stdout = stdout
    Base.link_pipe!(pipe; reader_supports_async = true, writer_supports_async = true)
    pe_stdout = pipe.in
    redirect_stdout(pe_stdout)

    output = IOBuffer()
    buffer_redirect_task = @async write(output, pipe)

    val = f()
    @debug "Value computed" f = string(f) val
    @debug "Pipe in use" f = string(f) pipe pipe.in pipe.out

    redirect_stdout(default_stdout)
    close(pe_stdout)

    if timedwait(() -> istaskdone(buffer_redirect_task), 30) === :ok
        @info "wait succeeded" f = string(f) pipe pipe.in pipe.out
    else
        @warn "wait timed out after 30 seconds" f = string(f) pipe pipe.in pipe.out
        close(pipe.out)
        wait(buffer_redirect_task)
        @info "wait succeeded after closing pipe.out" f = string(f) pipe pipe.in pipe.out
    end

    @debug "Contents of redirected stdout" String(take!(output))
end

ENV["JULIA_DEBUG"] = Main

hang(boring)
print("\n\n")
hang(mathlink)

One can see in the output below that waiting on buffer_redirect_task succeeds for the vanilla function. It does not for the MathLink function - until pipe.out is closed explicitly.

┌ Debug: Pipe created
│   f = "boring"
│   pipe = Pipe(RawFD(4294967295) init => RawFD(4294967295) init, 0 bytes waiting)
│   pipe.in = Base.PipeEndpoint(RawFD(4294967295) init, 0 bytes waiting)
│   pipe.out = Base.PipeEndpoint(RawFD(4294967295) init, 0 bytes waiting)
└ @ Main ~/tmp/IOCapture-hang/buf.jl:13
┌ Debug: Value computed
│   f = "boring"
│   val = 3
└ @ Main ~/tmp/IOCapture-hang/buf.jl:24
┌ Debug: Pipe in use
│   f = "boring"
│   pipe = Pipe(RawFD(21) open => RawFD(20) active, 0 bytes waiting)
│   pipe.in = Base.PipeEndpoint(RawFD(21) open, 0 bytes waiting)
│   pipe.out = Base.PipeEndpoint(RawFD(20) active, 0 bytes waiting)
└ @ Main ~/tmp/IOCapture-hang/buf.jl:25
┌ Info: wait succeeded
│   f = "boring"
│   pipe = Pipe(RawFD(4294967295) closed => RawFD(4294967295) closed, 0 bytes waiting)
│   pipe.in = Base.PipeEndpoint(RawFD(4294967295) closed, 0 bytes waiting)
└   pipe.out = Base.PipeEndpoint(RawFD(4294967295) closed, 0 bytes waiting)
┌ Debug: Contents of redirected stdout
│   String(take!(output)) = ""
└ @ Main ~/tmp/IOCapture-hang/buf.jl:39


┌ Debug: Pipe created
│   f = "mathlink"
│   pipe = Pipe(RawFD(4294967295) init => RawFD(4294967295) init, 0 bytes waiting)
│   pipe.in = Base.PipeEndpoint(RawFD(4294967295) init, 0 bytes waiting)
│   pipe.out = Base.PipeEndpoint(RawFD(4294967295) init, 0 bytes waiting)
└ @ Main ~/tmp/IOCapture-hang/buf.jl:13
┌ Debug: Value computed
│   f = "mathlink"
│   val = "Sin[0]"
└ @ Main ~/tmp/IOCapture-hang/buf.jl:24
┌ Debug: Pipe in use
│   f = "mathlink"
│   pipe = Pipe(RawFD(21) open => RawFD(20) active, 0 bytes waiting)
│   pipe.in = Base.PipeEndpoint(RawFD(21) open, 0 bytes waiting)
│   pipe.out = Base.PipeEndpoint(RawFD(20) active, 0 bytes waiting)
└ @ Main ~/tmp/IOCapture-hang/buf.jl:25
┌ Warning: wait timed out after 30 seconds
│   f = "mathlink"
│   pipe = Pipe(RawFD(4294967295) closed => RawFD(20) active, 0 bytes waiting)
│   pipe.in = Base.PipeEndpoint(RawFD(4294967295) closed, 0 bytes waiting)
│   pipe.out = Base.PipeEndpoint(RawFD(20) active, 0 bytes waiting)
└ @ Main ~/tmp/IOCapture-hang/buf.jl:33
┌ Info: wait succeeded after closing pipe.out
│   f = "mathlink"
│   pipe = Pipe(RawFD(4294967295) closed => RawFD(4294967295) closed, 0 bytes waiting)
│   pipe.in = Base.PipeEndpoint(RawFD(4294967295) closed, 0 bytes waiting)
└   pipe.out = Base.PipeEndpoint(RawFD(4294967295) closed, 0 bytes waiting)
┌ Debug: Contents of redirected stdout
│   String(take!(output)) = ""
└ @ Main ~/tmp/IOCapture-hang/buf.jl:39

musoke avatar Jul 20 '23 19:07 musoke