Polyester.jl

Julia freezes when only one thread throws an error in an `@batch` block

Open efaulhaber opened this issue 4 years ago • 5 comments

This one was a real Heisenbug. Unstable simulations that crashed in serial execution froze on multiple threads, but trying to make them freeze on purpose made the bug disappear.

If only one thread (except the first one!) throws an error, Julia freezes. MWE:

using Polyester

function foo()
    println("Before")

    @batch for i in 1:100
        # Iterations that land on thread 2 throw, so exactly one non-first thread errors.
        if Threads.threadid() == 2
            error()
        end
    end

    println("After")
end

Calling foo() with more than one thread causes Julia to freeze after printing Before. The freeze can easily be interrupted with Ctrl+C, after which the problem seems to be gone for the rest of the session, but that is only because of #30: from then on, only one thread is used.

efaulhaber, Jun 29 '21 21:06

By crashes in serial execution, do you mean throws an error? Normally I think of crashes as "julia exits".

The fix here is probably to have ThreadingUtilities.wait check if the task it waited on threw an error.

chriselrod, Jun 30 '21 03:06

I think the API will be that wait prints any errors, resets the tasks automatically, and then returns an error code. The caller can then decide what to do, e.g. whether to throw. In Polyester's case, it would throw after setting its own state appropriately.

This wouldn't solve the problem of someone interrupting the process manually, as then Polyester wouldn't get the chance to reset its own state. But it would stop the hangs, and allow multiple threads to still be used.
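
The wait behaviour described above can be sketched roughly as follows. This is only an illustration built on plain Base tasks, not ThreadingUtilities' actual implementation: the names `wait_with_status` and `run_batch` are hypothetical, and the step that resets the worker tasks is left out because it depends on ThreadingUtilities internals.

```julia
# Hypothetical sketch using only Base tasks; not ThreadingUtilities' real API.

# Wait on every task, print any error, and return a status code:
# 0 if all tasks finished normally, 1 if at least one of them failed.
function wait_with_status(tasks::Vector{Task})
    status = 0
    for t in tasks
        try
            wait(t)
        catch err
            # `wait` on a failed task throws a TaskFailedException.
            showerror(stderr, err)
            println(stderr)
            status = 1
        end
    end
    return status
end

# Caller side: every task has been waited on before the caller decides to throw,
# so a single failing task cannot leave the others hanging.
function run_batch(f, n::Integer)
    tasks = Task[Threads.@spawn f(i) for i in 1:n]
    status = wait_with_status(tasks)
    status == 0 || error("at least one task in the batch failed")
    return nothing
end
```

With this shape, `run_batch(i -> i == 2 && error("boom"), 4)` prints the failure, waits for the remaining tasks, and only then throws from the caller.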

chriselrod, Jun 30 '21 03:06

> By crashes in serial execution, do you mean throws an error?

Yes - like throwing a DomainError from sqrt or log.
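
For illustration, a negative argument to `sqrt` raises a `DomainError` that can be caught like any other exception, so the Julia process itself keeps running. The helper name `try_sqrt` is made up for this example.

```julia
# Return `sqrt(x)`, or `nothing` if `sqrt` rejects the argument with a DomainError.
function try_sqrt(x::Real)
    try
        return sqrt(x)
    catch err
        # Only swallow DomainError; anything else is rethrown unchanged.
        err isa DomainError || rethrow()
        return nothing
    end
end
```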

ranocha, Jun 30 '21 04:06