Polyester.jl
Julia freezes when only one thread throws an error in an `@batch` block
This one was a real Heisenbug. Unstable simulations that crashed in serial execution froze when run on multiple threads, but trying to reproduce the freeze on purpose made the bug disappear.
If exactly one thread (other than the first one!) throws an error, Julia freezes. MWE:
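```julia
using Polyester

function foo()
    println("Before")
    @batch for i in 1:100
        # Only the iterations that happen to run on thread 2 throw
        if Threads.threadid() == 2
            error()
        end
    end
    println("After")
end

foo()  # with more than one thread, freezes after printing "Before"
```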
Running this with more than one thread causes Julia to freeze after printing `Before`. The freeze can easily be interrupted with Ctrl+C, which then seems to fix the problem for the rest of the session, but only because of #30: after the interrupt, only one thread is used.
By crashes in serial execution, do you mean throws an error? Normally I think of crashes as "julia exits".
The fix here is probably to have `ThreadingUtilities.wait` check whether the task it waited on threw an error.
I think the API will be that `wait` prints any errors, resets the tasks automatically, and then returns an error code.
The caller can then decide what to do, e.g. whether to throw. In Polyester's case, it would throw after setting its own state appropriately.
This wouldn't solve the problem of someone interrupting the process manually, since Polyester then wouldn't get the chance to reset its own state. But it would stop the hangs and still allow multiple threads to be used.
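The proposed flow (print errors, return a status code, let the caller reset its state and then throw) can be sketched with plain Base tasks. This is a minimal sketch under stated assumptions, not ThreadingUtilities' or Polyester's actual API: `wait_report`, `run_batch!`, `Pool`, `WAIT_OK` and `WAIT_FAILED` are hypothetical names, and `Threads.@spawn` stands in for Polyester's own task scheduling. Base tasks are one-shot, so the "reset the tasks" step has no analogue here.

```julia
# Hypothetical sketch: none of these names exist in ThreadingUtilities or Polyester.
# `Threads.@spawn` stands in for Polyester's own task scheduling.

const WAIT_OK = 0
const WAIT_FAILED = 1

# Wait on every task instead of stopping at the first failure, print any
# errors, and return a status code rather than throwing.
function wait_report(tasks::Vector{Task})
    status = WAIT_OK
    for t in tasks
        try
            wait(t)  # throws if `t` failed
        catch err
            showerror(stderr, err)
            println(stderr)
            status = WAIT_FAILED  # keep going so every task is waited on
        end
    end
    return status
end

# Hypothetical stand-in for the scheduler's own state that must be restored.
mutable struct Pool
    busy::Bool
end

# The caller decides what to do with the status: here it resets its own
# state first and only then throws, so later batches still work.
function run_batch!(pool::Pool, f, n::Integer)
    pool.busy = true
    tasks = Task[]
    for i in 1:n
        push!(tasks, Threads.@spawn(f(i)))
    end
    status = wait_report(tasks)
    pool.busy = false
    status == WAIT_OK || error("at least one task in the batch failed")
    return nothing
end
```

With `pool = Pool(false)`, calling `run_batch!(pool, i -> (i == 2 && error("boom"); nothing), 4)` prints the failing task's error and then throws an `ErrorException`, but `pool.busy` is already back to `false`, so the next batch is not blocked.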
> By crashes in serial execution, do you mean throws an error?

Yes, like a `DomainError` thrown by `sqrt` or `log`.
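To make the distinction concrete: in serial code, a negative `Float64` passed to `sqrt` or `log` throws a `DomainError` that propagates to the caller, where it can be caught; the session does not exit and nothing hangs. `first_error` is a hypothetical helper used only for illustration.

```julia
# Hypothetical helper: call `f(x)` and return the exception it throws,
# or `nothing` if it returns normally.
function first_error(f, x)
    try
        f(x)
        return nothing
    catch err
        return err
    end
end

println(first_error(sqrt, -1.0) isa DomainError)  # true
println(first_error(log, -1.0) isa DomainError)   # true
println(first_error(sqrt, 4.0) === nothing)       # true
```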