Deadlock doing I/O on a foreign thread while the main thread is blocked
Reduced from https://github.com/JuliaGPU/CUDA.jl/issues/2449, filed by @miniskar:
- create a pthread on which we call back into Julia and run a command
- have the main thread block on pthread_join
- the foreign thread cannot make progress because of some lock being held
```c
#include <julia.h>
#include <pthread.h>
#include <stdio.h>

typedef void (*julia_callback)(void);

void *thread_function(void *callback) {
    printf("Calling Julia from thread\n");
    ((julia_callback)callback)();
    return NULL;
}

void call_on_thread(julia_callback callback) {
    printf("Creating thread\n");
    pthread_t thread;
    pthread_create(&thread, NULL, thread_function, (void *)callback);
    pthread_join(thread, NULL);
}

// alternative version that doesn't use a foreign thread,
// and as a result doesn't deadlock
void call_directly(julia_callback callback) {
    printf("Calling Julia directly\n");
    callback();
}
```
```julia
function callback()::Cvoid
    println("Running a command")
    run(`echo 42`)
    return
end

function main()
    callback_ptr = @cfunction(callback, Cvoid, ())
    gc_state = @ccall(jl_gc_safe_enter()::Int8)  # enter a GC-safe region before blocking in C
    ccall((:call_on_thread, "./wip.so"), Cvoid, (Ptr{Cvoid},), callback_ptr)
    @ccall(jl_gc_safe_leave(gc_state::Int8)::Cvoid)  # restore the previous GC state
    println("Done")
end

main()
```
```
❯ gcc -fPIC -shared -o wip.so wip.c -isystem $JULIA/include/julia -isystem /opt/cuda/include -L$JULIA/lib -ljulia -lpthread && \
julia --project wip.jl
Creating thread
Calling Julia from thread
Running a command^C
[41055] signal 2: Interrupt
in expression starting at /home/tim/Julia/pkg/CUDA/wip.jl:16
unknown function (ip: 0x7b4dd01ada17)
pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:822
ijl_task_get_next at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-11/src/scheduler.c:584
poptask at ./task.jl:1012
wait at ./task.jl:1021
uv_write at ./stream.jl:1072
unsafe_write at ./stream.jl:1145
write at ./strings/io.jl:248 [inlined]
print at ./strings/io.jl:250 [inlined]
print at ./strings/io.jl:46
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-11/src/julia.h:2156 [inlined]
do_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-11/src/builtins.c:831
println at ./strings/io.jl:75
println at ./coreio.jl:4
callback at /home/tim/Julia/pkg/CUDA/wip.jl:3
```
I wasn't even sure whether this is guaranteed to work, but @vchuravy mentioned that marking the blocking ccall as @gc_safe ought to be enough to ensure Julia doesn't hold any locks when entering C, so I'm filing this as an issue here.
cc @vtjnash
The deadlock is caused by I/O: the foreign thread is waiting for its I/O to be performed, while the thread that can run that I/O (here, the main thread) is blocked.
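To make the cause concrete, here is a minimal C model of the situation. It is written for this discussion and is not taken from Julia or libuv; it assumes, as a simplification, that only the main thread may perform queued I/O. The names `io_lock`, `foreign_thread` and `call_on_thread_model` are hypothetical. A 1-second `pthread_cond_timedwait` stands in for the hang, so the model reports "never serviced" instead of deadlocking.

```c
/* Hypothetical model: only the main thread may run queued I/O, a foreign
   thread queues a write and waits for it, and the main thread is stuck in
   pthread_join. A 1 s timeout replaces the real hang. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t io_cond = PTHREAD_COND_INITIALIZER;
static bool pending = false; /* foreign thread has queued a write */
static bool done = false;    /* the queued write has been performed */

/* Foreign thread: queue a write, then wait for the owning thread to run it.
   Stores 1 in *out if the write was performed, 0 if the wait timed out. */
static void *foreign_thread(void *out) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1; /* give up after ~1 s instead of hanging */

    pthread_mutex_lock(&io_lock);
    pending = true;
    pthread_cond_broadcast(&io_cond); /* would wake the owner if it were listening */
    int rc = 0;
    while (!done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&io_cond, &io_lock, &deadline);
    *(int *)out = done ? 1 : 0;
    pthread_mutex_unlock(&io_lock);
    return NULL;
}

/* Mirrors call_on_thread(): the only thread allowed to run queued I/O
   blocks in pthread_join, so the queued write is never performed. */
static int call_on_thread_model(void) {
    int serviced = -1;
    pthread_t t;
    if (pthread_create(&t, NULL, foreign_thread, &serviced) != 0)
        return -1;
    pthread_join(t, NULL);
    assert(pending && !done); /* the write was queued, but nobody ran it */
    return serviced;
}
```

Calling `call_on_thread_model()` from a `main` returns 0 after about one second: the request was queued, but the only thread allowed to service it was waiting in `pthread_join`.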
Ah, right, so https://github.com/JuliaLang/julia/pull/50880 would fix this?
That's what I'm experimenting with.
I thought we already had an issue specifically about this, but https://github.com/JuliaLang/julia/issues/47201 doesn't currently mention this particular way of handling pthread_join; only the parent issue that spawned it does.
I don't think we have a way to detect this. Nobody is holding the I/O lock; the problem is that nobody is running the I/O. The only way to avoid the deadlock is to have another thread run the I/O, but once we are deadlocked here, it's already too late.
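"Have another thread run the I/O" can be sketched with the same kind of simplified C model. This is hypothetical and is not Julia's actual fix: a dedicated I/O thread (`io_thread`, an illustrative name) performs queued writes, so the foreign thread's request completes even though the main thread stays blocked in `pthread_join`.

```c
/* Hypothetical model: a dedicated I/O thread services queued writes, so the
   foreign thread can finish while the main thread blocks in pthread_join. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t io_cond = PTHREAD_COND_INITIALIZER;
static bool pending = false; /* foreign thread has queued a write */
static bool done = false;    /* the queued write has been performed */
static bool stop_io = false; /* asks the I/O thread to exit */
static int writes = 0;       /* writes performed by the I/O thread */

/* Dedicated I/O thread: performs queued writes until asked to stop. */
static void *io_thread(void *unused) {
    (void)unused;
    pthread_mutex_lock(&io_lock);
    while (!stop_io) {
        if (pending && !done) {
            writes++; /* stand-in for the actual write */
            done = true;
            pthread_cond_broadcast(&io_cond);
        } else {
            pthread_cond_wait(&io_cond, &io_lock);
        }
    }
    pthread_mutex_unlock(&io_lock);
    return NULL;
}

/* Foreign thread: queue a write and wait until it has been performed. */
static void *foreign_thread(void *unused) {
    (void)unused;
    pthread_mutex_lock(&io_lock);
    pending = true;
    pthread_cond_broadcast(&io_cond); /* wake the I/O thread */
    while (!done)
        pthread_cond_wait(&io_cond, &io_lock);
    pthread_mutex_unlock(&io_lock);
    return NULL;
}

/* The main thread still blocks in pthread_join, as in call_on_thread(),
   but the write completes because io_thread runs it.
   Returns the number of writes performed, or -1 on thread-creation failure. */
static int call_on_thread_with_io_thread(void) {
    pthread_t io, foreign;
    if (pthread_create(&io, NULL, io_thread, NULL) != 0)
        return -1;
    int rc = pthread_create(&foreign, NULL, foreign_thread, NULL);
    if (rc == 0) {
        pthread_join(foreign, NULL); /* main blocks here */
        assert(done); /* the foreign thread only returns once its write ran */
    }

    /* shut down the I/O thread */
    pthread_mutex_lock(&io_lock);
    stop_io = true;
    pthread_cond_broadcast(&io_cond);
    pthread_mutex_unlock(&io_lock);
    pthread_join(io, NULL);
    return rc == 0 ? writes : -1;
}
```

Note that the I/O thread is started before the main thread blocks. In this model, once the only I/O-capable thread is stuck in `pthread_join`, nothing is left to start one, which matches the point that by the time the deadlock happens it is too late to react.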
Is this issue fixed in some branch?
Thank you, Nash, for the fix. I have verified this experimental fix with the MWE and also with my application code, which uses CUDA GPUs through pthreads. It works well.
What is the plan for including this fix in an upcoming release?