Add `jl_print_task_backtraces()`
Iterates through `jl_all_tls_states` and through all `live_tasks` in `ptls->heap`, printing backtraces.
The purpose is to help find deadlocks.
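The iteration described above can be sketched in isolation. This is only a rough model, not the code in this PR: `mock_task_t`, `mock_tls_state_t`, `mock_print_backtrace`, and `mock_print_task_backtraces` are hypothetical stand-ins for the runtime's real structures, and the "backtrace" is a placeholder line rather than a real stack walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT Julia's real definitions. */
typedef struct {
    const char *name;
} mock_task_t;

typedef struct {
    mock_task_t *root_task;    /* models ptls->root_task */
    mock_task_t **live_tasks;  /* models the live_tasks list in ptls->heap */
    size_t n_live_tasks;
} mock_tls_state_t;

/* Placeholder for a real stack walk: prints one line per task. */
static void mock_print_backtrace(FILE *out, const mock_task_t *t)
{
    fprintf(out, "     backtrace of task \"%s\"\n", t->name);
}

/* Walk every thread's state, then every task it tracks.
   Returns the number of tasks printed.
   Assumes the caller has stopped all other threads (e.g. in a debugger). */
size_t mock_print_task_backtraces(FILE *out, mock_tls_state_t *const *states,
                                  size_t n_threads)
{
    assert(out != NULL && states != NULL);
    size_t printed = 0;
    for (size_t i = 0; i < n_threads; i++) {
        const mock_tls_state_t *ptls2 = states[i];
        if (ptls2 == NULL)
            continue; /* slot for a thread that has no state yet */
        fprintf(out, "==== Thread %zu\n", i + 1);
        if (ptls2->root_task != NULL) {
            fprintf(out, " ---- Root task (%p)\n", (void *)ptls2->root_task);
            mock_print_backtrace(out, ptls2->root_task);
            printed++;
        }
        for (size_t j = 0; j < ptls2->n_live_tasks; j++) {
            const mock_task_t *t = ptls2->live_tasks[j];
            if (t == NULL)
                continue;
            fprintf(out, " ---- Task %zu (%p)\n", j + 1, (void *)t);
            mock_print_backtrace(out, t);
            printed++;
        }
    }
    return printed;
}
```

The sketch takes no locks, which is only reasonable under the stated assumption that every other thread is stopped while it runs.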
Replaces https://github.com/JuliaLang/julia/pull/44990. Closes #46177.
For this error:
/cache/build/default-amdci5-6/julialang/julia-master/src/stackwalk.c:1124:9: error: Passing non-rooted value as argument to function that may GC [julia.GCChecker]
--
| jlbacktracet(ptls2->root_task);
| ^ ~~~~~~~~~~~~~~~~
Do I have to `JL_GC_PUSH1(&ptls2->root_task)` and `JL_GC_POP()` after?
Also, I don't understand this:
/cache/build/default-amdci5-6/julialang/julia-master/src/stackwalk.c:1119:54: note: Argument value was derived global with untracked type. You may want to update the checker's type list
--
| jl_safe_printf(" ---- Root task (%p)\n", ptls2->root_task);
| ^~~~~~~~~~~~~~~~
We're experimenting with this to see if it's useful. As Nathan said, this is basically for use in `gdb`, i.e. all threads will be stopped, so poking into other threads' local storage is safe.
I find that I cannot get a backtrace (Linux glibc x86-64) with the default `JL_HAVE_ASM` and have to turn on `JL_HAVE_UNW_CONTEXT`. Does https://github.com/JuliaLang/julia/pull/45110 fix that @vtjnash?
This should be useful eventually for figuring out why the Sockets test stalls on CI, such as https://buildkite.com/julialang/julia-master/builds/16200#01838101-13bb-48fd-8aff-7705c619cd66.
> I find that I cannot get a backtrace (Linux glibc x86-64) with the default `JL_HAVE_ASM` and have to turn on `JL_HAVE_UNW_CONTEXT`. Does #45110 fix that @vtjnash?
Oh, interesting! I wonder if that's why I originally was thinking this didn't work. Thanks for tracking all of this down, @kpamnany!
@vtjnash / @kpamnany: is this resolved? Is there anything I/we can do to help move this along? Thanks! 😊
I added a comment warning that this is only intended for use when all threads are stopped (i.e. in `gdb`). I also removed it from exported functions for that reason.
We've verified that this can be useful. So, apart from the GC checker errors, I think this is good to go.
This is failing analyzegc on master. Not sure why it didn't fail on this PR:
/cache/build/default-amdci5-5/julialang/julia-master/src/stackwalk.c:1131:28: error: Implicit Atomic seq_cst synchronization [concurrency-implicit-atomics,-warnings-as-errors]
--
| for (size_t i = 0; i < jl_n_threads; i++) {
| ^
| /cache/build/default-amdci5-5/julialang/julia-master/src/stackwalk.c:1132:27: error: Implicit Atomic seq_cst synchronization [concurrency-implicit-atomics,-warnings-as-errors]
| jl_ptls_t ptls2 = jl_all_tls_states[i];
| ^
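For readers unfamiliar with this diagnostic: the `concurrency-implicit-atomics` check complains when an `_Atomic` variable is read or written through a plain expression, which C11 treats as a sequentially consistent access. Assuming the two flagged globals are declared `_Atomic`, the usual remedy is to spell out each load with an explicit memory order. The sketch below shows that pattern with standard `<stdatomic.h>` only; `mock_n_threads`, `mock_all_tls_states`, `mock_publish`, and `mock_sum_tids` are hypothetical names, not the runtime's own globals or helpers, and the chosen orderings are one plausible option rather than what the runtime requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the globals named in the diagnostic. */
typedef struct {
    int tid;
} mock_tls_state_t;

static _Atomic(size_t) mock_n_threads;
static _Atomic(mock_tls_state_t **) mock_all_tls_states;

/* Writer side: publish the array first, then the count, both explicitly. */
void mock_publish(mock_tls_state_t **states, size_t n)
{
    atomic_store_explicit(&mock_all_tls_states, states, memory_order_release);
    atomic_store_explicit(&mock_n_threads, n, memory_order_release);
}

/* Reader side: every atomic access names its memory order, so nothing is
   an implicit seq_cst load. The values are copied into locals once and the
   loop then works on the plain copies. */
int mock_sum_tids(void)
{
    size_t n = atomic_load_explicit(&mock_n_threads, memory_order_acquire);
    mock_tls_state_t **states =
        atomic_load_explicit(&mock_all_tls_states, memory_order_relaxed);
    assert(n == 0 || states != NULL);
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        mock_tls_state_t *ptls2 = states[i];
        if (ptls2 != NULL)
            sum += ptls2->tid;
    }
    return sum;
}
```

Loading the count and the array pointer once before the loop also avoids re-reading an atomic in the loop condition, which is the exact expression the first error points at.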
This PR was reverted in #47182 because of the problem with analyzegc reported above.