Add `jl_print_task_backtraces()`

Open kpamnany opened this issue 2 years ago • 2 comments

Iterates through `jl_all_tls_states` and, for each thread, through all `live_tasks` in `ptls->heap`, printing a backtrace for each task.

The purpose is to help find deadlocks.

Replaces https://github.com/JuliaLang/julia/pull/44990. Closes #46177.
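
A minimal standalone sketch of that traversal, not the PR's code: `mock_task_t`, `mock_tls_t`, `mock_backtrace`, and `mock_print_task_backtraces` are hypothetical stand-ins for Julia's task type, per-thread state, `jlbacktracet`, and the new function. The real function reads the runtime's own structures and prints with `jl_safe_printf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a Julia task. */
typedef struct { int id; } mock_task_t;

/* Hypothetical stand-in for one thread's local state (ptls). */
typedef struct {
    mock_task_t *root_task;
    mock_task_t **live_tasks; /* tasks created on this thread */
    size_t n_live;
} mock_tls_t;

/* Stand-in for jlbacktracet(): only reports which task would be walked. */
static void mock_backtrace(const mock_task_t *t)
{
    printf("     backtrace for task %d\n", t->id);
}

/* Visit each thread's root task and live tasks; return how many were visited.
   Like the real function, this assumes no thread is mutating the lists. */
static size_t mock_print_task_backtraces(mock_tls_t *const *all_states,
                                         size_t n_threads)
{
    assert(all_states != NULL || n_threads == 0);
    size_t visited = 0;
    for (size_t i = 0; i < n_threads; i++) {
        const mock_tls_t *ptls2 = all_states[i];
        printf("==== Thread %zu\n", i + 1);
        printf("     ---- Root task (%p)\n", (void *)ptls2->root_task);
        mock_backtrace(ptls2->root_task);
        visited++;
        for (size_t j = 0; j < ptls2->n_live; j++) {
            printf("     ---- Task %zu (%p)\n", j + 1,
                   (void *)ptls2->live_tasks[j]);
            mock_backtrace(ptls2->live_tasks[j]);
            visited++;
        }
    }
    return visited;
}
```

The return value exists only to make the sketch easy to check; the function described here just prints.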

kpamnany avatar Sep 20 '22 17:09 kpamnany

For this error:

/cache/build/default-amdci5-6/julialang/julia-master/src/stackwalk.c:1124:9: error: Passing non-rooted value as argument to function that may GC [julia.GCChecker]
--
  | jlbacktracet(ptls2->root_task);
  | ^            ~~~~~~~~~~~~~~~~

Do I have to `JL_GC_PUSH1(&ptls2->root_task)` before the call and `JL_GC_POP()` after?

kpamnany avatar Sep 20 '22 20:09 kpamnany

Also, I don't understand this note:

/cache/build/default-amdci5-6/julialang/julia-master/src/stackwalk.c:1119:54: note: Argument value was derived global with untracked type. You may want to update the checker's type list
--
  | jl_safe_printf("     ---- Root task (%p)\n", ptls2->root_task);
  | ^~~~~~~~~~~~~~~~

kpamnany avatar Sep 20 '22 20:09 kpamnany

We're experimenting with this to see if it's useful. As Nathan said, this is basically for use in gdb, i.e. all threads will be stopped, so poking into other threads' local storage is safe.

I find that I cannot get a backtrace (Linux glibc x86-64) with the default JL_HAVE_ASM and have to turn on JL_HAVE_UNW_CONTEXT. Does https://github.com/JuliaLang/julia/pull/45110 fix that @vtjnash?

kpamnany avatar Sep 26 '22 21:09 kpamnany

This should eventually be useful for figuring out why the Sockets test stalls on CI, such as https://buildkite.com/julialang/julia-master/builds/16200#01838101-13bb-48fd-8aff-7705c619cd66

vtjnash avatar Sep 28 '22 08:09 vtjnash

> I find that I cannot get a backtrace (Linux glibc x86-64) with the default JL_HAVE_ASM and have to turn on JL_HAVE_UNW_CONTEXT. Does #45110 fix that @vtjnash?

Oh, interesting! I wonder if that's why I originally was thinking this didn't work. Thanks for tracking all of this down, @kpamnany!

@vtjnash / @kpamnany: is this resolved? Is there anything I/we can do to help move this along? Thanks! 😊

NHDaly avatar Oct 03 '22 20:10 NHDaly

I added a comment warning that this is only intended for use when all threads are stopped (i.e. in gdb). I also removed it from exported functions for that reason.

We've verified that this can be useful. So, apart from the GC checker errors, I think this is good to go.

kpamnany avatar Oct 04 '22 22:10 kpamnany

This is failing analyzegc on master. Not sure why it didn't fail on this PR:

/cache/build/default-amdci5-5/julialang/julia-master/src/stackwalk.c:1131:28: error: Implicit Atomic seq_cst synchronization [concurrency-implicit-atomics,-warnings-as-errors]
--
  | for (size_t i = 0; i < jl_n_threads; i++) {
  | ^
  | /cache/build/default-amdci5-5/julialang/julia-master/src/stackwalk.c:1132:27: error: Implicit Atomic seq_cst synchronization [concurrency-implicit-atomics,-warnings-as-errors]
  | jl_ptls_t ptls2 = jl_all_tls_states[i];
  | ^
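
For context, the check flags plain reads of atomic variables because they are implicitly `seq_cst`. One way to satisfy it is to load each atomic once, with an explicit memory ordering, before the loop. The sketch below uses plain C11 `<stdatomic.h>` rather than Julia's `jl_atomic_load_*` wrappers; `mock_n_threads`, `mock_all_states`, `publish_states`, and `sum_states` are hypothetical stand-ins, the orderings are illustrative, and this is not necessarily how the PR was fixed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for jl_n_threads and jl_all_tls_states. */
static _Atomic(size_t) mock_n_threads;
static _Atomic(int *) mock_all_states;

/* Writer: publish the array first, then the count with release ordering. */
static void publish_states(int *states, size_t n)
{
    atomic_store_explicit(&mock_all_states, states, memory_order_relaxed);
    atomic_store_explicit(&mock_n_threads, n, memory_order_release);
}

/* Reader: explicit loads hoisted out of the loop, instead of writing
   `i < mock_n_threads` and `mock_all_states[i]`, which would be
   implicit seq_cst accesses on every iteration. */
static int sum_states(void)
{
    size_t n = atomic_load_explicit(&mock_n_threads, memory_order_acquire);
    int *states = atomic_load_explicit(&mock_all_states, memory_order_relaxed);
    assert(states != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += states[i];
    return total;
}
```

The acquire load of the count pairs with the release store in `publish_states`, so the relaxed load of the array pointer that follows it sees the published array.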

Keno avatar Oct 15 '22 23:10 Keno

This PR was reverted in #47182 because of the problem with analyzegc reported above.

giordano avatar Oct 16 '22 17:10 giordano