JVM hangs at safepoint synchronization when profiling some large applications
Sometimes I get the following thread states:
- the profiler thread blocks at a safepoint while executing jvmti->GetLineNumberTable and holding frame_lock;
- one user thread in a signal handler holds in_scope_lock and spins on acquiring frame_lock;
- some other user threads in signal handlers spin on acquiring in_scope_lock.
According to gdb, the user threads were interrupted while already blocked at a safepoint, and these threads are unable to block once more. Running the application with -XX:+SafepointTimeout and other related flags agrees with gdb and reports that the same user threads that spin in the signal handler cannot reach the safepoint.
Tightening the critical section in the profiler code so that it is entered only to work with static_call_frames fixes this problem, though I guess not entirely: it just drastically reduces the chance of it occurring.
It's odd that we're stuck at jvmti->GetLineNumberTable, but I'm pretty sure (as you said on Gitter) that we don't need to hold frame_lock past the for loop at https://github.com/Decave/JCoz/blob/master/src/native/profiler.cc#L385. frame_lock really just needs to protect static_call_frames, which is used to collect call frames in the user threads when an experiment isn't running.
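To make the idea concrete, here is a rough sketch of what narrowing the critical section could look like. It is not the actual JCoz code: CallFrame, SpinLock, kMaxFrames, frame_count, record_frame, process_frames and resolve_line are all hypothetical names, and resolve_line only stands in for a call that can block at a safepoint, such as jvmti->GetLineNumberTable. The point is that frame_lock is held only while copying static_call_frames, so a user thread spinning on it in a signal handler never waits on a JVMTI call.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the profiler's call-frame record.
struct CallFrame {
    int method_id;   // placeholder for a jmethodID
    long location;   // placeholder for a jlocation
};

// Minimal spin lock; assumed to behave like the profiler's frame_lock.
class SpinLock {
public:
    void lock() { while (flag_.test_and_set(std::memory_order_acquire)) { } }
    void unlock() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

constexpr std::size_t kMaxFrames = 64;  // assumed capacity
SpinLock frame_lock;
CallFrame static_call_frames[kMaxFrames];
std::size_t frame_count = 0;

// User-thread side: the lock is held only for a bounded array write.
void record_frame(const CallFrame& f) {
    frame_lock.lock();
    if (frame_count < kMaxFrames) {
        static_call_frames[frame_count++] = f;
    }
    frame_lock.unlock();
}

// Stand-in for a slow call that may block at a safepoint
// (for example jvmti->GetLineNumberTable in the real profiler).
int resolve_line(const CallFrame& f) {
    return static_cast<int>(f.location) + 1;
}

// Profiler-thread side: copy the shared frames under the lock,
// release it, and only then do the work that may block.
std::vector<int> process_frames() {
    CallFrame local[kMaxFrames];
    std::size_t n = 0;

    frame_lock.lock();
    n = frame_count;
    std::copy(static_call_frames, static_call_frames + n, local);
    frame_count = 0;
    frame_lock.unlock();  // released before any potentially blocking call

    std::vector<int> lines;
    lines.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        lines.push_back(resolve_line(local[i]));
    }
    return lines;
}
```

With this shape, the profiler thread can block in the slow call for as long as the JVM needs without holding frame_lock. It does not address user threads spinning on in_scope_lock inside signal handlers, so it would narrow the window rather than remove the underlying problem.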