dynamorio
dynamorio copied to clipboard
[perf] remove global lock from signal pending path
The thread_initexit_lock is a contention hog for signal handling:
Thread 685 (Thread 0x7f0ca95b4700 (LWP 95391)):
#0 0x00007f0cabc7a47e in syscall_ready ()
#1 0x00007f0cc4842280 in thread_in_DR_exclusion ()
#2 0x00007f0cabd17584 in ksynch_wait (futex=futex@entry=0x6, mustbe=mustbe@entry=1) at dynamorio/trunk/core/unix/ksynch_linux.c:120
#3 0x00007f0cabd24fed in mutex_wait_contended_lock (lock=0x6) at dynamorio/trunk/core/unix/os.c:8882
#4 0x00007f0cabd2f5e6 in mutex_lock (lock=<optimized out>) at dynamorio/trunk/core/utils.c:888
#5 0x00007f0cabd29415 in translate_sigcontext (dcontext=dcontext@entry=0x7f0c44962ac0, uc=uc@entry=0x7f0c44ca6ac0, avoid_failure=<optimized out>, f=f@entry=0x7f0c44c8d7e8)
at dynamorio/trunk/core/unix/signal.c:2179
#6 0x00007f0cabd2a1db in record_pending_signal (dcontext=dcontext@entry=0x7f0c44962ac0, sig=sig@entry=27, ucxt=ucxt@entry=0x7f0c44ca6ac0, frame=frame@entry=0x7f0c44ca6ab8,
forged=forged@entry=0 '\000', access_address=0x0) at dynamorio/trunk/core/unix/signal.c:3650
#7 0x00007f0cabd2ba54 in master_signal_handler_C (sig=<optimized out>, siginfo=<optimized out>, ucxt=0x7f0c44ca6ac0, xsp=0x7f0c44ca6ab8 "\024\240ǫ\f\177")
at dynamorio/trunk/core/unix/signal.c:4526
One option is to try to use enter_couldbelinking instead of the thread_initexit_lock. However, it can end up flushing the interrupted fragment from underneath us. I think I can get that to work w/o a client, but it messes up the pre-xl8 cxt for clients.
Another question is why this alarm signal here was being delivered now, as this was with -code_api where there are no inlined syscalls.
Another source of lock contention in delivering signals (lumping into this same issue) is all_memory_areas->lock when copying the frame to the stack:
#2 0x000055ddcc109af0 in memquery_iterator_next () <-- memory_info_buf_lock
#3 0x000055ddcc109d10 in memquery_from_os ()
#4 0x000055ddcc114dc7 in get_memory_info_from_os ()
#5 0x000055ddcc1090aa in memcache_query_memory () <-- all_memory_areas->lock
#6 0x000055ddcc111b2f in get_memory_info ()
#7 0x000055ddcc11f53f in copy_frame_to_stack ()
#8 0x000055ddcc11dd1a in receive_pending_signal ()
#9 0x000055ddcc0cbd52 in dispatch ()
Did PR #2872 fix this issue? From a quick reading it seems there's still more work left for reducing lock contention.
I have a local branch for this which I renamed long ago to i2114-signal-lock-not-sure-worthwhile. My local commit message says:
i#2114: reduce signal pending lock contention
NOCHECKIN: unclear we need it; has risks; maybe use global lock for cases
where can't fail, local o/w?
NOCHECKIN: client pre-xl8 is main issue
For pending signals arriving at syscalls, we translate immediately, using
the global thread_initexit_lock. This can be a point of contention for
timer signals. We swap to using enter_couldbelinking instead.
Fixes #2114