dynamorio
SPEC jbb'15 [jdk8] sharedRuntime.cpp:549 failed: safepoint polling: pc must refer to an nmethod
After adding -ignore_assert_list '*' I was able to get past the DR_WHERE_DISPATCH assert.
<Ignoring assert /home/travis/build/DynamoRIO/dynamorio/core/dispatch.c:757 wherewasi == DR_WHERE_FCACHE || wherewasi == DR_WHERE_TRAMPOLINE || wherewasi == DR_WHERE_APP || (dcontext->go_native && wherewasi == DR_WHERE_DISPATCH)>
The application then fails at sharedRuntime.cpp:549 (pid=11981, tid=0x00007f04b2ef6700): guarantee(cb != NULL && cb->is_nmethod()) failed: safepoint polling: pc must refer to an nmethod. Attaching thread log file.
<Ignoring assert /home/travis/build/DynamoRIO/dynamorio/core/utils.c:1672 curiosity : size >= 0 && size < sizeof(logbuf)>
<Application /root/rahul/workloads/JVM/jdk1.8.0_201/bin/java (11981). Application exception at PC 0x00007f04b4221428.
Signal 6 delivered to application as default action.
root@fmx190:/workloads/SPECjbb2015/jbb102/scripts# /root/rahul/tools/DynamoRIO-x86_64-Linux-7.91.18173-0/bin64/drrun -s 3000 -debug -loglevel 2 -vm_size 1G -no_enable_reset -disable_traces -ignore_assert_list '*' -- /workloads/JVM/jdk1.8.0_201/bin/java -jar /workloads/SPECjbb2015/jbb102/specjbb2015.jar -m COMPOSITE
<log dir=/root/rahul/tools/DynamoRIO-x86_64-Linux-7.91.18173-0/bin64/../logs/java.11981.00000000>
<Starting application /root/rahul/workloads/JVM/jdk1.8.0_201/bin/java (11981)>
<Initial options = -no_dynamic_options -loglevel 2 -code_api -stack_size 56K -signal_stack_size 32K -disable_traces -no_enable_traces -max_elide_jmp 0 -max_elide_call 0 -no_shared_traces -bb_ibl_targets -no_shared_trace_ibl_routine -no_enable_reset -no_reset_at_switch_to_os_at_vmm_limit -reset_at_vmm_percent_free_limit 0 -no_reset_at_vmm_full -reset_at_commit_free_limit 0K -reset_every_nth_pending 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct -ignore_assert_list '*' >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/root/rahul/tools/DynamoRIO-x86_64-Linux-7.91.18173-0/lib64/debug/libdynamorio.so' 0x00007f0534e6ab68
>
<(1+x) Handling our fault in a TRY at 0x00007f05350c5556>
<get_memory_info mismatch! (can happen if os combines entries in /proc/pid/maps)
os says: 0x00007f04aadf6000-0x00007f04b2df6000 prot=0x00000000
cache says: 0x00007f04aadf6000-0x00007f04b2df7000 prot=0x00000000
>
<writing to executable region.>
<curiosity: rex.w on OPSZ_6_irex10_short4!>
SPECjbb2015 Java Business Benchmark
(c) Standard Performance Evaluation Corporation, 2015
<failed to translate>
<Ignoring assert /home/travis/build/DynamoRIO/dynamorio/core/unix/signal.c:2609 false>
<Ignoring assert /home/travis/build/DynamoRIO/dynamorio/core/translate.c:1370 in_fcache(pc)>
<Ignoring assert /home/travis/build/DynamoRIO/dynamorio/core/translate.c:1374 curiosity : res && "Unable to translate pc">
<Ignoring assert /home/travis/build/DynamoRIO/dynamorio/core/unix/signal.c:2613 sc->SC_XIP != (ptr_uint_t)NULL>
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (sharedRuntime.cpp:549), pid=11981, tid=0x00007f04b2ef6700
# guarantee(cb != NULL && cb->is_nmethod()) failed: safepoint polling: pc must refer to an nmethod
#
# JRE version: Java(TM) SE Runtime Environment (8.0_201-b09) (build 1.8.0_201-b09)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /root/rahul/workloads/SPECjbb2015/jbb102/scripts/core or core.11981
#
# An error report file with more information is saved as:
# /root/rahul/workloads/SPECjbb2015/jbb102/scripts/hs_err_pid11981.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
<Ignoring assert /home/travis/build/DynamoRIO/dynamorio/core/utils.c:1672 curiosity : size >= 0 && size < sizeof(logbuf)>
<Application /root/rahul/workloads/JVM/jdk1.8.0_201/bin/java (11981). Application exception at PC 0x00007f04b4221428.
Signal 6 delivered to application as default action.
Callstack:
0x00007f04b4221428 </lib/x86_64-linux-gnu/libc-2.23.so+0x35428>
0x00007f04b3cd3803 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0xad3803>
0x00007f04b36e0368 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x4e0368>
0x00007f04b3bc071f </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x9c071f>
0x00007f04b3b190fc </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x9190fc>
0x00007f04b3b0b8b8 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x90b8b8>
0x00007f04b49e4390 </lib/x86_64-linux-gnu/libpthread-2.23.so+0x11390>
0x00007f049d008100
0x00007f049d008100
0x00007f049d0083b6
0x00007f049d008100
0x00007f049d008100
0x00007f049d008100
0x00007f049d0007a7
0x00007f04b388825b </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x68825b>
0x00007f04b39004b4 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x7004b4>
0x00007f049d018847
0x00007f049d008100
0x00007f049d008100
0x00007f049d008100
0x00007f049d008100
0x00007f049d0007a7
0x00007f04b388825b </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x68825b>
0x00007f04b3885b23 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x685b23>
0x00007f04b3886143 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0x686143>
0x00007f04b3c44ed4 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0xa44ed4>
0x00007f04b3c4389b </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0xa4389b>
0x00007f04b3c45779 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0xa45779>
0x00007f04b3caf121 </root/rahul/workloads/JVM/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so+0xaa
<Stopping application /root/rahul/workloads/JVM/jdk1.8.0_201/bin/java (11981)>
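The module+offset entries in the callstack above can be turned into symbol lookups. The following is a minimal sketch, assuming a POSIX shell; the function name callstack_offsets is made up for illustration and is not part of DynamoRIO. It reads a callstack dump on stdin and prints one "module offset" pair per resolvable frame, skipping bare addresses (such as JIT-generated code) and truncated lines.

```shell
# Hypothetical helper: extract "module offset" pairs from a DynamoRIO
# callstack dump read on stdin. Frames without a <module+0xoffset>
# suffix are skipped.
callstack_offsets() {
    sed -n 's/^0x[0-9a-fA-F]* <\(.*\)+\(0x[0-9a-fA-F]*\)>.*$/\1 \2/p'
}
```

Assuming the callstack has been saved to a file (callstack.txt is a placeholder name) and binutils is installed, the pairs can be fed to addr2line. Whether useful names come back depends on the modules carrying symbols:

```shell
callstack_offsets < callstack.txt | while read -r module offset; do
    addr2line -f -C -e "$module" "$offset"
done
```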
I'm unable to extract more information from -debug -loglevel 2. I'm trying level 3 now, but it runs quite slowly.
Added thread log file log.1.11982.zip.
Does DR support later versions of the JDK?
Guess: I think it's more likely that DR supports older versions of the JVM than newer ones, given that at some point in the past Java worked under DynamoRIO.
We don't have any continuous integration set up that runs Java afaik, and we don't have any active developers who have a desire to maintain Java support at this moment unfortunately. I will try to look through the log file to see what I can find, but I am not sure I will be able to make much progress here.
Do you recommend a specific older version of the JVM?
No, I'm sorry. I do not personally have any experience running Java under DynamoRIO. There may be resources online about people who have tried in the past.
You also might try sending a mail to dynamorio-users: there are more people cc'd to that list than are cc'd to issues on the github, and someone there might be able to help you. I recommend referencing this issue in that message if you do send one.
Hi, I'm trying to run SPECjbb under DynamoRIO. With JDK 1.7 it runs on an x86 server, but it does not run on Arm. Has anyone run it successfully on an Arm server? Thanks!
I don't know of anyone who has tried SPECJBB on ARM. @AssadHashmi may know some?
As @Carrotman42 noted above, there are many more people on the mailing list than monitor the issue tracker.
Since there seems to be a group of people interested in Java, perhaps you could collaborate to help resolve the issue(s), and then add some kind of Java-targeted regression testing.
Thank you very much for your reply! From my research, DynamoRIO is indispensable for analyzing my Java scenarios, so I very much hope to solve this problem. I tried to run a simple helloWorld.jar with JDK 1.8 on Arm; the process is as follows.
$ gdb --args ./bin64/drrun -debug -- java -jar helloWorld.jar
$ r
$ set confirm off
$ add-symbol-file '/.../libdynamorio.so' 0x0000000071013da0
Program received signal SIGBUS, Bus error.
$ handle SIGBUS nostop pass
Thread 1 "java" received signal SIGUSR2, User defined signal 2.
0x000000004e55f744 in ?? ()
(gdb) b master_signal_handler
Breakpoint 1 at 0x71343f88: master_signal_handler. (2 locations)
$ layout src
(gdb) c
Continuing.
Thread 1 "java" hit Breakpoint 1, master_signal_handler () at
/.../dynamorio/core/arch/aarch64/aarch64.asm:603
(gdb)
$ n // repeat n until the assert is reached, at dynamorio/core/unix/signal.c line 5193:
case SUSPEND_SIGNAL:
if (handle_suspend_signal(dcontext, ucxt, frame)) {
ASSERT(tr == NULL || tr->under_dynamo_control || IS_CLIENT_THREAD(dcontext));
record_pending_signal(dcontext, sig, ucxt, frame, false _IF_CLIENT(NULL));
}
break;
$ p tr // $1 = (thread_record_t *) 0x4e48bf48
$ p tr->under_dynamo_control //True
Here is the execution path through dynamorio/core/unix/signal.c (source line numbers on the left):
4759: {
4760: sigframe_rt_t *frame = (sigframe_rt_t *)xsp;
4767: sigcontext_t *sc = SIGCXT_FROM_UCXT(ucxt);
4770: uint level = 2;
4805: dcontext_t *dcontext = get_thread_private_dcontext();
4822: if (dcontext == NULL && (sig == SIGSEGV || sig == SIGBUS) &&
//Not enter the if condition
4828: if (dynamo_exited && get_num_threads() > 1 && sig == SIGSEGV)
//Not enter the if condition
4839: if (sig == SUSPEND_SIGNAL) {
4840: if (proc_get_vendor() == VENDOR_AMD) //Not enter
4864: if (dcontext == NULL && // Not enter
4885: if (dcontext == NULL || //Not enter the whole if condition
4929: ENTERING_DR();
4930: if (dcontext == GLOBAL_DCONTEXT) { //Not enter the if condition
local = false;
tr = thread_lookup(get_sys_thread_id());
} else {
4934: tr = dcontext->thread_record; //So enter the else condition
4935: local = local_heap_protected(dcontext);
if (local) //Not enter this if condition
SELF_PROTECT_LOCAL(dcontext, WRITABLE);
}
4943: ASSERT(tr == NULL || tr->under_dynamo_control || IS_CLIENT_THREAD(dcontext) ||
//Not assert
4946: LOG(THREAD, LOG_ASYNCH, level,
4949: LOG(THREAD, LOG_ASYNCH, level + 1,
4953: DOLOG(level + 1, LOG_ASYNCH, { dump_sigcontext(dcontext, sc); });
4992: switch (sig) {
5192: case SUSPEND_SIGNAL:
5193: if (handle_suspend_signal(dcontext, ucxt, frame)) {
Then the program stopped. I then stepped into handle_suspend_signal; here is the path taken:
7190: handle_suspend_signal(dcontext_t *dcontext, kernel_ucontext_t *ucxt, sigframe_rt_t *frame) {
7191: {
7192: os_thread_data_t *ostd = (os_thread_data_t *)dcontext->os_field;
7195: ASSERT(ostd != NULL);
7197: if (ostd->terminate) {
7217: if (!doing_detach && is_thread_currently_native(dcontext->thread_record) &&
7236: if (ostd->suspend_count == 0)
7237: ASSERT(ostd->suspended_sigcxt == NULL);
7246: dr_where_am_i_t prior_whereami = dcontext->whereami;
7247: dcontext->whereami = DR_WHERE_SIGNAL_HANDLER;
7249: sig_full_initialize(&sc_full, ucxt);
7250: ostd->suspended_sigcxt = &sc_full;
7252: LOG(THREAD, LOG_ASYNCH, 2, "handle_suspend_signal: suspended now\n");
7262: ASSERT(ksynch_get_value(&ostd->suspended) == 0);
7263: ksynch_set_value(&ostd->suspended, 1);
7264: ksynch_wake_all(&ostd->suspended);
7274: sigprocmask_syscall(SIG_SETMASK, SIGMASK_FROM_UCXT(ucxt), &prevmask,
7275: sizeof(ucxt->uc_sigmask));
7278: while (ksynch_get_value(&ostd->wakeup) == 0) {
7282: ksynch_wait(&ostd->wakeup, 0, 0);
The process then stopped here. I have tried to describe the situation I encountered, but I don't know whether this information is useful. I look forward to your reply. Thank you very much!
Hi, professor. Do I need to provide more information, such as logs? I have a debug build of the JVM. Do you know how I can see which command triggered SIGUSR2? I look forward to your reply. Thank you very much!
SIGUSR2 is used internally by DR to synchronize between threads. It is normal and does not indicate that anything is wrong. It is used, for example, in certain types of code cache flushes.
OK, I see, thank you very much. However, when I set handle SIGUSR2 nostop pass, the program receives SIGSEGV and then crashes. On which function can I set a breakpoint to find the reason for the crash, or could you suggest some ways to solve this problem? Also, what should I do to add some kind of Java-targeted regression testing?
When I run a simple helloWorld.jar on Arm64 (JDK 1.8):
[buildDebug]$ ./bin64/drrun -debug -loglevel 2 -- java -jar helloWorld.jar
<log dir=/.../logs/java.126256.00000000>
<Starting application java (126256)>
<Initial options = -no_dynamic_options -loglevel 2 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/.../dynamorio/buildDebug/lib64/debug/libdynamorio.so' 0x0000000071013da0
>
<(1+x) Handling our fault in a TRY at 0x0000000071343e6c>
<Application java (126256). DynamoRIO internal crash at PC 0x000000004c59b170. Please report this at http://dynamorio.org/issues/. Program aborted.
Received SIGSEGV at generated pc 0x000000004c59b170 in thread 126256
Base: 0x0000000071000000
Registers: eflags=0x0000000080000000
version 7.0.18139, custom build
-no_dynamic_options -loglevel 2 -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct
0x0000ffffdb4b87d0 0x0000ffffa1e15904
0x0000ffffdb4b8850 0x0000ffffa1e12690
0x0000ffffdb4b88f0 0x0000ffffa1e12a34
0x0000ffffdb4b89a0 0x0000000000400658
0x0000ffffdb4bcae0 0x0000ffffa1c84ae0
0x0000ffffdb4bcaf0 0x0000000000400698>
I ignored SIGUSR2 and then set a breakpoint at master_signal_handler; the path taken is:
master_signal_handler_C(byte *xsp)
{
sigframe_rt_t *frame = (sigframe_rt_t *)xsp;
sigcontext_t *sc = SIGCXT_FROM_UCXT(ucxt);
uint level = 2;
dcontext_t *dcontext = get_thread_private_dcontext();
ENTERING_DR();
if (dcontext == GLOBAL_DCONTEXT) {
} else {
tr = dcontext->thread_record;
local = local_heap_protected(dcontext);
}
switch (sig) {
case SIGSEGV: {
void *pc = (void *)sc->SC_XIP;
bool syscall_signal = false; /* signal came from syscall? */
bool is_write = false;
bool is_DR_exception = false;
target = compute_memory_target(dcontext, pc, ucxt, siginfo, &is_write);
else if (!safe_is_in_fcache(dcontext, pc, (byte *)sc->SC_XSP) &&
(in_generated_routine(dcontext, pc) ||
is_at_do_syscall(dcontext, pc, (byte *)sc->SC_XSP) ||
is_dynamo_address(pc))) {
is_DR_exception = true;
}
if (is_DR_exception) {
if (!syscall_signal) {
if (check_in_last_thread_vm_area(dcontext, target)) {
/* CRASH HERE */
}
check_in_last_thread_vm_area(dcontext_t *dcontext, app_pc pc)
{
thread_data_t *data = NULL;
bool in_last = false;
if (is_readable_without_exception((app_pc)&dcontext->vm_areas_field, 4))
/* CRASH HERE */
}
is_readable_without_exception(const byte *pc, size_t size)
{
bool query_os = IF_MEMQUERY_ELSE(true, !DYNAMO_OPTION(use_all_memory_areas));
return is_readable_without_exception_internal(pc, size, query_os);
After that, gdb can no longer debug the process, but java is still running.
when I set handle SIGUSR2 nostop pass, the program receives SIGSEGV and then crashes. On which function can I set a breakpoint to find the reason for the crash, or could you suggest some ways to solve this problem?
If you have handle SIGSEGV stop it will stop there -- just continue if it's a safe-read fault (sometimes one or two early on). Also, if DR is using page faults for cache consistency, it may also be a handled SIGSEGV.
Probably the problem is somewhere in cache consistency. It may require a combination of logging (-loglevel 4) and the debugger. Also try tweaking how DR performs consistency: e.g., -no_hw_cache_consistency or -no_sandbox_writes. Also xref the branch that improves consistency but is not integrated into DR yet (need volunteers to help): https://github.com/DynamoRIO/dynamorio/wiki/JIT-Optimization
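The options suggested above can be tried one at a time. The following is a minimal sketch, assuming a POSIX shell; the helper name print_drrun_variants is hypothetical, and it only prints the command lines rather than running them. The option names are the ones given in the comment above, and -debug is carried over from the earlier runs in this thread.

```shell
# Hypothetical helper: print one drrun command line per consistency
# setting (baseline first), so each can be run and compared in turn.
# $1 is the path to drrun; the remaining arguments are the application
# command line.
print_drrun_variants() {
    pv_drrun="$1"
    shift
    for pv_opt in "" "-no_hw_cache_consistency" "-no_sandbox_writes"; do
        # ${pv_opt:+...} expands to nothing for the baseline run.
        printf '%s\n' "$pv_drrun -debug -loglevel 4 ${pv_opt:+$pv_opt }-- $*"
    done
}
```

For example, `print_drrun_variants ./bin64/drrun java -jar helloWorld.jar` prints three command lines. Each can also be run under gdb with `--args`, using the handle settings discussed above.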
Or what should I do to add some kind of Java-targeted regression testing?
There are several avenues here:
- Add unit tests. We have some tests today of self-modifying code and other types of code modification. If we find some pattern of code modification is not handled properly and that's the cause of this bug, we would want to add a unit test of that pattern to DR's existing test suite.
- Add app-level tests of actual Java. This is more complex since we have to figure out how to get Java. Probably we would require it to be on the system, since we don't really want to check a JVM into our repo. Then we would have Travis + Appveyor install it. Or, there could be a custom setup on an outside machine and use CDash or Jenkins to be triggered.
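An app-level test that requires Java to already be on the system could look roughly like the sketch below. This is an illustration only, not part of DR's test suite: the function name java_smoke_test, its arguments, and the use of exit code 77 for "skipped" (a convention in some harnesses, for example Automake) are all assumptions.

```shell
# Hypothetical smoke test: run a jar under DynamoRIO and compare stdout
# with the expected text.
# Returns 0 on pass, 1 on failure, 77 (skip) if drrun or java is missing.
java_smoke_test() {
    st_drrun="$1"; st_java="$2"; st_jar="$3"; st_expected="$4"
    command -v "$st_drrun" >/dev/null 2>&1 || return 77
    command -v "$st_java" >/dev/null 2>&1 || return 77
    # DR diagnostics go to stderr, so only stdout is compared.
    st_actual="$("$st_drrun" -- "$st_java" -jar "$st_jar" 2>/dev/null)" || return 1
    [ "$st_actual" = "$st_expected" ]
}
```

A CI job could call, for instance, `java_smoke_test ./bin64/drrun java helloWorld.jar "Hello World"` (the expected string here is a placeholder) and treat 77 as "not run" on machines without a JVM.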
Or, there could be a custom setup on an outside machine and use CDash or Jenkins to be triggered.
I may be able to help with this on the AArch64 cloud instance we use for Arm nightly testing and CI.
I have encountered the same problem. Is anybody paying attention to it?
Have you tried the approach @derekbruening suggested https://github.com/DynamoRIO/dynamorio/issues/3892#issuecomment-549904008 ? Specifically:
If you have handle SIGSEGV stop it will stop there -- just continue if it's a safe-read fault (sometimes one or two early on).
Probably the problem is somewhere in cache consistency. It may require a combination of logging (-loglevel 4) and the debugger. Also try tweaking how DR performs consistency: e.g., -no_hw_cache_consistency or -no_sandbox_writes.
Have you tried the -no_hw_cache_consistency and -no_sandbox_writes options?
Xref #3733