jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8342607: Enhance register printing on x86_64 platforms

Open TheRealMDoerr opened this issue 1 year ago • 16 comments

There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. I've implemented JDK-8342607 such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.)

Example output (linux):

Registers:
RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340
RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000
R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120
R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0
RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006
  TRAPNO=0x000000000000000e

XMM[0]=0x0000000000000000 0x0000000000000000
XMM[1]=0x00007fea3c034200 0x0000000000000000
XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0
XMM[3]=0x00007fea7c3d6608 0x0000000000000000
XMM[4]=0x00007f0000000000 0x0000000000000000
XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff
XMM[6]=0x0000000000000000 0x00007fea897d0f98
XMM[7]=0x0202020202020202 0x0000000000000000
XMM[8]=0x0000000000000000 0x0202020202020202
XMM[9]=0x666e69206e6f6974 0x0000000000000000
XMM[10]=0x0000000000000000 0x6e6f6974616d726f
XMM[11]=0x0000000000000001 0x0000000000000000
XMM[12]=0x00007fea8b684400 0x0000000000000001
XMM[13]=0x0000000000000000 0x0000000000000000
XMM[14]=0x0000000000000000 0x0000000000000000
XMM[15]=0x0000000000000000 0x0000000000000000
  MXCSR=0x0000037f

Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8342607: Enhance register printing on x86_64 platforms (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615
$ git checkout pull/21615

Update a local copy of the PR:
$ git checkout pull/21615
$ git pull https://git.openjdk.org/jdk.git pull/21615/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21615

View PR using the GUI difftool:
$ git pr show -t 21615

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21615.diff

Webrev

Link to Webrev Comment

TheRealMDoerr avatar Oct 21 '24 14:10 TheRealMDoerr

:wave: Welcome back mdoerr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Oct 21 '24 14:10 bridgekeeper[bot]

@TheRealMDoerr This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8342607: Enhance register printing on x86_64 platforms

Co-authored-by: Richard Reingruber <[email protected]>
Reviewed-by: rrich, stuefe, mbaesken

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 114 new commits pushed to the master branch:

  • 0abfa3ba8f72538f62be838c1ebac8cfbdd14cdf: 8304824: NMT should not use ThreadCritical
  • 88dc655a6d6cfc71c467405b62bd35beeed7794f: 8342988: GHA: Build JTReg in single step
  • df3473e22069145334dd7323bfa793c237a7f26e: 8343178: Test BasicTest.java javac compile fails cannot find symbol
  • 54327bc4e38773b7461977ce17f2185c068bce9b: 8342962: [s390x] TestOSRLotsOfLocals.java crashes
  • f0075d593db657182e1857e54710a1052e9d1cf0: 8343115: SkipIfEqual class is not used after JDK-8335946
  • 90bd544512de541cd98889bec58f419bc69a723d: 8342958: Use jvmArgs consistently in microbenchmarks
  • d49f21043b84ebcc8b9176de3a84621ca7bca8fb: 8342040: Further improve entry lookup performance for multi-release JARs
  • d2e716eb72ea603fce50f0757a766ec623ef2faf: 8331958: Update PC/SC Lite for Suse Linux to 2.3.0
  • a95374f588149d80068275a496ba4aa04b3bb4fd: 8343101: Rework BasicTest.testTemp test cases
  • 00fe9f7bdfd245791bca6b5b1b2d0a98d41af221: 8343100: Consolidate EmptyFolderTest and EmptyFolderPackageTest jpackage tests into single java file
  • ... and 104 more: https://git.openjdk.org/jdk/compare/aa060f22d302789c4f80dd1ebaa233a97b6b0073...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Oct 21 '24 14:10 openjdk[bot]

@TheRealMDoerr The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Oct 21 '24 14:10 openjdk[bot]

/label add hotspot-compiler

TheRealMDoerr avatar Oct 21 '24 14:10 TheRealMDoerr

@TheRealMDoerr The hotspot-compiler label was successfully added.

openjdk[bot] avatar Oct 21 '24 14:10 openjdk[bot]

Looks good to me, but I could live well without that indentation of the MXCSR value.

MBaesken avatar Oct 22 '24 07:10 MBaesken

Hi Martin, is there documentation on which you base the implementation? My quick search wasn't very successful. How do we know that the printed values are correct?

reinrich avatar Oct 22 '24 09:10 reinrich

Hi Martin, is there documentation on which you base the implementation? My quick search wasn't very successful. How do we know that the printed values are correct?

You can find the data structures here: https://github.com/bminor/glibc/blob/dcad78507433a9a64b8b548b19e110933f8d939a/sysdeps/x86_64/sys/ucontext.h#L106

TheRealMDoerr avatar Oct 22 '24 09:10 TheRealMDoerr

MS documents also the CONTEXT structure with various register values, including XMM https://learn.microsoft.com/de-de/windows/win32/api/winnt/ns-winnt-context

MBaesken avatar Oct 22 '24 09:10 MBaesken

Well, with my quick-search I found information like that too. I don't see though, how it helps understand that the printed values are correct.

reinrich avatar Oct 22 '24 11:10 reinrich

Well, with my quick-search I found information like that too. I don't see though, how it helps understand that the printed values are correct.

I think the only way to verify that the printed values are correct is testing it. I'm currently experimenting with setting up XMM values and then crashing intentionally. I can see that my values get printed, but on linux, XMM[0] seems to be always 0 and the other registers off by one. Strange.

TheRealMDoerr avatar Oct 22 '24 19:10 TheRealMDoerr

Thanks Martin for doing this. I actually wanted to attempt writing a test myself before I read your message.

I can see that my values get printed, but on linux, XMM[0] seems to be always 0 and the other registers off by one. Strange.

Strange indeed.

reinrich avatar Oct 23 '24 07:10 reinrich

Well, with my quick-search I found information like that too. I don't see though, how it helps understand that the printed values are correct.

I think the only way to verify that the printed values are correct is testing it. I'm currently experimenting with setting up XMM values and then crashing intentionally. I can see that my values get printed, but on linux, XMM[0] seems to be always 0 and the other registers off by one. Strange.

I got curious, then confused, and looked. See https://stackoverflow.com/questions/43415882/reading-sse-registers-xmm-ymm-in-a-signal-handler - even though it refers to ymm registers. The answer seems to indicate that the glibc structure is not what you should be using, but going via uc_mcontext seems to be the right way.

tstuefe avatar Oct 23 '24 07:10 tstuefe

Thanks for your input @reinrich, @tstuefe. I've tried taking the values from uc->uc_mcontext.fpregs->_xmm[i]. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call MacroAssembler::debug64. Surprisingly, my old code based on uc->__fpregs_mem._xmm[i].element[0] works for them (besides the off-by-one issue), but the new proposal doesn't. Any idea?

TheRealMDoerr avatar Oct 23 '24 15:10 TheRealMDoerr

For the record: the kernel sets the fpregs pointer here :)

reinrich avatar Oct 23 '24 15:10 reinrich

Thanks for your input @reinrich, @tstuefe. I've tried taking the values from uc->uc_mcontext.fpregs->_xmm[i]. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call MacroAssembler::debug64.

odd. debug64 ends up calling fatal, which should use the assertion poison page to get a valid context. see debug.hpp.

tstuefe avatar Oct 24 '24 09:10 tstuefe

Thanks for your input @reinrich, @tstuefe. I've tried taking the values from uc->uc_mcontext.fpregs->_xmm[i]. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call MacroAssembler::debug64.

odd. debug64 ends up calling fatal, which should use the assertion poison page to get a valid context. see debug.hpp.

The ucontext_t is memcpyed. The fpregs pointer in the copy is probably invalid when used.

reinrich avatar Oct 24 '24 09:10 reinrich

Thanks for your input @reinrich, @tstuefe. I've tried taking the values from uc->uc_mcontext.fpregs->_xmm[i]. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call MacroAssembler::debug64.

odd. debug64 ends up calling fatal, which should use the assertion poison page to get a valid context. see debug.hpp.

to extend: we trigger an artificial segfault in fatal with the poison page, and grab that context. Generally that works. Question is why not here.

tstuefe avatar Oct 24 '24 09:10 tstuefe

Thanks for your input @reinrich, @tstuefe. I've tried taking the values from uc->uc_mcontext.fpregs->_xmm[i]. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call MacroAssembler::debug64.

odd. debug64 ends up calling fatal, which should use the assertion poison page to get a valid context. see debug.hpp.

The ucontext_t is memcpyed. The fpregs pointer in the copy is probably invalid when used.

oh good point! I remember dealing with a similar problem on PPC years ago and implementing a deep copy somehow.

tstuefe avatar Oct 24 '24 09:10 tstuefe

The context is copied in the signal handler and used here after the signal handler has returned. The copied fpregs pointer referes still to original which is/was located in the signal handler caller frame. You need to set it to the __fpregs_mem in the copy.

reinrich avatar Oct 24 '24 09:10 reinrich

The context is copied in the signal handler and used here after the signal handler has returned. The copied fpregs pointer referes still to original which is/was located in the signal handler caller frame. You need to set it to the __fpregs_mem in the copy.

Correction: the kernel has aligned fpregs so it doesn't point precisely to __fpregs_mem (that's why Martins version isn't working). The copied fpregs must point to the same offset in the ucontext_t copy as the original fpregs pointer in the original ucontext_t.

reinrich avatar Oct 24 '24 09:10 reinrich

Thanks for figuring this out! So, there's a bit more to repair :-) I'll take a look.

TheRealMDoerr avatar Oct 24 '24 10:10 TheRealMDoerr

It works when I repair the copied uc_mcontext (see 2nd commit). Please take a look. Note that I'm using little endian style: print the high order half first which is at offset 8, then the low order half (offset 0). We should also backport this fix.

TheRealMDoerr avatar Oct 24 '24 12:10 TheRealMDoerr

Also note that the copy fix above is kind of a hack. Should we use something like the following?

diff --git a/src/hotspot/share/utilities/debug.cpp b/src/hotspot/share/utilities/debug.cpp
index 988e5dddd90..1389654b4b6 100644
--- a/src/hotspot/share/utilities/debug.cpp
+++ b/src/hotspot/share/utilities/debug.cpp
@@ -731,7 +731,9 @@ static void store_context(const void* context) {
 #if defined(PPC64)
   *((void**) &g_stored_assertion_context.uc_mcontext.regs) = &(g_stored_assertion_context.uc_mcontext.gp_regs);
 #elif defined(AMD64)
-  *((void**) &g_stored_assertion_context.uc_mcontext.fpregs) = &(g_stored_assertion_context.uc_mcontext.fpregs);
+  // In the copied version, fpregs should point to the copied contents. Preserve the offset.
+  intptr_t offset = (address)(void*)(g_stored_assertion_context.uc_mcontext.fpregs) - (address)context;
+  *((void**) &g_stored_assertion_context.uc_mcontext.fpregs) = (void*)((address)(void*)&g_stored_assertion_context + offset);
 #endif
 #endif
 }

TheRealMDoerr avatar Oct 24 '24 13:10 TheRealMDoerr

Also note that the copy fix above is kind of a hack. Should we use something like the following?

diff --git a/src/hotspot/share/utilities/debug.cpp b/src/hotspot/share/utilities/debug.cpp
index 988e5dddd90..1389654b4b6 100644
--- a/src/hotspot/share/utilities/debug.cpp
+++ b/src/hotspot/share/utilities/debug.cpp
@@ -731,7 +731,9 @@ static void store_context(const void* context) {
 #if defined(PPC64)
   *((void**) &g_stored_assertion_context.uc_mcontext.regs) = &(g_stored_assertion_context.uc_mcontext.gp_regs);
 #elif defined(AMD64)
-  *((void**) &g_stored_assertion_context.uc_mcontext.fpregs) = &(g_stored_assertion_context.uc_mcontext.fpregs);
+  // In the copied version, fpregs should point to the copied contents. Preserve the offset.
+  intptr_t offset = (address)(void*)(g_stored_assertion_context.uc_mcontext.fpregs) - (address)context;
+  *((void**) &g_stored_assertion_context.uc_mcontext.fpregs) = (void*)((address)(void*)&g_stored_assertion_context + offset);
 #endif
 #endif
 }

Looks like what I ment in my comment above. I was wondering why you ignored it.

reinrich avatar Oct 24 '24 13:10 reinrich

Looks like what I ment in my comment above. I was wondering why you ignored it.

Sorry. I had forgotten it after looking into another issue. Committed slightly improved version.

TheRealMDoerr avatar Oct 24 '24 16:10 TheRealMDoerr

/contributor add reinrich

TheRealMDoerr avatar Oct 24 '24 16:10 TheRealMDoerr

@TheRealMDoerr reinrich was not found in the census.

Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <[email protected]>

User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.

openjdk[bot] avatar Oct 24 '24 16:10 openjdk[bot]

/contributor add @reinrich

TheRealMDoerr avatar Oct 24 '24 17:10 TheRealMDoerr