Add SIGUSR2 handler and matching -Xdump event

Open kgibm opened this issue 3 years ago • 13 comments

Fixes #15636

Signed-off-by: Kevin Grigorenko [email protected]

kgibm avatar Jul 27 '22 20:07 kgibm

@pshipton @keithc-ca FYI

Notes:

  • SIGUSR1 seems to be used internally but I couldn't find any use of SIGUSR2
  • Mimicked sigquit.c
  • Tentatively chose "sigusr" as the -Xdump event name (I tried "user2", but the digit was not accepted)
  • No-op on Windows
  • Unclear if this event needs to be part of the SigQuit thread processing
  • Removed the call to TRIGGER_J9HOOK_VM_USER_INTERRUPT in sigUsr2Handler
  • Not sharing exclusive access to resolve issue #9256

Before the patch, SIGUSR2 terminates the process:

$ printf "class Hang { public static void main(String... args) throws Throwable { Object o = new Object(); synchronized (o) { o.wait(); } } }" > Hang.java
$ javac Hang.java 
$ java Hang &
$ kill -USR2 %1
[1]+  User defined signal 2: 31 java Hang
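The one-line Hang.java written by the printf command above is easier to follow when expanded. The sketch below is the same program reformatted, with one assumed addition that is not in the original: a loop around wait(), so that a spurious wakeup cannot end the process in the middle of a test.

```java
class Hang {
    public static void main(String... args) throws Throwable {
        Object o = new Object();
        synchronized (o) {
            // Nothing ever calls o.notify(), so the main thread parks here
            // and the JVM stays alive until a signal terminates it.
            // The loop is an addition to the original one-liner: it re-enters
            // wait() if the JVM reports a spurious wakeup.
            while (true) {
                o.wait();
            }
        }
    }
}
```

Because the main thread never returns, any change in process state after kill -USR2 can be attributed to the signal rather than to the program finishing on its own.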

After the patch, a javacore is produced and the process continues running:

$ java Hang &
$ kill -USR2 %1
JVMDUMP039I Processing dump event "sigusr", detail "" at 2022/07/27 15:01:51 - please wait.
JVMDUMP032I JVM requested Java dump using '/Users/kevin/git/openj9-openjdk-jdk8/build/macosx-x86_64-normal-server-release/images/j2sdk-image/javacore.20220727.150151.71696.0001.txt' in response to an event
JVMDUMP010I Java dump written to /Users/kevin/git/openj9-openjdk-jdk8/build/macosx-x86_64-normal-server-release/images/j2sdk-image/javacore.20220727.150151.71696.0001.txt
JVMDUMP013I Processed dump event "sigusr", detail "".

Customizing the sigusr event also works:

$ java -Xdump:system:events=sigusr,request=exclusive+prepwalk Hang &
$ kill -USR2 %1
JVMDUMP039I Processing dump event "sigusr", detail "" at 2022/07/27 15:03:21 - please wait.
JVMDUMP032I JVM requested System dump using '/Users/kevin/git/openj9-openjdk-jdk8/build/macosx-x86_64-normal-server-release/images/j2sdk-image/core.20220727.150321.71712.0001.dmp' in response to an event
JVMDUMP010I System dump written to /Users/kevin/git/openj9-openjdk-jdk8/build/macosx-x86_64-normal-server-release/images/j2sdk-image/core.20220727.150321.71712.0001.dmp
JVMDUMP032I JVM requested Java dump using '/Users/kevin/git/openj9-openjdk-jdk8/build/macosx-x86_64-normal-server-release/images/j2sdk-image/javacore.20220727.150321.71712.0002.txt' in response to an event
JVMDUMP010I Java dump written to /Users/kevin/git/openj9-openjdk-jdk8/build/macosx-x86_64-normal-server-release/images/j2sdk-image/javacore.20220727.150321.71712.0002.txt
JVMDUMP013I Processed dump event "sigusr", detail "".

kgibm avatar Jul 27 '22 20:07 kgibm

We need a documentation issue created for this.

pshipton avatar Jul 28 '22 14:07 pshipton

> Unclear if this event needs to be part of the SigQuit thread processing

Looks like yes it does

sigusr may be confusing. Some other ideas: usertwo, altuser, or look into why numbers aren't accepted and fix that if possible.

pshipton avatar Jul 28 '22 14:07 pshipton

> We need a documentation issue created for this.

Sure, I can do that.

> Unclear if this event needs to be part of the SigQuit thread processing

> Looks like yes it does

Ok, I'll add that in.

> sigusr may be confusing. Some other ideas: usertwo, altuser, or look into why numbers aren't accepted and fix that if possible.

Sure, I don't have a strong opinion on the event name. @keithc-ca any opinion?

kgibm avatar Jul 28 '22 14:07 kgibm

> There are still several places that declare/use things related to SIGUSR2 that are not conditional on the enabling flag.

> I think this should be an opt-in feature: a user must explicitly request handling SIGUSR2 via -Xdump:java:... options so there isn't a conflict with existing uses of that signal.

@keithc-ca Makes sense. I'll remove the default change. I didn't fully understand the eventMask fields in rasDumpSpecs in dmpagent.c - are those changing defaults or specifying what events can drive those agents?

I'm out of the office until next week but the rest of the comments make sense and I'll update then.

kgibm avatar Jul 28 '22 14:07 kgibm

I'm not sure why you had trouble using user2 as the event name; I don't see anything that should object to digits, it just needs to match the entry in dmpagent.c: rasDumpEvents.

keithc-ca avatar Jul 28 '22 15:07 keithc-ca

I meant to say in my previous comment that "user2" is my preference for the new event name.

keithc-ca avatar Jul 28 '22 15:07 keithc-ca

I figured out why user2 wasn't being parsed: the parser was complaining about unresolved tokens starting at "2". It was resolving the "user" event first, which left the "2" over, so I'll just need to place the user2 definition above the user definition.
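To illustrate the ordering problem described above, here is a minimal sketch of first-match prefix scanning. It is not the actual parser in dmpagent.c; the class and method names are hypothetical, and the only behavior assumed is what the comment reports: the first table entry that prefixes the input wins, and whatever remains is treated as an unresolved token.

```java
import java.util.Arrays;
import java.util.List;

class EventNameMatcher {
    /** Returns the first table entry (in table order) that is a prefix of input, or null. */
    public static String firstPrefixMatch(List<String> table, String input) {
        for (String name : table) {
            if (input.startsWith(name)) {
                return name;
            }
        }
        return null;
    }

    /** Returns the text left unparsed after consuming the matched entry. */
    public static String leftover(List<String> table, String input) {
        String match = firstPrefixMatch(table, input);
        return match == null ? input : input.substring(match.length());
    }

    public static void main(String[] args) {
        List<String> userFirst = Arrays.asList("user", "user2");
        List<String> user2First = Arrays.asList("user2", "user");

        // "user" matches first and the trailing "2" is left unresolved.
        System.out.println("[" + leftover(userFirst, "user2") + "]");  // prints "[2]"

        // With the longer name listed first, the whole token is consumed.
        System.out.println("[" + leftover(user2First, "user2") + "]"); // prints "[]"

        // The shorter name still resolves, because "user2" is not a prefix of "user".
        System.out.println("[" + leftover(user2First, "user") + "]");  // prints "[]"
    }
}
```

In a table scanned this way, any name that is a prefix of another has to come after the longer name, which is why alphabetical ordering would not be safe here.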

kgibm avatar Jul 29 '22 13:07 kgibm

@keithc-ca @pshipton Updated and squashed based on feedback.

$ java -Xdump:java:events=user2,request=exclusive+prepwalk Hang &
[1] 31864
$ kill -USR2 %1
JVMDUMP039I Processing dump event "user2", detail "" at 2022/08/01 10:02:40 - please wait.
JVMDUMP032I JVM requested Java dump using '/Users/kevin/git/openj9-openjdk-jdk8/build/macosx-x86_64-normal-server-release/images/j2sdk-image/javacore.20220801.100240.31864.0001.txt' in response to an event
JVMDUMP010I Java dump written to /Users/kevin/git/openj9-openjdk-jdk8/build/macosx-x86_64-normal-server-release/images/j2sdk-image/javacore.20220801.100240.31864.0001.txt
JVMDUMP013I Processed dump event "user2", detail "".

One minor change to default behavior remains. Previously, SIGUSR2 would cause the process to exit:

$ java Hang &
[1] 32102
$ kill -USR2 %1
[1]+  User defined signal 2: 31 java Hang

Now, even if no -Xdump event is registered, the process no longer exits:

$ java Hang &
[1] 31858
$ kill -USR2 %1
$ 

kgibm avatar Aug 01 '22 15:08 kgibm

jenkins compile win jdk8

pshipton avatar Aug 03 '22 15:08 pshipton

> I figured out why user2 wasn't being parsed: the parser was complaining about unresolved tokens starting at "2". It was resolving the "user" event first, which left the "2" over, so I'll just need to place the user2 definition above the user definition.

Yuck! I think a comment is warranted in that list (I was going to suggest they be ordered alphabetically).

keithc-ca avatar Aug 03 '22 18:08 keithc-ca

jenkins compile win jdk8

kgibm avatar Aug 04 '22 03:08 kgibm

@keithc-ca Feedback processed, please re-review

kgibm avatar Aug 04 '22 03:08 kgibm

jenkins test sanity win,win32 jdk8

keithc-ca avatar Aug 15 '22 15:08 keithc-ca

jenkins test sanity osx,zlinux jdk17

keithc-ca avatar Aug 15 '22 15:08 keithc-ca

PR testing is still running at https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/2544/

pshipton avatar Aug 15 '22 20:08 pshipton

> PR testing is still running at https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/2544/

I was aware, just getting ready.

However, I would like to see this squashed after that testing is complete.

keithc-ca avatar Aug 15 '22 20:08 keithc-ca

For the record, test builds are:

  • https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/2544/
  • https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/2545/

keithc-ca avatar Aug 15 '22 20:08 keithc-ca

Tests passed; squashed.

kgibm avatar Aug 15 '22 20:08 kgibm