openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

OpenJ9 After loading javaagent 58 times, the agentmain method will not be called again

Open mazhimazh opened this issue 2 years ago • 9 comments

When i load JavaAgent(OpenJ9+JDK8+Linux 86_64) 58 times,the agentmain method will not be called again.I use below code to load:

 VirtualMachine vm = null;
        try {
            vm = VirtualMachine.attach(pid);
            vm.loadAgent(localPath, "test");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (vm != null) {
                try {
                    vm.detach();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

i don't know why?

mazhimazh avatar Sep 09 '22 03:09 mazhimazh

@tajila

pshipton avatar Sep 09 '22 10:09 pshipton

Do you have -version output from the JDK being used?

pshipton avatar Sep 09 '22 11:09 pshipton

Do you have -version output from the JDK being used? java -version output:

openjdk version "1.8.0_282" OpenJDK Runtime Environment (build 1.8.0_282-b08) Eclipse OpenJ9 VM (build openj9-0.24.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20210120_930 (JIT enabled, AOT enabled) OpenJ9 - 345e1b09e OMR - 741e94ea8 JCL - ab07c6a8fd based on jdk8u282-b08)

mazhimazh avatar Sep 12 '22 02:09 mazhimazh

That build is very old. Does the problem still occur on a more recent build?

pshipton avatar Sep 12 '22 13:09 pshipton

That build is very old. Does the problem still occur on a more recent build?

I tried the following two versions, but the problem still exists and is exactly the same

openjdk version "11.0.10" 2021-01-19 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9) Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.24.0, JRE 11 Linux amd64-64-Bit Compressed References 20210120_910 (JIT enabled, AOT enabled) OpenJ9 - 345e1b09e OMR - 741e94ea8 JCL - 0a86953833 based on jdk-11.0.10+9)

openjdk version "17.0.4.1" 2022-08-12 IBM Semeru Runtime Open Edition 17.0.4.1 (build 17.0.4.1+1) Eclipse OpenJ9 VM 17.0.4.1 (build openj9-0.33.1, JRE 17 Linux amd64-64-Bit Compressed References 20220812_266 (JIT enabled, AOT enabled) OpenJ9 - 1d9d16830 OMR - b58aa2708 JCL - 1f4d354e654 based on jdk-17.0.4.1+1)

mazhimazh avatar Sep 13 '22 01:09 mazhimazh

Is the JDK you are attaching to using the newer build?

pshipton avatar Sep 13 '22 01:09 pshipton

@JasonFengJ9 fyi since the test case uses attach.

pshipton avatar Sep 13 '22 01:09 pshipton

Update:

Reproduced the issue reported, apparently the agent library was created/loaded for each VirtualMachine.loadAgent() request, eventually exhausted TLS key maximum 124 [1], the number of loading javaagent times might vary depending on other application usages. ~Need fix findAgentLibrary()to find previously loaded agent library~. Edit: it was jvmtiHookGetEnv invoking allocateEnvironment() during each loadAgent(), TLS cleanup only can occur when the thread destroyed or JVMTI shutdown (not this use case).

[1] https://github.com/eclipse-openj9/openj9-omr/blob/38e24339e79f9959b6ab8996f6b2ce10ed437141/include_core/omrthread_generated.h#L37

JasonFengJ9 avatar Sep 15 '22 12:09 JasonFengJ9

Update with testcase:

Steps to reproduce the issue:

  1. Run the target VM at one terminal - $JAVA_HOME/bin/java -cp testagent.zip openj9.issues.pr15853.TargetLoop, this just runs to keep target VM alive, Agent agentmain invoked.. appears if an agent is loaded successfully;
  2. Run $JAVA_HOME/bin/jcmd at another terminal to find VM ID from step 1, a sample output - 99111 openj9.issues.pr15853.TargetLoop;
  3. Run the attacher VM - $JAVA_HOME/bin/java -cp testagent.zip:$JAVA_HOME/lib/tools.jar openj9.issues.pr15853.OpenJ9Attacher testagent.zip 99111, this repeats VirtualMachine.attach()/loadAgent()/detach().

After less than ~120 time Agent agentmain invoked.. output, it doesn't appears even the attacher VM still requests loadAgent().

Current investigation shows that TLS is allocated via omrthread_tls_alloc() as following stacktrace:

  * frame #0: 0x0000000002289b82 libj9jvmti29.dylib`allocateEnvironment [inlined] jvmtiTLSAlloc(vm=<unavailable>, handle=0x0000000004877c68) at jvmtiHelpers.c:1828:9 [opt]
    frame #1: 0x0000000002289b82 libj9jvmti29.dylib`allocateEnvironment(invocationJavaVM=<unavailable>, version=<unavailable>, penv=0x000070000bc40e08) at jvmtiHelpers.c:425 [opt]
    frame #2: 0x000000000228e482 libj9jvmti29.dylib`jvmtiHookGetEnv(hook=<unavailable>, eventNum=<unavailable>, eventData=0x000070000bc40db0, userData=0x0000600003d0c3e0) at jvmtiHook.c:1690:16 [opt]
    frame #3: 0x000000000020ba0b libj9hookable29.dylib`J9HookDispatch(hookInterface=0x0000000002814f30, taggedEventNum=<unavailable>, eventData=0x000070000bc40db0) at hookable.cpp:235:5 [opt]
    frame #4: 0x0000000002342e23 libj9vm29.dylib`GetEnv(jvm=0x0000000002812c20, penv=0x000070000bc40e08, version=805372160) at jniinv.c:897:2 [opt]
    frame #5: 0x000000003d10a319 libinstrument.dylib`createNewJPLISAgent(vm=0x0000000002812c20, agent_ptr=0x000070000bc40e68) at JPLISAgent.c:217:16 [opt]
    frame #6: 0x000000003d109d85 libinstrument.dylib`Agent_OnAttach(vm=0x0000000002812c20, args="/Users/ivm/jars/comm/testagent.zip=test", reserved=<unavailable>) at InvocationAdapter.c:329:17 [opt]
    frame #7: 0x000000000229b601 libj9jvmti29.dylib`issueAgentOnLoadAttach(vm=<unavailable>, agentLibrary=0x0000000004876668, options="/Users/ivm/jars/comm/testagent.zip=test", loadFunctionName=<unavailable>, foundLoadFn=<unavailable>) at jvmtiStartup.c:409:7 [opt]
    frame #8: 0x000000000229aca2 libj9jvmti29.dylib`loadAgentLibraryOnAttach(vm=<unavailable>, library="instrument", options=<unavailable>, decorate=1) at jvmtiStartup.c:616:8 [opt]
    frame #9: 0x0000000002398048 libj9vm29.dylib`VM_BytecodeInterpreterCompressed::run(J9VMThread*) at BytecodeInterpreter.hpp:4761:15 [opt]
    frame #10: 0x0000000002397f4e libj9vm29.dylib`VM_BytecodeInterpreterCompressed::run(this=0x000070000bc413a0, vmThread=<unavailable>) at BytecodeInterpreter.hpp:10641 [opt]

However this TLS entry is not associated to a vmThread until invoking a specific API like https://github.com/eclipse-openj9/openj9/blob/ef6948124a16e50c03faa7e69383cb9677bc985d/runtime/jvmti/jvmtiHelpers.c#L1555-L1571

Hence this TLS entry won't be cleaned up because jvmtiTLSGet(vmThread, threadObject, j9env->tlsKey) returns NULL https://github.com/eclipse-openj9/openj9/blob/ef6948124a16e50c03faa7e69383cb9677bc985d/runtime/jvmti/jvmtiHook.c#L218-L230

In this testcase, no jvmtiTLSSet() invoked for allocated TLS and eventually run out total 124 entries. Some error handling needed for better failure messages as well.

FYI @gacholio

testagent.zip

JasonFengJ9 avatar Sep 15 '22 17:09 JasonFengJ9

@dnakamura Please take a look

tajila avatar Jan 20 '23 15:01 tajila

This slipped under my radar, I had something in progress but was distracted by other workloads. Devin pls let me know if you need a hand.

JasonFengJ9 avatar Jan 20 '23 16:01 JasonFengJ9

I may be misunderstanding, but I don't think having the tls entry unset is actually relevant. Every time an agent is loaded it creates a JPLISAgent, which in turn creates a JVMTI environment, and allocates a TLS key. I'm not 100% clear of the semantics of the JVMTI, but it appears that the agent will live until the VM is destroyed, and as such we will not release the tls key until then. Given that I'm seeing this potentially as expected behavior, although we should probably emit some kind of error when we fail to load a new agent

dnakamura avatar Feb 02 '23 17:02 dnakamura

My thought was to replace following jvmtiTLSSet() https://github.com/eclipse-openj9/openj9/blob/cf3c1ee0d3b38d3f7af41e1e51e4c5f7ca23f862/runtime/jvmti/jvmtiHook.c#L230 with jvmtiTLSFree(vmThread->javaVM, j9env->tlsKey)

~It fixed the issue reported~ (Edit: this note is not correct), iirc it breaks a test (can't recall the test name).

JasonFengJ9 avatar Feb 02 '23 17:02 JasonFengJ9

Looking a bit deeper I'm not sure that replacing the jvmtiTLSSet with a jvmtiTLSFree would work properly. Doing a free would free the key for all threads, even though there may be other threads which are still using the key. Also, unless I'm mistaken this the code in question would only be called when a thread is destroyed and would not be involved in this issue.

I think the solution would be to see if its feasable for multiple JPLISAgents loaded into the same vm could share a jvmti environment

dnakamura avatar Feb 21 '23 18:02 dnakamura

Looking into things more sharing jvmti environments is not feasible since they carry state with them, so I see 2 options going forward.

  1. delay allocating the TLS key until we actually need to store thread specific state. This is the simplest option to implement, and would mean that the test case passes, however the issue would still appear if you load a large number of agents which actually need to store thread specific state.

  2. the JVMTI code could be re-written to keep the thread specific data in a hashmap rather than TLS storage. This the more involved option, but does mean that you no longer have a limit on the number of agents that can be loaded.

dnakamura avatar Mar 18 '23 19:03 dnakamura

@tajila which approach do you think makes sense here?

dnakamura avatar Mar 29 '23 18:03 dnakamura

the JVMTI code could be re-written to keep the thread specific data in a hashmap rather than TLS storage. This the more involved option, but does mean that you no longer have a limit on the number of agents that can be loaded.

@babsingh I recall we explored this approach for virtual threads, what was the outcome?

tajila avatar Mar 29 '23 18:03 tajila

For JDK19+, we implemented lazy allocation of TLS array and thread-data in https://github.com/eclipse-openj9/openj9/pull/15541. Correction: This does not include the first solution from https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1474958089 which mentions of delaying TLS key creation.

Correction: We did explore the hashmap solution. TLS key maps to a thread's TLS array index. For a hashmap, there needs to be a relationship between the thread object and JVMTI env. It was not possible to store the thread object's reference in the hashmap because the GC can relocate it. Also, the approach to use the thread object's tid as hashmap key/value didn't work because it was not possible to find the thread object using the tid.

babsingh avatar Mar 29 '23 23:03 babsingh

I have revised my previous comment.

https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1474958089: https://github.com/eclipse-openj9/openj9/pull/17065 contains the solution to delay TLS key creation. I can confirm that it works for JDK19. The Java agent can be loaded more than 124 times.

There is an upper bound on the number of TLS keys in OpenJ9, which is currently set to 124. This is a limitation on how many agents we can support which need to store thread specific state. Will realistic applications ever exceed this limitation? Probably, not. I would consider addressing/removing this limitation as a low priority.

babsingh avatar Mar 30 '23 02:03 babsingh

can you verify if this issue is fixed with JDK19?

It isn't fixed in JDK19. It requires the fix in https://github.com/eclipse-openj9/openj9/pull/17065. Refer to https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1489463774 and https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1489595075 for more details. I ended up revising one of my old comments, which may have caused confusion.

babsingh avatar Mar 30 '23 17:03 babsingh

@dnakamura Please go with Option 1, you can use Babneet draft as a starting point

tajila avatar Mar 30 '23 17:03 tajila

Fix https://github.com/eclipse-openj9/openj9/pull/17186 has been merged. @dnakamura Can you please verify if the failure has been completely fixed and close this issue?

babsingh avatar Apr 19 '23 13:04 babsingh

Closing. I confirm that the reported issue is resolved. For verification, I used a latest JDK20 build which includes the changes from https://github.com/eclipse-openj9/openj9/pull/17186 and the testcase from https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1248426692.

babsingh avatar May 15 '23 20:05 babsingh