openj9
openj9 copied to clipboard
OpenJ9 After loading javaagent 58 times, the agentmain method will not be called again
When i load JavaAgent(OpenJ9+JDK8+Linux 86_64) 58 times,the agentmain method will not be called again.I use below code to load:
VirtualMachine vm = null;
try {
vm = VirtualMachine.attach(pid);
vm.loadAgent(localPath, "test");
} catch (Exception e) {
e.printStackTrace();
} finally {
if (vm != null) {
try {
vm.detach();
} catch (Exception e) {
e.printStackTrace();
}
}
}
i don't know why?
@tajila
Do you have -version output from the JDK being used?
Do you have -version output from the JDK being used? java -version output:
openjdk version "1.8.0_282" OpenJDK Runtime Environment (build 1.8.0_282-b08) Eclipse OpenJ9 VM (build openj9-0.24.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20210120_930 (JIT enabled, AOT enabled) OpenJ9 - 345e1b09e OMR - 741e94ea8 JCL - ab07c6a8fd based on jdk8u282-b08)
That build is very old. Does the problem still occur on a more recent build?
That build is very old. Does the problem still occur on a more recent build?
I tried the following two versions, but the problem still exists and is exactly the same
openjdk version "11.0.10" 2021-01-19 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9) Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.24.0, JRE 11 Linux amd64-64-Bit Compressed References 20210120_910 (JIT enabled, AOT enabled) OpenJ9 - 345e1b09e OMR - 741e94ea8 JCL - 0a86953833 based on jdk-11.0.10+9)
openjdk version "17.0.4.1" 2022-08-12 IBM Semeru Runtime Open Edition 17.0.4.1 (build 17.0.4.1+1) Eclipse OpenJ9 VM 17.0.4.1 (build openj9-0.33.1, JRE 17 Linux amd64-64-Bit Compressed References 20220812_266 (JIT enabled, AOT enabled) OpenJ9 - 1d9d16830 OMR - b58aa2708 JCL - 1f4d354e654 based on jdk-17.0.4.1+1)
Is the JDK you are attaching to using the newer build?
@JasonFengJ9 fyi since the test case uses attach.
Update:
Reproduced the issue reported, apparently the agent library was created/loaded for each VirtualMachine.loadAgent()
request, eventually exhausted TLS key maximum 124
[1], the number of loading javaagent times might vary depending on other application usages.
~Need fix findAgentLibrary()
to find previously loaded agent library~.
Edit: it was jvmtiHookGetEnv
invoking allocateEnvironment()
during each loadAgent()
, TLS cleanup only can occur when the thread destroyed or JVMTI shutdown (not this use case).
[1] https://github.com/eclipse-openj9/openj9-omr/blob/38e24339e79f9959b6ab8996f6b2ce10ed437141/include_core/omrthread_generated.h#L37
Update with testcase:
Steps to reproduce the issue:
- Run the target VM at one terminal -
$JAVA_HOME/bin/java -cp testagent.zip openj9.issues.pr15853.TargetLoop
, this just runs to keep target VM alive,Agent agentmain invoked..
appears if an agent is loaded successfully; - Run
$JAVA_HOME/bin/jcmd
at another terminal to find VM ID from step 1, a sample output -99111 openj9.issues.pr15853.TargetLoop
; - Run the attacher VM -
$JAVA_HOME/bin/java -cp testagent.zip:$JAVA_HOME/lib/tools.jar openj9.issues.pr15853.OpenJ9Attacher testagent.zip 99111
, this repeatsVirtualMachine.attach()/loadAgent()/detach()
.
After less than ~120 time Agent agentmain invoked..
output, it doesn't appears even the attacher VM still requests loadAgent()
.
Current investigation shows that TLS is allocated via omrthread_tls_alloc()
as following stacktrace:
* frame #0: 0x0000000002289b82 libj9jvmti29.dylib`allocateEnvironment [inlined] jvmtiTLSAlloc(vm=<unavailable>, handle=0x0000000004877c68) at jvmtiHelpers.c:1828:9 [opt]
frame #1: 0x0000000002289b82 libj9jvmti29.dylib`allocateEnvironment(invocationJavaVM=<unavailable>, version=<unavailable>, penv=0x000070000bc40e08) at jvmtiHelpers.c:425 [opt]
frame #2: 0x000000000228e482 libj9jvmti29.dylib`jvmtiHookGetEnv(hook=<unavailable>, eventNum=<unavailable>, eventData=0x000070000bc40db0, userData=0x0000600003d0c3e0) at jvmtiHook.c:1690:16 [opt]
frame #3: 0x000000000020ba0b libj9hookable29.dylib`J9HookDispatch(hookInterface=0x0000000002814f30, taggedEventNum=<unavailable>, eventData=0x000070000bc40db0) at hookable.cpp:235:5 [opt]
frame #4: 0x0000000002342e23 libj9vm29.dylib`GetEnv(jvm=0x0000000002812c20, penv=0x000070000bc40e08, version=805372160) at jniinv.c:897:2 [opt]
frame #5: 0x000000003d10a319 libinstrument.dylib`createNewJPLISAgent(vm=0x0000000002812c20, agent_ptr=0x000070000bc40e68) at JPLISAgent.c:217:16 [opt]
frame #6: 0x000000003d109d85 libinstrument.dylib`Agent_OnAttach(vm=0x0000000002812c20, args="/Users/ivm/jars/comm/testagent.zip=test", reserved=<unavailable>) at InvocationAdapter.c:329:17 [opt]
frame #7: 0x000000000229b601 libj9jvmti29.dylib`issueAgentOnLoadAttach(vm=<unavailable>, agentLibrary=0x0000000004876668, options="/Users/ivm/jars/comm/testagent.zip=test", loadFunctionName=<unavailable>, foundLoadFn=<unavailable>) at jvmtiStartup.c:409:7 [opt]
frame #8: 0x000000000229aca2 libj9jvmti29.dylib`loadAgentLibraryOnAttach(vm=<unavailable>, library="instrument", options=<unavailable>, decorate=1) at jvmtiStartup.c:616:8 [opt]
frame #9: 0x0000000002398048 libj9vm29.dylib`VM_BytecodeInterpreterCompressed::run(J9VMThread*) at BytecodeInterpreter.hpp:4761:15 [opt]
frame #10: 0x0000000002397f4e libj9vm29.dylib`VM_BytecodeInterpreterCompressed::run(this=0x000070000bc413a0, vmThread=<unavailable>) at BytecodeInterpreter.hpp:10641 [opt]
However this TLS entry is not associated to a vmThread
until invoking a specific API like https://github.com/eclipse-openj9/openj9/blob/ef6948124a16e50c03faa7e69383cb9677bc985d/runtime/jvmti/jvmtiHelpers.c#L1555-L1571
Hence this TLS entry won't be cleaned up because jvmtiTLSGet(vmThread, threadObject, j9env->tlsKey)
returns NULL
https://github.com/eclipse-openj9/openj9/blob/ef6948124a16e50c03faa7e69383cb9677bc985d/runtime/jvmti/jvmtiHook.c#L218-L230
In this testcase, no jvmtiTLSSet()
invoked for allocated TLS and eventually run out total 124
entries. Some error handling needed for better failure messages as well.
FYI @gacholio
@dnakamura Please take a look
This slipped under my radar, I had something in progress but was distracted by other workloads. Devin pls let me know if you need a hand.
I may be misunderstanding, but I don't think having the tls entry unset is actually relevant. Every time an agent is loaded it creates a JPLISAgent
, which in turn creates a JVMTI environment, and allocates a TLS key. I'm not 100% clear of the semantics of the JVMTI, but it appears that the agent will live until the VM is destroyed, and as such we will not release the tls key until then. Given that I'm seeing this potentially as expected behavior, although we should probably emit some kind of error when we fail to load a new agent
My thought was to replace following jvmtiTLSSet()
https://github.com/eclipse-openj9/openj9/blob/cf3c1ee0d3b38d3f7af41e1e51e4c5f7ca23f862/runtime/jvmti/jvmtiHook.c#L230
with jvmtiTLSFree(vmThread->javaVM, j9env->tlsKey)
~It fixed the issue reported~ (Edit: this note is not correct), iirc it breaks a test (can't recall the test name).
Looking a bit deeper I'm not sure that replacing the jvmtiTLSSet
with a jvmtiTLSFree
would work properly. Doing a free would free the key for all threads, even though there may be other threads which are still using the key. Also, unless I'm mistaken this the code in question would only be called when a thread is destroyed and would not be involved in this issue.
I think the solution would be to see if its feasable for multiple JPLISAgents loaded into the same vm could share a jvmti environment
Looking into things more sharing jvmti environments is not feasible since they carry state with them, so I see 2 options going forward.
-
delay allocating the TLS key until we actually need to store thread specific state. This is the simplest option to implement, and would mean that the test case passes, however the issue would still appear if you load a large number of agents which actually need to store thread specific state.
-
the JVMTI code could be re-written to keep the thread specific data in a hashmap rather than TLS storage. This the more involved option, but does mean that you no longer have a limit on the number of agents that can be loaded.
@tajila which approach do you think makes sense here?
the JVMTI code could be re-written to keep the thread specific data in a hashmap rather than TLS storage. This the more involved option, but does mean that you no longer have a limit on the number of agents that can be loaded.
@babsingh I recall we explored this approach for virtual threads, what was the outcome?
For JDK19+, we implemented lazy allocation of TLS array and thread-data in https://github.com/eclipse-openj9/openj9/pull/15541. Correction: This does not include the first solution from https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1474958089 which mentions of delaying TLS key creation.
Correction: We did explore the hashmap solution. TLS key maps to a thread's TLS array index. For a hashmap, there needs to be a relationship between the thread object and JVMTI env. It was not possible to store the thread object's reference in the hashmap because the GC can relocate it. Also, the approach to use the thread object's tid as hashmap key/value didn't work because it was not possible to find the thread object using the tid.
I have revised my previous comment.
https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1474958089: https://github.com/eclipse-openj9/openj9/pull/17065 contains the solution to delay TLS key creation. I can confirm that it works for JDK19. The Java agent can be loaded more than 124 times.
There is an upper bound on the number of TLS keys in OpenJ9, which is currently set to 124. This is a limitation on how many agents we can support which need to store thread specific state. Will realistic applications ever exceed this limitation? Probably, not. I would consider addressing/removing this limitation as a low priority.
can you verify if this issue is fixed with JDK19?
It isn't fixed in JDK19. It requires the fix in https://github.com/eclipse-openj9/openj9/pull/17065. Refer to https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1489463774 and https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1489595075 for more details. I ended up revising one of my old comments, which may have caused confusion.
@dnakamura Please go with Option 1, you can use Babneet draft as a starting point
Fix https://github.com/eclipse-openj9/openj9/pull/17186 has been merged. @dnakamura Can you please verify if the failure has been completely fixed and close this issue?
Closing. I confirm that the reported issue is resolved. For verification, I used a latest JDK20 build which includes the changes from https://github.com/eclipse-openj9/openj9/pull/17186 and the testcase from https://github.com/eclipse-openj9/openj9/issues/15853#issuecomment-1248426692.