openj9
openj9 copied to clipboard
java.lang.ClassNotFoundException: during object deserialization in openjdk11.07 with openj9
Java -version output
openjdk version "11.0.7" 2020-04-14 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10) Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.20.0, JRE 11 Linux amd64-64-Bit Compressed References 20200416_574 (JIT enabled, AOT enabled) OpenJ9 - 05fa2d361 OMR - d4365f371 JCL - 838028fc9d based on jdk-11.0.7+10)
Summary of problem
This problem looks similar to #8454, however happens under different circumstances. We can confirm that the #8454 was fixed, and our test case for that instance still works.
When our code attempts to deserialise, we get the following stack trace (abbreviated:)
Caused by: java.lang.ClassNotFoundException: <our class name>
at java.base/java.lang.Class.forNameImpl(Native Method)
at java.base/java.lang.Class.forName(Class.java:412)
at java.base/java.io.ClassCache$FutureValue.get(ClassCache.java:190)
at java.base/java.io.ClassCache.get(ClassCache.java:161)
at java.base/java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:848)
at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2065)
at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1951)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2239)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1768)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2541)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2433)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2266)
Diagnostic files
No crash happens, so no files are produced. It just fails to load the class.
Running this under 11.05 works, or running 11.07 with -Dcom.ibm.enableClassCaching=false
also works.
We have not yet been able to recreate a "simple" test case for this, but will try to spend some time next week. Ay suggestions on what we might need to do differently to #8454 to trigger this would be appreciated! We can trigger this at will on our (large) code base, so we could also potentially provide logging and/or testing of a pre-release for that.
@theresa-m can you take a look at this? You resolved the last couple of issues in this code
Set for the 0.22 milestone give it's late in the cycle for 0.21, however if there is a quick fix we can evaluate the risk of putting it into 0.21.
I'll take a look.
Are you using the defaultReadObject
method in your custom serialization code? I believe there is a cache refresh missing here.
The stack trace doesn't quite go back far enough for me to know where the call originated.
If thats the case this may fix the problem: https://github.com/ibmruntimes/openj9-openjdk-jdk11/pull/317
build with changes to try: https://drive.google.com/file/d/1UKnsL6mGu10qKz819Bon7pJo0yTX2GpA/view?usp=sharing
Hello @theresa-m
Thank you for the update. I've downloaded and tried the JDK you've linked, unfortunately it still fails in the same way. Below I've added all info including the full failing stacktrace.
openjdk version "11.0.8-internal" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build master-d2ab44a37e0, JRE 11 Linux amd64-64-Bit Compressed References 20200617_2175 (JIT enabled, AOT enabled)
OpenJ9 - d2ab44a37e0
OMR - b884d8266d7
JCL - 9d820f766f3 based on jdk-11.0.8+7)
Full stack:
Caused by: java.lang.ClassNotFoundException: com.redwood.scheduler.custom.fca.balancesheet.documents.DocumentHandler
at java.base/java.lang.Class.forNameImpl(Native Method)
at java.base/java.lang.Class.forName(Class.java:412)
at java.base/java.io.ClassCache$FutureValue.get(ClassCache.java:190)
at java.base/java.io.ClassCache.get(ClassCache.java:161)
at java.base/java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:851)
at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2068)
at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1954)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2242)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:548)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:458)
at java.base/java.util.HashMap.readObject(HashMap.java:1460)
at jdk.internal.reflect.GeneratedMethodAccessor88.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1175)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2403)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:548)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:458)
Hi @theresa-m, We do use defaultReadObject()
in our code, I am not sure if that code would be invoked in this case though. At least it isn't in the call stack :-)
Oh, and I may have misread your question, we also override the default readObject()
in some of our classes, again, I can't say if they are involved in this (de)serialisation, as they might be in the stream, but not in the stack trace.
Okay. Thanks for the additional stack trace and information. I'll continue to investigate.
I've updated my draft with a couple of other places where the cache should be refreshed https://github.com/ibmruntimes/openj9-openjdk-jdk11/pull/317
Sample build with this change: https://drive.google.com/file/d/15lq9C460QuR-LVTXv5t-5gpLv7suAklW/view?usp=sharing
If that doesn't work, can you also try including the -Xint
option on your command line to see if that makes a difference?
Hello @theresa-m
I gave your new jdk a try, unfortunately it fails very early during startup of our product (so not related to the issue reported). As the product does not start I can't test more. Something else seems wrong with deserialization in this jdk here:
OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build master-e006f8232bb, JRE 11 Linux amd64-64-Bit Compressed References 20200622_2197 (JIT enabled, AOT enabled)
OpenJ9 - e006f8232bb
OMR - 934b0ff765c
JCL - 899077a4d3c based on jdk-11.0.8+7)
java.io.StreamCorruptedException: invalid type code: 00
at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:3179) ~[?:?]
at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:3213) ~[?:?]
at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:3115) ~[?:?]
at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:2501) ~[?:?]
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2097) ~[?:?]
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1971) ~[?:?]
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2259) ~[?:?]
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1788) ~[?:?]
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:548) ~[?:?]
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:458) ~[?:?]
at com.redwood.scheduler.extensionpoint.impl.ExtensionPointHTTPSession.remapMap(ExtensionPointHTTPSession.java:259) ~[extension-point.jar:?]
The -Xint flag has no effect except making things slow (same stack).
I double checked with your previous linked jdk, but that starts up fine (and has the original error we reported).
hmm okay thanks for trying it out. That options helps me narrow down there's no issues with the JIT ludcl optimization either. I'll keep digging into it.
Thank you, I hope you can find the problem. :)
Hi @theresa-m, I see that this didn't make 11.08, is there anything that we can do to help in the diagnosis?
Thanks for the offer! I'll dive back into this on Monday and let you know, this is still high on my list for the September release.
I'm still not sure why the latest change caused the StreamCorruptedException. I'm thinking of adding some logging statements into the next build but still thinking about what would be the most useful information to help diagnose the problem.
Hello, I think the most likely place the optimization is failing is when going in and out of the HashMap class but I was hoping to confirm if you would be willing to try out this build: https://drive.google.com/file/d/1YbIq8mWEwCsiKa0s9F498wLrPPaCUpzQ/view?usp=sharing
It adds some information into the message of the exception. It prints current classloader names if you are comfortable with that: https://github.com/ibmruntimes/openj9-openjdk-jdk11/compare/openj9...theresa-m:ludcl_refresh_2
Thank you for the build, I'll hope to have a look at it later this week/next week and will get back to you then.
Hello, I've tested with your build and it fails and now with a large message containing your long log message (over 3000 lines).
Just to play it safe for legal reasons, can I send/upload the result somewhere privately to you? Thanks.
Thanks! Oh no I had hoped it would be shorter. The openj9 public slack channel could be a good place to dm me. Probably not going to need all 3000 lines either.
This should be the link to join our slack if you are not already part of it: openj9 slack
Again thanks for your willingness to try things.
The main thing I'm looking for is the start where "cached ludcl is: " <ludcl> " expected ludcl is: " <ludcl>
print different ludcls to help narrow down when there is a cache error. That should hopefully be near to the last few lines of output.
Moving this to the Oct quarterly update but if a low risk fix is found before we ship, we can consider porting to the release branch for jdk15
Hi @mreuvers I've sent a follow up re this issue on slack.
@sharon-wang can you please take over this issue
@tajila sure thing, I'll start looking into this.
@sharon-wang any update on this?
@pshipton Been chatting back and forth with @mreuvers in the OpenJ9 Slack.
I've tried looking at the sample app from #8454 (different problem than this one, but same classloader setup) to identify possible code paths leading to this failure. So far I have not been able to reproduce the current issue by tweaking the sample app.
Sent over a JDK with some changes (extra startingLudclObject checks), but the failure is still being seen. Will be sending over another JDK for Martijn to try with some debug prints to try to get more insight into the cached ludcl.
I'll tentatively move this to the next milestone, but if a fix is ready by Oct 4 we should evaluate if it's low enough risk to put into the 0.23 release.
Just an update, we were able to capture specific classloader information in a debug build. We're able to see the classloaders being used throughout object deserialization in the application and have identified the spot where a refresh appears to be missing, resulting in the exception. I added a "force refresh" option in the debug build and @mreuvers confirmed that using this debug option does alleviate the error.
This means that refreshLudcl
is not being set to true
or is incorrectly being set to false
just before the exception occurs. I'm trying to figure out how this is occurring...
Here's a description of the steps to the failure (with obscured class info):
-
readObject()
call -
cachedLudcl
=ClassLoaderForClassA
-
refreshLudcl=true
, socachedLudcl
is refreshed toClassLoaderForClassA
- No other apparent deserialization occurs (based on lack of output from debug)
-
resolveClass()
is called forClassA
-
cachedLudcl
=ClassLoaderForClassB
-- not sure how/where this is getting updated, looking into this -
refreshLudcl=true
, socachedLudcl
is refreshed toClassLoaderForClassA
-
resolveClass()
is called forClassB
-
cachedLudcl
=ClassLoaderForClassA
-
refreshLudcl=false
, socachedLudcl
is not refreshed - Attempt to resolve
ClassB
fromClassLoaderForClassA
-
ClassNotFoundException
occurs
So, I'm looking at:
- how did the last user defined classloader get updated between the initial
readObject()
and 6? The classloader in 6 is the one we want, but it's not clear why it shows up at this point. - why is
refreshLudcl=false
in 10, since it should be true?
pushing this to the next release