openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

java.lang.ClassNotFoundException: during object deserialization in openjdk11.07 with openj9

Open pwagland opened this issue 4 years ago • 35 comments

Java -version output

openjdk version "11.0.7" 2020-04-14 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10) Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.20.0, JRE 11 Linux amd64-64-Bit Compressed References 20200416_574 (JIT enabled, AOT enabled) OpenJ9 - 05fa2d361 OMR - d4365f371 JCL - 838028fc9d based on jdk-11.0.7+10)

Summary of problem

This problem looks similar to #8454, however happens under different circumstances. We can confirm that the #8454 was fixed, and our test case for that instance still works.

When our code attempts to deserialise, we get the following stack trace (abbreviated:)

Caused by: java.lang.ClassNotFoundException: <our class name>
	at java.base/java.lang.Class.forNameImpl(Native Method)
	at java.base/java.lang.Class.forName(Class.java:412)
	at java.base/java.io.ClassCache$FutureValue.get(ClassCache.java:190)
	at java.base/java.io.ClassCache.get(ClassCache.java:161)
	at java.base/java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:848)
	at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2065)
	at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1951)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2239)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1768)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2541)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2433)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2266)

Diagnostic files

No crash happens, so no files are produced. It just fails to load the class.

Running this under 11.05 works, or running 11.07 with -Dcom.ibm.enableClassCaching=false also works.

We have not yet been able to recreate a "simple" test case for this, but will try to spend some time next week. Ay suggestions on what we might need to do differently to #8454 to trigger this would be appreciated! We can trigger this at will on our (large) code base, so we could also potentially provide logging and/or testing of a pre-release for that.

pwagland avatar Jun 17 '20 10:06 pwagland

@theresa-m can you take a look at this? You resolved the last couple of issues in this code

DanHeidinga avatar Jun 17 '20 13:06 DanHeidinga

Set for the 0.22 milestone give it's late in the cycle for 0.21, however if there is a quick fix we can evaluate the risk of putting it into 0.21.

pshipton avatar Jun 17 '20 13:06 pshipton

I'll take a look.

theresa-m avatar Jun 17 '20 15:06 theresa-m

Are you using the defaultReadObject method in your custom serialization code? I believe there is a cache refresh missing here.

The stack trace doesn't quite go back far enough for me to know where the call originated.

theresa-m avatar Jun 17 '20 23:06 theresa-m

If thats the case this may fix the problem: https://github.com/ibmruntimes/openj9-openjdk-jdk11/pull/317

build with changes to try: https://drive.google.com/file/d/1UKnsL6mGu10qKz819Bon7pJo0yTX2GpA/view?usp=sharing

theresa-m avatar Jun 18 '20 00:06 theresa-m

Hello @theresa-m

Thank you for the update. I've downloaded and tried the JDK you've linked, unfortunately it still fails in the same way. Below I've added all info including the full failing stacktrace.

openjdk version "11.0.8-internal" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build master-d2ab44a37e0, JRE 11 Linux amd64-64-Bit Compressed References 20200617_2175 (JIT enabled, AOT enabled)
OpenJ9   - d2ab44a37e0
OMR      - b884d8266d7
JCL      - 9d820f766f3 based on jdk-11.0.8+7)

Full stack:

Caused by: java.lang.ClassNotFoundException: com.redwood.scheduler.custom.fca.balancesheet.documents.DocumentHandler
	at java.base/java.lang.Class.forNameImpl(Native Method)
	at java.base/java.lang.Class.forName(Class.java:412)
	at java.base/java.io.ClassCache$FutureValue.get(ClassCache.java:190)
	at java.base/java.io.ClassCache.get(ClassCache.java:161)
	at java.base/java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:851)
	at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2068)
	at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1954)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2242)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:548)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:458)
	at java.base/java.util.HashMap.readObject(HashMap.java:1460)
	at jdk.internal.reflect.GeneratedMethodAccessor88.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1175)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2403)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2544)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2436)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2269)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1771)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:548)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:458)

mreuvers avatar Jun 18 '20 06:06 mreuvers

Hi @theresa-m, We do use defaultReadObject() in our code, I am not sure if that code would be invoked in this case though. At least it isn't in the call stack :-)

pwagland avatar Jun 18 '20 07:06 pwagland

Oh, and I may have misread your question, we also override the default readObject() in some of our classes, again, I can't say if they are involved in this (de)serialisation, as they might be in the stream, but not in the stack trace.

pwagland avatar Jun 18 '20 10:06 pwagland

Okay. Thanks for the additional stack trace and information. I'll continue to investigate.

theresa-m avatar Jun 19 '20 22:06 theresa-m

I've updated my draft with a couple of other places where the cache should be refreshed https://github.com/ibmruntimes/openj9-openjdk-jdk11/pull/317

Sample build with this change: https://drive.google.com/file/d/15lq9C460QuR-LVTXv5t-5gpLv7suAklW/view?usp=sharing

If that doesn't work, can you also try including the -Xint option on your command line to see if that makes a difference?

theresa-m avatar Jun 22 '20 23:06 theresa-m

Hello @theresa-m

I gave your new jdk a try, unfortunately it fails very early during startup of our product (so not related to the issue reported). As the product does not start I can't test more. Something else seems wrong with deserialization in this jdk here:

OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build master-e006f8232bb, JRE 11 Linux amd64-64-Bit Compressed References 20200622_2197 (JIT enabled, AOT enabled)
OpenJ9   - e006f8232bb
OMR      - 934b0ff765c
JCL      - 899077a4d3c based on jdk-11.0.8+7)

java.io.StreamCorruptedException: invalid type code: 00
        at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:3179) ~[?:?]
        at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:3213) ~[?:?]
        at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:3115) ~[?:?]
        at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:2501) ~[?:?]
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2097) ~[?:?]
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1971) ~[?:?]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2259) ~[?:?]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1788) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:548) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:458) ~[?:?]
        at com.redwood.scheduler.extensionpoint.impl.ExtensionPointHTTPSession.remapMap(ExtensionPointHTTPSession.java:259) ~[extension-point.jar:?]

The -Xint flag has no effect except making things slow (same stack).

I double checked with your previous linked jdk, but that starts up fine (and has the original error we reported).

mreuvers avatar Jun 23 '20 07:06 mreuvers

hmm okay thanks for trying it out. That options helps me narrow down there's no issues with the JIT ludcl optimization either. I'll keep digging into it.

theresa-m avatar Jun 25 '20 18:06 theresa-m

Thank you, I hope you can find the problem. :)

mreuvers avatar Jun 26 '20 14:06 mreuvers

Hi @theresa-m, I see that this didn't make 11.08, is there anything that we can do to help in the diagnosis?

pwagland avatar Jul 16 '20 13:07 pwagland

Thanks for the offer! I'll dive back into this on Monday and let you know, this is still high on my list for the September release.

theresa-m avatar Jul 16 '20 21:07 theresa-m

I'm still not sure why the latest change caused the StreamCorruptedException. I'm thinking of adding some logging statements into the next build but still thinking about what would be the most useful information to help diagnose the problem.

theresa-m avatar Jul 23 '20 17:07 theresa-m

Hello, I think the most likely place the optimization is failing is when going in and out of the HashMap class but I was hoping to confirm if you would be willing to try out this build: https://drive.google.com/file/d/1YbIq8mWEwCsiKa0s9F498wLrPPaCUpzQ/view?usp=sharing

It adds some information into the message of the exception. It prints current classloader names if you are comfortable with that: https://github.com/ibmruntimes/openj9-openjdk-jdk11/compare/openj9...theresa-m:ludcl_refresh_2

theresa-m avatar Jul 30 '20 14:07 theresa-m

Thank you for the build, I'll hope to have a look at it later this week/next week and will get back to you then.

mreuvers avatar Aug 04 '20 15:08 mreuvers

Hello, I've tested with your build and it fails and now with a large message containing your long log message (over 3000 lines).

Just to play it safe for legal reasons, can I send/upload the result somewhere privately to you? Thanks.

mreuvers avatar Aug 10 '20 09:08 mreuvers

Thanks! Oh no I had hoped it would be shorter. The openj9 public slack channel could be a good place to dm me. Probably not going to need all 3000 lines either.

This should be the link to join our slack if you are not already part of it: openj9 slack

Again thanks for your willingness to try things.

theresa-m avatar Aug 11 '20 17:08 theresa-m

The main thing I'm looking for is the start where "cached ludcl is: " <ludcl> " expected ludcl is: " <ludcl> print different ludcls to help narrow down when there is a cache error. That should hopefully be near to the last few lines of output.

theresa-m avatar Aug 11 '20 17:08 theresa-m

Moving this to the Oct quarterly update but if a low risk fix is found before we ship, we can consider porting to the release branch for jdk15

DanHeidinga avatar Aug 12 '20 13:08 DanHeidinga

Hi @mreuvers I've sent a follow up re this issue on slack.

theresa-m avatar Aug 20 '20 13:08 theresa-m

@sharon-wang can you please take over this issue

tajila avatar Aug 25 '20 13:08 tajila

@tajila sure thing, I'll start looking into this.

sharon-wang avatar Aug 25 '20 15:08 sharon-wang

@sharon-wang any update on this?

pshipton avatar Sep 23 '20 19:09 pshipton

@pshipton Been chatting back and forth with @mreuvers in the OpenJ9 Slack.

I've tried looking at the sample app from #8454 (different problem than this one, but same classloader setup) to identify possible code paths leading to this failure. So far I have not been able to reproduce the current issue by tweaking the sample app.

Sent over a JDK with some changes (extra startingLudclObject checks), but the failure is still being seen. Will be sending over another JDK for Martijn to try with some debug prints to try to get more insight into the cached ludcl.

sharon-wang avatar Sep 23 '20 19:09 sharon-wang

I'll tentatively move this to the next milestone, but if a fix is ready by Oct 4 we should evaluate if it's low enough risk to put into the 0.23 release.

pshipton avatar Sep 23 '20 20:09 pshipton

Just an update, we were able to capture specific classloader information in a debug build. We're able to see the classloaders being used throughout object deserialization in the application and have identified the spot where a refresh appears to be missing, resulting in the exception. I added a "force refresh" option in the debug build and @mreuvers confirmed that using this debug option does alleviate the error.

This means that refreshLudcl is not being set to true or is incorrectly being set to false just before the exception occurs. I'm trying to figure out how this is occurring...

Here's a description of the steps to the failure (with obscured class info):

  1. readObject() call
  2. cachedLudcl = ClassLoaderForClassA
  3. refreshLudcl=true, so cachedLudcl is refreshed to ClassLoaderForClassA
  4. No other apparent deserialization occurs (based on lack of output from debug)
  5. resolveClass() is called for ClassA
  6. cachedLudcl = ClassLoaderForClassB -- not sure how/where this is getting updated, looking into this
  7. refreshLudcl=true, so cachedLudcl is refreshed to ClassLoaderForClassA
  8. resolveClass() is called for ClassB
  9. cachedLudcl = ClassLoaderForClassA
  10. refreshLudcl=false, so cachedLudcl is not refreshed
  11. Attempt to resolve ClassB from ClassLoaderForClassA
  12. ClassNotFoundException occurs

So, I'm looking at:

  • how did the last user defined classloader get updated between the initial readObject() and 6? The classloader in 6 is the one we want, but it's not clear why it shows up at this point.
  • why is refreshLudcl=false in 10, since it should be true?

sharon-wang avatar Nov 06 '20 17:11 sharon-wang

pushing this to the next release

tajila avatar Dec 02 '20 16:12 tajila