openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

NPEs in specjbb2015 with Balanced in Arraylets mode (off-heap disabled)

Open dmitripivkine opened this issue 6 months ago • 12 comments

Easy to reproduce in internal testing with -XXgc:disableVirtualLargeObjectHeap, observed on multiple platforms including xLinux and zLinux. Job status can be "Passed" but there is NPEs and premature test end - for example this job.

dmitripivkine avatar May 29 '25 18:05 dmitripivkine

@zl-wang @rmnattas @VermaSh FYI, potential Arraylet support regression.

dmitripivkine avatar May 29 '25 18:05 dmitripivkine

This month I have done ~16 iterations of specjbb2015 on zLinux for off-heap, balanced and gencon but didn't see any failures. However, my build is based on a month old openj9, omr. Trying to narrow down to the commit.

VermaSh avatar May 29 '25 18:05 VermaSh

This month I have done ~16 iterations of specjbb2015 on zLinux for off-heap, balanced and gencon but didn't see any failures. However, my build is based on a month old openj9, omr. Trying to narrow down to the commit.

Yes, specjbb2015 seems work with off-heap enabled. The failure with off-heap disabled is easy to reproduce (8 of 10).

dmitripivkine avatar May 29 '25 19:05 dmitripivkine

The failure was first seen with 93684. Last passing build was 93623.

Last passing build was:

java version "1.8.0_451"
Java(TM) SE Runtime Environment (build 8.0.9.0 - pxz6480sr9-20250422_01(SR9))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux s390x-64-Bit Compressed References 20250506_93623 (JIT enabled, AOT enabled)
OpenJ9   - c9e8a890e57
OMR      - ef7f3233f
IBM      - e7592ac)
JCL - 20250409_01 based on Oracle jdk8u451-b10

First failing build was:

java version "1.8.0_451"
Java(TM) SE Runtime Environment (build 8.0.9.0 - pxz6480sr9-20250422_01(SR9))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux s390x-64-Bit Compressed References 20250507_93684 (JIT enabled, AOT enabled)
OpenJ9   - 98ed74326e5
OMR      - 4327b212f
IBM      - e7592ac)
JCL - 20250409_01 based on Oracle jdk8u451-b10

Here's the commit delta between the two builds: openj9, omr. Doing additional tests to narrow down on a single commit from the delta.

VermaSh avatar Jun 09 '25 15:06 VermaSh

FYI https://github.com/eclipse-openj9/openj9/issues/21822 was opened due to https://github.com/eclipse-openj9/openj9/pull/21805 and there was a fix recently merged.

pshipton avatar Jun 09 '25 15:06 pshipton

Set as a blocker since arraylet support is required for z/TPF, and metronome.

pshipton avatar Jun 09 '25 15:06 pshipton

@vermash, in order to determine whether this is a duplicate of #21822 one thing to look at is whether the methods where NullPointerExceptions were seen were compiled at (or inlined into methods compiled at) very-hot.

hzongaro avatar Jun 10 '25 12:06 hzongaro

First method from the stack was warm compiled. But second method, java.io.ObjectOutputStream.writeObject0, was compiled at scorching.

Stack for one of the NPE

Caused by: java.lang.NullPointerException
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:381)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1160)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1615)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1576)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1499)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1209)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:365)
        at org.spec.jbb.core.comm.transport.encapsulation.JavaSerializationEncapsulationLayer.convertDown(JavaSerializa
        at org.spec.jbb.core.comm.transport.encapsulation.JavaSerializationEncapsulationLayer.convertDown(JavaSerializa
        at org.spec.jbb.core.comm.transport.TransportPipeline.convertDown(TransportPipeline.java:28)
        at org.spec.jbb.infra.snapshot.InMemorySnapshotManager.snapshot(InMemorySnapshotManager.java:51)
        at org.spec.jbb.infra.BackendAgent$7.call(BackendAgent.java:292)
        at org.spec.jbb.infra.BackendAgent$7.call(BackendAgent.java:289)
        at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1435)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:300)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1078)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1714)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

First method from the stack

+ (warm) java/io/ObjectStreamClass.lookup(Ljava/lang/Class;Z)Ljava/io/ObjectStreamClass; @ 000003FF9C3359F6-000003FF9C337CFC OrdinaryMethod - Q_SZ=5 Q_SZI=0 QW=344 j9m=00000000010F8B00 bcsz=346 OSR compThreadID=1 CpuLoad=340%(85%avg) JvmCpu=340%

Second method from the stack

+ (scorching) java/io/ObjectOutputStream.writeObject0(Ljava/lang/Object;Z)V @ 000003FF9C5868F0-000003FF9C593234 OrdinaryMethod J Q_SZ=1 Q_SZI=0 QW=200 j9m=0000000001103EC0 bcsz=674 OSR compThreadID=1 CpuLoad=301%(75%avg) JvmCpu=302%

Here are the logs from the failing run with 20250507_93684 for reference verbose.log composite.log

Building an sdk with the new fix https://github.com/eclipse-openj9/openj9/pull/22028. On parallel, I have also launched specjbb with verbose={compile*|inlining} to help determine if the failure is a dup of https://github.com/eclipse-openj9/openj9/issues/21822.

VermaSh avatar Jun 10 '25 16:06 VermaSh

The failure goes away with a newer build, containing https://github.com/eclipse-openj9/openj9/pull/22028 Here's the java version of the newer build I tested:

java version "1.8.0_461"
Java(TM) SE Runtime Environment (build 8.0.9.0 - pxz6480sr9-20250603_02(SR9))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux s390x-64-Bit Compressed References 20250606_95859 (JIT enabled, AOT enabled)
OpenJ9   - 8003e1178cf
OMR      - fa7e4fac4
IBM      - e7592ac)
JCL - 20250602_01 based on Oracle jdk8u461-b09

VermaSh avatar Jun 11 '25 14:06 VermaSh

Looking at the verbose log mentioned in an earlier comment, I notice that ObjectOutputStream.writeObject0 had been compiled at very-hot shortly before it was compiled at scorching. It’s possible that the very-hot compilation was still running in that failure.

+ (profiled very-hot) java/io/ObjectOutputStream.writeObject0(Ljava/lang/Object;Z)V @ 000003FF9C572CF0-000003FF9C580EFC OrdinaryMethod 40.80% T Q_SZ=1 Q_SZI=0 QW=130 j9m=0000000001103EC0 bcsz=674 OSR JPROF compThreadID=1 CpuLoad=249%(62%avg) JvmCpu=250%

hzongaro avatar Jun 11 '25 21:06 hzongaro

Give that this failure was fixed by https://github.com/eclipse-openj9/openj9/pull/22028, can this be closed or are there additional investigations that need to be done?

VermaSh avatar Jun 16 '25 20:06 VermaSh

I'm comfortable with this being closed as a duplicate of #21822 if there are no concerns.

hzongaro avatar Jun 16 '25 20:06 hzongaro

This problem seems to be a duplicate of #21822, which was fixed by pull request #22028.

hzongaro avatar Jun 17 '25 13:06 hzongaro