AIX build failure in ClassFileOracle

Open keithc-ca opened this issue 6 months ago • 15 comments

See https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_OMR/760, for example

[2025-05-08T11:28:06.393Z] [ 36%] Building CXX object runtime/bcutil/CMakeFiles/j9dyn.dir/ClassFileOracle.cpp.o
[2025-05-08T11:28:16.426Z]     1500-004: (U) INTERNAL COMPILER ERROR while compiling ClassFileOracle::LocalVariablesIterator::hasGenericSignature().  Compilation ended.  Contact your Service Representative and provide the following information: Internal abort. For more information visit: http://www.ibm.com/support/docview.wss?uid=swg21110810
[2025-05-08T11:28:17.818Z] 1586-346 (U) An error occurred during code generation.  The code generation return code was 1.
[2025-05-08T11:28:19.316Z] gmake[6]: *** [runtime/bcutil/CMakeFiles/j9dyn.dir/build.make:133: runtime/bcutil/CMakeFiles/j9dyn.dir/ClassFileOracle.cpp.o] Error 1
[2025-05-08T11:28:19.316Z] gmake[6]: *** Waiting for unfinished jobs....

The last successful build I found for JDK11 on AIX is https://openj9-jenkins.osuosl.org/job/Pipeline-OMR-Acceptance/873. The first failing build was https://openj9-jenkins.osuosl.org/job/Pipeline-OMR-Acceptance/876.

Changes:

  • openj9: https://github.com/eclipse-openj9/openj9/compare/eb473bd9d39..a9136b8a79f
  • omr: https://github.com/eclipse-omr/omr/compare/38fbca611ff..38fbca611ff
  • jdk11: https://github.com/ibmruntimes/openj9-openjdk-jdk11/compare/aef46aabde...01db13aecfc

keithc-ca avatar May 08 '25 14:05 keithc-ca

It still builds in the nightly builds, so I think it's intermittent. For example, last night's build passed: https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-JDK11/1062/

pshipton avatar May 08 '25 22:05 pshipton

That nightly succeeded on p8-java1-ibm10 while https://openj9-jenkins.osuosl.org/job/Pipeline-OMR-Acceptance/879 failed (again) on p8-java1-ibm08. Failures were on several different machines:

  • https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_OMR/759 - p8-java1-ibm12
  • https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_OMR/760 - p8-java1-ibm09
  • https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_OMR/761 - p8-java1-ibm08

keithc-ca avatar May 09 '25 14:05 keithc-ca

I can't think of any reason it would pass in the nightly builds but fail in the OMR builds. Last night it passed on p8-java1-ibm12, which includes the OMR changes from yesterday. https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_Nightly/1047/ - p8-java1-ibm12

pshipton avatar May 09 '25 15:05 pshipton

I think "intermittent" is a rather generous term to describe the situation. That it only seems to fail for jdk11 seems relevant, but I didn't find any changes in that source file, nor any included file, that would explain this.

keithc-ca avatar May 09 '25 15:05 keithc-ca

The only difference I can find is the build directory name. Build_JDK11_ppc64_aix_Nightly vs Build_JDK11_ppc64_aix_OMR

pshipton avatar May 14 '25 16:05 pshipton

A Nightly build job that I launched with the same parameters as a failing OMR build job passed: https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_Nightly/1051/

https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_OMR/766

pshipton avatar May 15 '25 00:05 pshipton

@zl-wang can the XLC team take a look at this INTERNAL COMPILER ERROR with 16.01.0000.0020?

https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_OMR/781/

pshipton avatar Jun 12 '25 14:06 pshipton

There is a fixpack 21, maybe we need to try it.

pshipton avatar Jun 12 '25 14:06 pshipton

Created https://github.ibm.com/runtimes/infrastructure/issues/10828 so we can try it out.

pshipton avatar Jun 12 '25 14:06 pshipton

Yes, try out the latest PTF first, before I get the xlC team involved.

zl-wang avatar Jun 12 '25 14:06 zl-wang

@zl-wang we tried 16.1.0.21 but the same problem occurs.

IBM XL C/C++ for AIX, V16.1.0  (5725-C72, 5765-J12)
Version: 16.01.0000.0021

https://openj9-jenkins.osuosl.org/job/Build_JDK11_ppc64_aix_OMR/784/

08:55:18      1500-004: (U) INTERNAL COMPILER ERROR while compiling ClassFileOracle::LocalVariablesIterator::hasGenericSignature().  Compilation ended.  Contact your Service Representative and provide the following information: Internal abort. For more information visit: http://www.ibm.com/support/docview.wss?uid=swg21110810

pshipton avatar Jun 13 '25 13:06 pshipton

Repeat the problematic compilation command, but with the additional option -P (I believe), i.e. preprocessing only. That generates a preprocessed file (written to <OriginalFileName>.i) in which every include etc. is consolidated. Send it to me and I will let them take over, so that they can investigate with that file alone (no header files etc. needed).

zl-wang avatar Jun 13 '25 13:06 zl-wang

I was right about the option: -P Preprocesses the C or C++ source files named in the compiler invocation and creates an output preprocessed source file for each input source file. The preprocessed output file has the same name as the input file, with a .i suffix.

zl-wang avatar Jun 13 '25 14:06 zl-wang
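As a rough sketch of the step described above, the following Python helper derives a preprocess-only command from an existing compile command by dropping `-c` and `-o <file>` and adding `-P`. The helper name `preprocess_only_command` and the shortened example command are hypothetical, and the sketch assumes that `-P` behaves as the quoted documentation describes for the compiler driver in use; it only builds the argument list and does not invoke the compiler.

```python
import shlex


def preprocess_only_command(compile_cmd):
    """Return the argument list for a preprocess-only run (hypothetical helper).

    Drops "-c" and "-o <file>" from the compile command and adds "-P"
    right after the compiler executable. Only the separated "-o <file>"
    form is handled; a joined "-o<file>" is left untouched.
    """
    args = shlex.split(compile_cmd)
    if not args:
        raise ValueError("empty compile command")

    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the output file name that followed "-o".
            skip_next = False
            continue
        if arg == "-o":
            skip_next = True
            continue
        if arg == "-c":
            continue
        kept.append(arg)

    return [kept[0], "-P"] + kept[1:]


if __name__ == "__main__":
    # Shortened, illustrative command; the real one has many more options.
    cmd = ("/opt/IBM/xlC/16.1.0/bin/xlclang++ -q64 -O3 -g -qstackprotect "
           "-o ClassFileOracle.cpp.o -c ClassFileOracle.cpp")
    print(" ".join(preprocess_only_command(cmd)))
```

Keeping every other option unchanged matters here, because the `-D` macros and language-level flags affect what the preprocessor emits.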

The following recreates it for me on both AIX 7.2 and 7.3 machines.

/opt/IBM/xlC/16.1.0/bin/xlclang++ -x c++ -DAIXPPC -DIPv6_FUNCTION_SUPPORT -DJ9_INTERNAL_TO_VM -DOPENJ9_BUILD -DPPC -DPPC64 -DRS6000 -D_ALL_SOURCE -D_LARGE_FILES -qnoeh -fno-exceptions -g -qalias=noansi -qxflag=LTOL:LTOL0 -q64 -qxlcompatmacros -O3 -qstackprotect -fno-rtti -qlanglvl=extended0x -qlanglvl=extended0x -qnortti -qsuppress=1540-1087:1540-1088:1540-1090 -fPIC -qhalt=w -o ClassFileOracle.cpp.o -c ClassFileOracle.i

ClassFileOracle.zip

pshipton avatar Jun 13 '25 20:06 pshipton

A defect was opened on the xlC side:

https://compjazz.rtp.raleigh.ibm.com:9443/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/174570

ICE goes away if -qstackprotect option is removed though.

zl-wang avatar Jun 16 '25 15:06 zl-wang

Update in the RTC defect (gist of it: looks like a normal OOM issue):

The traceback comes from AS, the final assembly pass, which does binary encoding and object file creation. Specifically the top level driver for AS - when doing a memory allocate. The stack protect code was created by epilogue.cpp a long time before, and AS doesn't have anything directly to do with it. I think this is just an out of memory error. Following that theory, I was able to see a successful compile if I removed -g from the compile command, or if I used -qlinedebug in place of -g, or if I added -qcompact. I looked at the code listings just before AS, and I only see < 10 instructions related to stack protect, in 1 function, so I don't see much evidence that it is doing anything crazy to blow things up. And the compilation will still work with stackprotect, if we change other options to reduce memory.
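The option changes mentioned in this thread (removing `-qstackprotect`, removing `-g`, using `-qlinedebug` in place of `-g`, adding `-qcompact`) can be enumerated mechanically when retrying the reproduction. The sketch below is only an illustration: the function name `option_variants` is hypothetical, the flag list is a shortened subset of the reproduction command, and it just prints candidate command lines rather than running the compiler.

```python
def option_variants(flags):
    """Build the flag sets discussed in the thread (hypothetical helper)."""
    base = list(flags)

    def without(opt):
        return [f for f in base if f != opt]

    def replaced(old, new):
        return [new if f == old else f for f in base]

    return {
        "baseline": base,
        "no -qstackprotect": without("-qstackprotect"),
        "no -g": without("-g"),
        "-qlinedebug instead of -g": replaced("-g", "-qlinedebug"),
        "add -qcompact": base + ["-qcompact"],
    }


if __name__ == "__main__":
    # Shortened subset of the options from the reproduction command.
    flags = ["-q64", "-O3", "-g", "-qstackprotect", "-fPIC", "-qhalt=w"]
    compiler = "/opt/IBM/xlC/16.1.0/bin/xlclang++"
    tail = ["-o", "ClassFileOracle.cpp.o", "-c", "ClassFileOracle.i"]
    for name, variant in option_variants(flags).items():
        print(name + ": " + " ".join([compiler] + variant + tail))
```

Changing one option at a time keeps each result attributable to a single change, which fits the theory that the failure depends on overall memory use rather than on the stack protect code itself.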

zl-wang avatar Jun 17 '25 12:06 zl-wang