JDK8: ppc64le_linux: Test8009761.java fails with: init recursive calls: 38. After deopt 37

Open adamfarley opened this issue 1 year ago • 14 comments
Summary

Test8009761.java fails whenever it is run on a ppc64le_linux Ubuntu 1804 machine, but appears to pass everywhere else.

[2023-07-19T16:35:21.776Z] STDOUT:
[2023-07-19T16:35:21.776Z] CompilerOracle: exclude Test8009761.m2
[2023-07-19T16:35:21.776Z] Failed: init recursive calls: 38. After deopt 37

Example: https://trss.adoptium.net/output/test?id=64b83aef17052c671586e467
Deep History: https://trss.adoptium.net/deepHistory?testId=6615e317879917006efa06ea

Details

From 2023-07-19 (possibly earlier) to now, Test8009761 has always failed with this error when it is run on one of our ppc64le_linux Ubuntu 1804 machines, such as test-osuosl-ubuntu1804-ppc64le-2.

Operating systems on which this test has been shown to pass include:

  • All Debian versions
  • Ubuntu 1604
  • Ubuntu 2004 and up

As far as I can tell, Test8009761 has been failing for many years, but the failure went unnoticed because the test was excluded due to an issue (link) that affected many compiler tests on all ppc64le machines.

It seems this issue was resolved here, and the test was then unexcluded here on 2023-02-07. There are no records between then and 2023-07-19, so I don't think we can be sure when the "init recursive" problem started. The stack issue may have concealed it, either by (a) causing a failure before the recursive check was reached, or (b) causing so many failures on the non-Ubuntu-1804 machines that the recursive failures were drowned out.

Also, here are some examples of past failures of this test, and how they were fixed:

  • https://bugs.openjdk.org/browse/JDK-8012936
  • https://bugs.openjdk.org/browse/JDK-8015033
  • https://bugs.openjdk.org/browse/JDK-8017750

I'm not currently seeing an unresolved upstream bug that looks identical to this issue.

Machine stats

In case this was not an OS issue, but rather a throughput issue, I pulled out some stats for the failing/passing machines. I don't see a pattern, sadly. See below for the numbers, in case someone has another theory.

| Data | test-osuosl-ubuntu1804-ppc64le-2 (fail) | test-osuosl-ubuntu2004-ppc64le-1 | test-skytap-ubuntu2004-ppc64le-1 | test-docker-debian11-ppc64le-1 | test-osuosl-ubuntu1604-ppc64le-1 |
| --- | --- | --- | --- | --- | --- |
| Free Physical Memory Size | 4750508032 | 278855680 | 387776512 | 4056809472 | 5065408512 |
| Free space (bytes) | 30003261440 | 15444140032 | 409540165632 | 309474676736 | 55724167168 |
| Total Physical Memory Size | 8556511232 | 4252565504 | 8488419328 | 6442450944 | 8559984640 |
| Total space (bytes) | 84449624064 | 41551020032 | 422622445568 | 422548791296 | 83203571712 |
| Usable space (bytes) | 26508455936 | 15427362816 | 388065382400 | 287999893504 | 55707389952 |
| cpuCores | 4 | 2 | 32 | 32 | 4 |

adamfarley avatar Apr 15 '24 12:04 adamfarley

Here are a couple of grinders for this test, each one a 100x grinder of this specific unit test.

  • Ubuntu 2004: https://ci.adoptium.net/job/Grinder/9625/
  • Ubuntu 1804: https://ci.adoptium.net/job/Grinder/9626/

Also, for the last 12 runs of this test target, here are the pass/fails:

  • Pass: test-docker-debian11-ppc64le-1, test-docker-debian11-ppc64le-3 (x2), test-docker-ubuntu2204-ppc64le-1, test-docker-ubuntu2204-ppc64le-2, test-osuosl-ubuntu2004-ppc64le-1, test-osuosl-ubuntu1604-ppc64le-1 (x2), test-skytap-ubuntu2004-ppc64le-1

  • Fail: test-docker-ubuntu1804-ppc64le-1, test-osuosl-ubuntu1804-ppc64le-2 (x2)

adamfarley avatar Apr 15 '24 13:04 adamfarley

Ok, the Grinders were conclusive.

100/100 of the tests on test-osuosl-ubuntu2004-ppc64le-1 passed. 100/100 of the tests on test-osuosl-ubuntu1804-ppc64le-1 failed with exactly this issue.

adamfarley avatar Apr 15 '24 15:04 adamfarley

This test was rewritten in JDK 9 and allows the call count to be off by one. That might allow us to pass this test on this one special platform.

For our own purposes I wouldn't consider this test failure a blocker, so there wouldn't be any priority action to resolve it. The rewrite was 10 years ago now... Maybe it would be nice to backport the adjustment of the assert to JDK 8?

JDK 8: https://github.com/openjdk/jdk8u-dev/blob/cde8aca6cb0fae77b9300b9d65d094a4f74e4d53/hotspot/test/compiler/8009761/Test8009761.java#L249

Head: https://github.com/openjdk/jdk/blob/140f56718bbbfc31bb0c39255c68568fad285a1f/test/hotspot/jtreg/compiler/uncommontrap/Test8009761.java#L284

Fix was to allow the count to be off by one, lol..
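
For illustration, the difference between a strict equality check and an off-by-one tolerant check can be sketched roughly as below. This is a minimal sketch and not the code from either linked file: the class name `RecursionDepthCheck` and the helper `withinTolerance` are hypothetical, and the values 38 and 37 are taken from the failure message in this issue.

```java
// Hypothetical illustration only; not the actual Test8009761 source.
class RecursionDepthCheck {

    // Returns true when the two recursion depths differ by at most `tolerance`.
    static boolean withinTolerance(int initCalls, int afterDeopt, int tolerance) {
        return Math.abs(initCalls - afterDeopt) <= tolerance;
    }

    public static void main(String[] args) {
        // Values reported in the failure: "init recursive calls: 38. After deopt 37"
        int initCalls = 38;
        int afterDeopt = 37;

        // Strict comparison: any difference counts as a failure.
        boolean strictPass = (initCalls == afterDeopt);

        // Tolerant comparison: a difference of one is accepted.
        boolean tolerantPass = withinTolerance(initCalls, afterDeopt, 1);

        System.out.println("strict check passes: " + strictPass);
        System.out.println("off-by-one check passes: " + tolerantPass);
    }
}
```

With the counts from this failure, a strict comparison rejects the result while a tolerance of one accepts it, which is why relaxing the assert might be enough to make the test pass on the Ubuntu 1804 machines.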

jiekang avatar Apr 15 '24 16:04 jiekang

Good catch Jie. :)

Looks like the fix here was associated with the issue here. It was meant to be backported to JDK8, but got deferred and forgotten.

Will exclude for now, and test/backport the fix after the release is resolved. Assigning to my best guesstimate of the correct post-release iteration, so I don't forget to handle this.

adamfarley avatar Apr 16 '24 12:04 adamfarley

Master exclusion: https://github.com/adoptium/aqa-tests/pull/5228

Will be cherry-picked and merged into the latest release branch once approved+merged. (Update: Done here)

After the release, the fix for this will be backported in Q2 Iteration 3.

adamfarley avatar Apr 16 '24 14:04 adamfarley

Hi, how can I reproduce this failure? I do not have a ppc64le_linux machine.

sendaoYan avatar Apr 21 '24 05:04 sendaoYan

Hi @sendaoYan - look for an invitation to a test-triage team. Once you accept the invitation, you can try running these Grinder jobs on Jenkins:

If you need direct access to either test machine, you would raise an infrastructure issue to request it.

smlambert avatar Apr 21 '24 11:04 smlambert

Thanks.

sendaoYan avatar Apr 22 '24 03:04 sendaoYan

Upstream bug raised here: https://bugs.openjdk.org/browse/JDK-8330973
Upstream PR raised here: https://github.com/openjdk/jdk8u-dev/pull/487

adamfarley avatar Apr 23 '24 13:04 adamfarley

Severin has asked for a backport of the full JDK9 fix, rather than the minimal version, so I'm checking the backport to make sure it's clean.

adamfarley avatar Apr 24 '24 08:04 adamfarley

Looking into the creation of the backport PR now.

Commit generated here. Testing underway. Links to follow.

https://ci.adoptium.net/job/Grinder/9898/console

Note: May need re-launching if the relative test path is incorrect.

adamfarley avatar May 09 '24 15:05 adamfarley

Resuming this task. Here's a grinder rerun: https://ci.adoptium.net/job/Grinder/10554/

Update: Test run passed. Creating upstream PR and updating the associated bug.

adamfarley avatar Jul 10 '24 09:07 adamfarley

Update: The upstream PR has been merged into jdk8u-dev. Will unexclude this test once the change gets merged into jdk8u.

adamfarley avatar Aug 21 '24 09:08 adamfarley