Add memory reporting for UI tests and exit for E4Testable on OOM

Open iloveeclipse opened this issue 1 year ago • 13 comments

Maybe this could help understanding OOM errors on jenkins.

See https://github.com/eclipse-platform/eclipse.platform.ui/issues/2432
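The "exit on OOM" half of the title could look roughly like the sketch below. It is not the code from this PR: the class name `ExitOnOom`, the exit code, and the injectable exit action are all assumptions made for illustration. The idea is only that an uncaught `OutOfMemoryError` ends the test JVM immediately instead of leaving it running in a broken state.

```java
import java.util.function.IntConsumer;

/** Hypothetical handler; names and exit code are illustrative, not taken from the PR. */
final class ExitOnOom implements Thread.UncaughtExceptionHandler {

    /** Assumed exit code, chosen arbitrarily for this sketch. */
    static final int OOM_EXIT_CODE = 3;

    private final IntConsumer exit;

    /** The exit action is injected so the handler can be exercised without killing the JVM. */
    ExitOnOom(IntConsumer exit) {
        this.exit = exit;
    }

    /** Returns true if the throwable or one of its causes is an OutOfMemoryError. */
    static boolean isOom(Throwable t) {
        // The depth limit guards against cyclic cause chains.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (isOom(error)) {
            // Do as little as possible here: allocating may fail again under OOM.
            exit.accept(OOM_EXIT_CODE);
        } else {
            error.printStackTrace();
        }
    }

    /** Installs the handler for all threads; halt() skips shutdown hooks. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new ExitOnOom(code -> Runtime.getRuntime().halt(code)));
    }
}
```

Whether `halt()` or `System.exit()` is the better choice depends on whether shutdown hooks are still expected to run after an OOM; the sketch picks `halt()` only to keep the OOM path short.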

iloveeclipse avatar Oct 21 '24 09:10 iloveeclipse

Test Results

300 files, 300 suites, 7m 25s ⏱️
5 883 tests: 5 848 ✅, 35 💤, 0 ❌
6 270 runs: 6 235 ✅, 35 💤, 0 ❌

Results for commit 377b64d7.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 21 '24 10:10 github-actions[bot]

Looks like the OOMs are now happening later, after UITestsuite has been executed:

Tests run: 1659, Failures: 7, Errors: 44, Skipped: 196


Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Active Thread: Equinox Container: 97a6556b-edd7-45ed-ab09-86bf747df888"
java.lang.OutOfMemoryError: Java heap space
Exception in thread "Worker-47" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Worker-30" java.lang.OutOfMemoryError: Java heap space

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Worker-48"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Worker-49"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Worker-50"

The last memory output looks like

#################################################
org.eclipse.ui.tests.preferences.ViewerItemsLimitTest
########### Memory usage reported by JVM ########
   1.073.741.824 bytes max heap
   1.034.944.512 bytes heap allocated
     659.199.560 bytes free heap
     375.744.952 bytes used heap
#################################################

The only test after that was org.eclipse.ui.tests.stress.OpenCloseTest, which doesn't look suspicious at first glance.
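A report in the format shown above can be produced from the three values that `java.lang.Runtime` exposes. The following is a minimal sketch, not the reporter added by this PR: the class name `MemoryReport` and its methods are made up, and the use of `Locale.GERMANY` is an assumption made only to reproduce the dotted digit grouping seen in the log.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical helper; not the class added by this PR. */
final class MemoryReport {

    private static final String RULER = "#################################################";

    /** Formats a report from explicit values so the layout can be checked in isolation. */
    static String format(String testName, long max, long allocated, long free) {
        // Locale.GERMANY groups digits with '.', as in the log excerpt (assumed locale).
        NumberFormat nf = NumberFormat.getIntegerInstance(Locale.GERMANY);
        long used = allocated - free; // used heap = allocated heap minus free heap
        StringBuilder sb = new StringBuilder();
        sb.append(RULER).append('\n');
        sb.append(testName).append('\n');
        sb.append("########### Memory usage reported by JVM ########\n");
        sb.append(String.format("%16s bytes max heap\n", nf.format(max)));
        sb.append(String.format("%16s bytes heap allocated\n", nf.format(allocated)));
        sb.append(String.format("%16s bytes free heap\n", nf.format(free)));
        sb.append(String.format("%16s bytes used heap\n", nf.format(used)));
        sb.append(RULER).append('\n');
        return sb.toString();
    }

    /** Reads the current values from the running JVM. */
    static String current(String testName) {
        Runtime rt = Runtime.getRuntime();
        return format(testName, rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }

    public static void main(String[] args) {
        System.out.print(current("example.SomeTest"));
    }
}
```

Note that "used heap" here is simply allocated minus free at the moment of the call, so it includes garbage that has not been collected yet. A low number at the end of a test class therefore does not rule out a later spike.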

iloveeclipse avatar Oct 21 '24 15:10 iloveeclipse

Thank you for investing time to track down this problem.

merks avatar Oct 21 '24 15:10 merks

The last build had no OOMs but multiple test failures due to https://github.com/eclipse-platform/eclipse.platform/issues/1592, see

https://ci.eclipse.org/platform/job/eclipse.platform.ui/job/PR-2433/2/#showFailuresLink

I guess the OOM problem could be related to / fixed by https://github.com/eclipse-jdt/eclipse.jdt.core/pull/3126. I saw that https://github.com/eclipse-jdt/eclipse.jdt.core/commit/9c11818e53959318b2914d52978dd7729bd7df88 broke a lot of things in our company-internal tests, including endless loops and test crashes. However, our tests rely heavily on JDT, while platform UI tests only use JDT indirectly in one or two tests...

But that could be just coincidence, I will retrigger tests once again.

iloveeclipse avatar Oct 21 '24 17:10 iloveeclipse

> But that could be just coincidence, I will retrigger tests once again.

See https://ci.eclipse.org/platform/job/eclipse.platform.ui/job/PR-2433/3/. No OOMs, but a lot of test failures related to https://github.com/eclipse-platform/eclipse.platform/issues/1592.

iloveeclipse avatar Oct 21 '24 19:10 iloveeclipse

Hmm, interestingly there are OOM errors on https://github.com/eclipse-platform/eclipse.platform.ui/pull/2438.

iloveeclipse avatar Oct 21 '24 21:10 iloveeclipse

Neither test execution order nor bundle build time is fixed, which makes it almost impossible to determine why the OOM happens on Jenkins, IMO. My way to work towards this is to reduce tests to the minimum setup needed and to reduce the usage of older versions of libs, hoping that at some point this will pay off not only in easier-to-understand tests but also in less stress on build machines.

akurtakov avatar Oct 22 '24 05:10 akurtakov

> Neither test execution order nor bundle build time is fixed, which makes it almost impossible to determine why the OOM happens on Jenkins, IMO.

Not sure why you think the test execution order is not fixed. It seems to be defined by the UiTestSuite, at least that is what I observe in the log.

> My way to work towards this is to reduce tests to the minimum setup needed and to reduce the usage of older versions of libs, hoping that at some point this will pay off not only in easier-to-understand tests but also in less stress on build machines.

I don't think this is applicable here. So far it looks like the OOMs appear at the end of the suite, and none of the measurements printed so far showed any excessive memory use at all.

So either something hits memory at the end of the tests (what???), or the JVM is too lazy to run GC in time and crashes with OOMs just because the GC has no free thread/CPU core to do the work. The latter would match the observation that we have a lot of blocked threads and therefore also a lot of failures due to https://github.com/eclipse-platform/eclipse.platform/issues/1592. So far I have seen no OOMs on this PR after adding explicit gc() calls on teardown.

If so, adding explicit gc() calls could stabilize test execution on the weak VMs we have.
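For illustration, an explicit gc() call on teardown could be as small as the sketch below. The class `GcOnTearDown` and its methods are hypothetical and are not the change made in this PR; in a JUnit test, `collect()` would be called from the teardown method. Keep in mind that `System.gc()` is only a request, and the JVM is free to ignore it.

```java
/** Hypothetical teardown helper; not the code from this PR. */
final class GcOnTearDown {

    /** Used heap in bytes as currently reported by the JVM. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Requests a garbage collection and returns the used heap before and after.
     * System.gc() is a hint, so "after" is not guaranteed to be smaller.
     */
    static long[] collect() {
        long before = usedHeap();
        System.gc();
        long after = usedHeap();
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] r = collect();
        System.out.println("used heap before gc: " + r[0] + " bytes, after gc: " + r[1] + " bytes");
    }
}
```

Logging both values per test class would also show whether the explicit collection actually frees anything, which helps tell "lazy GC" apart from a real spike.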

iloveeclipse avatar Oct 22 '24 08:10 iloveeclipse

> > Neither test execution order nor bundle build time is fixed, which makes it almost impossible to determine why the OOM happens on Jenkins, IMO.

> Not sure why you think the test execution order is not fixed. It seems to be defined by the UiTestSuite, at least that is what I observe in the log.

"From version 4.11, JUnit will by default use a deterministic, but not predictable, order." from https://github.com/junit-team/junit4/wiki/test-execution-order. So as long as nothing changes, the order stays the same, but whenever there is some change the order can become totally different (in my experience).

akurtakov avatar Oct 22 '24 08:10 akurtakov

> "From version 4.11, JUnit will by default use a deterministic, but not predictable, order." from https://github.com/junit-team/junit4/wiki/test-execution-order. So as long as nothing changes, the order stays the same, but whenever there is some change the order can become totally different (in my experience).

Sure, but that is about test methods within a test class; I was talking about the order of test classes.

iloveeclipse avatar Oct 22 '24 09:10 iloveeclipse

Still failing with OOMs, and right up to the end there is still no sign of any memory issues. https://ci.eclipse.org/platform/job/eclipse.platform.ui/job/PR-2433/6/consoleFull

So it could be a memory spike on shutdown (why?) or in the tycho/surefire post-processing code.

@laeubi: were there any updates to surefire recently that could be related? I remember we had a memory leak in the past after a surefire update, so maybe something similar happened again?

To make sure this is unrelated to the tests that failed because files were not deleted, I will wait for the SDK build with the fix from https://github.com/eclipse-platform/eclipse.platform/pull/1593

iloveeclipse avatar Oct 22 '24 10:10 iloveeclipse

[ERROR] Failed to execute goal org.eclipse.tycho:tycho-surefire-plugin:4.0.9:test (default-test) on project org.eclipse.ui.tests: An unexpected error occurred while launching the test runtime (process returned error code 13). The process logfile /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/work/data/.metadata/.log might contain further details. Command-line used to launch the sub-process was /opt/tools/java/openjdk/jdk-17/latest/bin/java -Dosgi.noShutdown=false -Dosgi.os=linux -Dosgi.ws=gtk -Dosgi.arch=x86_64 --add-modules=ALL-SYSTEM -Dosgi.clean=true -ea -jar /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/.m2/repository/p2/osgi/bundle/org.eclipse.equinox.launcher/1.6.900.v20240613-2009/org.eclipse.equinox.launcher-1.6.900.v20240613-2009.jar -data /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/work/data -install /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/work -configuration /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/work/configuration -application org.eclipse.tycho.surefire.osgibooter.uitest -testproperties /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/surefire.properties in working directory /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests

We should probably simply increase the heap size for tests from the default (1/4 RAM == 1 GB) to at least 2 GB.
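The arithmetic behind that default can be checked with a few lines. The sketch below assumes the JVM's usual ergonomics of a quarter of physical RAM for the default max heap. The class `HeapSizeCheck` is hypothetical, and the 4 GiB RAM figure is only what "1/4 RAM == 1 GB" would imply, not a confirmed spec of the Jenkins agents.

```java
/** Hypothetical check; the 4 GiB RAM figure is inferred, not a confirmed agent spec. */
final class HeapSizeCheck {

    static final long GIB = 1024L * 1024L * 1024L;

    /** Default max heap when the JVM takes one quarter of physical RAM (assumed ergonomics). */
    static long defaultMaxHeap(long physicalRamBytes) {
        return physicalRamBytes / 4;
    }

    /** True if the given max heap is at least the required size. */
    static boolean meets(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("max heap: " + max + " bytes, at least 2 GiB: " + meets(max, 2 * GIB));
    }
}
```

With 4 GiB of RAM this gives 1,073,741,824 bytes, which matches the "max heap" value in the memory report earlier in the thread. Raising the limit would mean passing something like `-Xmx2g` to the test JVM; where that argument is configured for this build is not shown in this thread.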

iloveeclipse avatar Oct 22 '24 16:10 iloveeclipse

This pull request changes some projects for the first time in this development cycle. Therefore the following files need a version increment:

tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF

An additional commit containing all the necessary changes was pushed to the top of this PR's branch. To obtain these changes (for example if you want to push more changes) either fetch from your fork or apply the git patch.

Git patch
From aff4de1f4fbc3c6dea0d2529ca310e09bb0bc117 Mon Sep 17 00:00:00 2001
From: Eclipse Platform Bot <[email protected]>
Date: Tue, 10 Dec 2024 08:39:51 +0000
Subject: [PATCH] Version bump(s) for 4.35 stream


diff --git a/tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF b/tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF
index ed65721730..259fee3e99 100644
--- a/tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF
+++ b/tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF
@@ -2,7 +2,7 @@ Manifest-Version: 1.0
 Bundle-ManifestVersion: 2
 Bundle-Name: Harness Plug-in
 Bundle-SymbolicName: org.eclipse.ui.tests.harness;singleton:=true
-Bundle-Version: 1.10.500.qualifier
+Bundle-Version: 1.10.600.qualifier
 Eclipse-BundleShape: dir
 Require-Bundle: org.eclipse.ui,
  org.eclipse.core.runtime,
-- 
2.47.1

Further information is available in Common Build Issues - Missing version increments.

eclipse-platform-bot avatar Dec 10 '24 08:12 eclipse-platform-bot