Multiple SWT/UI test failures on Windows since I20240923-0040 or I20240918-0950

Open iloveeclipse opened this issue 1 year ago • 11 comments

See https://download.eclipse.org/eclipse/downloads/drops4/I20240923-1800/testresults/html/org.eclipse.swt.tests_ep434I-unit-win32-java17_win32.win32.x86_64_17.html

Last known good state: I20240917-1800.

After that, no Windows tests were executed.

They started to run again with I20240923-0040, showing 38 test failures in SWT and many failures in JFace and Platform UI tests.

Restarting the Windows test machine hasn't helped so far: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410

iloveeclipse avatar Sep 24 '24 06:09 iloveeclipse

From the recent results => https://download.eclipse.org/eclipse/downloads/drops4/I20240924-1810/testResults.php

We don't see any of these failures.

deepika-u avatar Sep 25 '24 04:09 deepika-u

We don't see any of these failures

Because we don't see Windows test results yet

iloveeclipse avatar Sep 25 '24 04:09 iloveeclipse

In https://download.eclipse.org/eclipse/downloads/drops4/I20240926-1800/testresults/consolelogs/ep434I-unit-win32-java17_win32.win32.x86_64_17_consolelog.txt the first exception is:

     [echo] Running org.eclipse.ui.tests.UiTestSuite. Result file: C:\Users\genie.releng\workspace\AutomatedTests\ep434I-unit-win32-java17/workarea/I20240926-1800/eclipse-testing/results/ep434I-unit-win32-java17_win32.win32.x86_64_17/org.eclipse.ui.tests.UiTestSuite.xml
     [echo] timout property: 7200000
     [echo] frameworkvmargs:  -Xms256m -Xmx2048m  -Djava.security.manager=allow
     [echo] vmargs: 
     [echo] extraVMargs: 
     [echo] frameworkperfargs: 
     [echo] crash loglocationarg (if any): ${loglocationarg}
     [echo] crash loglocation (if not default): 
     [java] Sep 27, 2024 1:46:17 AM org.apache.aries.spifly.BaseActivator log
     [java] INFO: Registered provider org.slf4j.simple.SimpleServiceProvider of service org.slf4j.spi.SLF4JServiceProvider in bundle slf4j.simple
     [java] WARNING: Annotation classes from the 'javax.inject' or 'javax.annotation' package found.
     [java] It is recommended to migrate to the corresponding replacements in the jakarta namespace.
     [java] The Eclipse E4 Platform will remove support for those javax-annotations in a future release.
     [java] To suppress this warning, set the VM property: -Declipse.e4.inject.javax.warning=false
     [java] To disable processing of 'javax' annotations entirely, set the VM property: -Declipse.e4.inject.javax.disabled=true
     [java] 
     [java] INFO: timeoutScreenOutputDir: C:\Users\genie.releng\workspace\AutomatedTests\ep434I-unit-win32-java17\workarea\I20240926-1800\eclipse-testing\results\ep434I-unit-win32-java17_win32.win32.x86_64_17/timeoutScreens
     [java] INFO: timeout: 7200000
     [java] categoryChanged
     [java] OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
     [java] Error while informing user about event loop exception:
     [java] org.eclipse.swt.SWTError: No more handles
     [java] 	at org.eclipse.swt.SWT.error(SWT.java:4948)
     [java] 	at org.eclipse.swt.SWT.error(SWT.java:4837)
     [java] 	at org.eclipse.swt.SWT.error(SWT.java:4808)
     [java] 	at org.eclipse.swt.widgets.Widget.error(Widget.java:500)
     [java] 	at org.eclipse.swt.widgets.Control.createHandle(Control.java:675)
     [java] 	at org.eclipse.swt.widgets.Scrollable.createHandle(Scrollable.java:146)
     [java] 	at org.eclipse.swt.widgets.Composite.createHandle(Composite.java:300)
     [java] 	at org.eclipse.swt.widgets.Decorations.createHandle(Decorations.java:366)
     [java] 	at org.eclipse.swt.widgets.Shell.createHandle(Shell.java:617)
     [java] 	at org.eclipse.swt.widgets.Control.createWidget(Control.java:701)
     [java] 	at org.eclipse.swt.widgets.Scrollable.createWidget(Scrollable.java:161)
     [java] 	at org.eclipse.swt.widgets.Decorations.createWidget(Decorations.java:375)
     [java] 	at org.eclipse.swt.widgets.Shell.<init>(Shell.java:313)
     [java] 	at org.eclipse.swt.widgets.Shell.<init>(Shell.java:392)
     [java] 	at org.eclipse.jface.window.Window.createShell(Window.java:487)
     [java] 	at org.eclipse.jface.window.Window.create(Window.java:430)
     [java] 	at org.eclipse.jface.dialogs.Dialog.create(Dialog.java:1092)
     [java] 	at org.eclipse.jface.window.Window.open(Window.java:788)
     [java] 	at org.eclipse.ui.internal.progress.BlockedJobsDialog$1.runInUIThread(BlockedJobsDialog.java:109)
     [java] 	at org.eclipse.ui.progress.UIJob.lambda$0(UIJob.java:148)
     [java] 	at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:40)
     [java] 	at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:132)
     [java] 	at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:4099)
     [java] 	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3715)
     [java] 	at org.eclipse.ui.internal.dialogs.EventLoopProgressMonitor.runEventLoop(EventLoopProgressMonitor.java:124)
     [java] 	at org.eclipse.ui.internal.dialogs.EventLoopProgressMonitor.isCanceled(EventLoopProgressMonitor.java:98)
     [java] 	at org.eclipse.core.internal.jobs.ThreadJob.isCanceled(ThreadJob.java:154)
     [java] 	at org.eclipse.core.internal.jobs.ThreadJob.waitForRun(ThreadJob.java:286)
     [java] 	at org.eclipse.core.internal.jobs.ThreadJob.joinRun(ThreadJob.java:214)
     [java] 	at org.eclipse.core.internal.jobs.ImplicitJobs.begin(ImplicitJobs.java:95)
     [java] 	at org.eclipse.core.internal.jobs.JobManager.beginRule(JobManager.java:343)
     [java] 	at org.eclipse.ui.tests.concurrency.NoFreezeWhileWaitingForRuleTest.lambda$1(NoFreezeWhileWaitingForRuleTest.java:146)
     [java] 	at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:40)
     [java] 	at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:132)
     [java] 	at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:4099)
     [java] 	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3715)
     [java] 	at org.eclipse.ui.tests.concurrency.NoFreezeWhileWaitingForRuleTest.testWaiting(NoFreezeWhileWaitingForRuleTest.java:88)

The same occurred in https://download.eclipse.org/eclipse/downloads/drops4/I20240926-0020/testresults/consolelogs/ep434I-unit-win32-java17_win32.win32.x86_64_17_consolelog.txt

Locally, NoFreezeWhileWaitingForRuleTest runs fine for me.

jukzi avatar Sep 27 '24 06:09 jukzi

I've reopened https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410. I assume something is wrong with the Azure Windows test instance.

iloveeclipse avatar Sep 30 '24 05:09 iloveeclipse

In the SWT and JFace tests, I see the following error reported multiple times:

Error: Could not find or load main class org.eclipse.test.AwtScreenshot
Caused by: java.lang.ClassNotFoundException: org.eclipse.test.AwtScreenshot
AwtScreenshot VM finished with exit code 1.

See

  • https://download.eclipse.org/eclipse/downloads/drops4/I20241006-1800/testresults/ep434I-unit-win32-java17_win32.win32.x86_64_17/org.eclipse.swt.tests.junit.AllNonBrowserTests.txt
  • https://download.eclipse.org/eclipse/downloads/drops4/I20241006-1800/testresults/ep434I-unit-win32-java17_win32.win32.x86_64_17/org.eclipse.jface.text.tests.JFaceTextTestSuite.txt

Could it be that this is our (SDK) issue and not a Windows machine problem?

iloveeclipse avatar Oct 07 '24 06:10 iloveeclipse

Could it be that this is our (SDK) issue and not a Windows machine problem?

Is there a full stack trace available?

laeubi avatar Oct 07 '24 07:10 laeubi

No

iloveeclipse avatar Oct 07 '24 07:10 iloveeclipse

Error: Could not find or load main class org.eclipse.test.AwtScreenshot

That split package was recently moved from performance to eclipse.test, @akurtakov.

jukzi avatar Oct 07 '24 07:10 jukzi

There was only one o.e.test.AwtScreenshot (in the o.e.test bundle), and it only gets additions after the split package removal: https://github.com/eclipse-platform/eclipse.platform.releng.aggregator/commit/6e6a13637c9fe21c3226f34193b73d35fb3bf01b#diff-4234e8d1a72f1bd53d546bc92ce417bac8b765082c1728d2a9356a74ef507c7e. I fail to see a relation for now.

akurtakov avatar Oct 07 '24 07:10 akurtakov

org.eclipse.test.performance had bundle shape jar, while org.eclipse.test has Eclipse-BundleShape: dir and may need the same fix as https://github.com/eclipse-jdt/eclipse.jdt.debug/commit/11d1f9910f016041a13ec2cdfaabbd706fc9827f

jukzi avatar Oct 07 '24 08:10 jukzi

Error: Could not find or load main class org.eclipse.test.AwtScreenshot

was unrelated to the many failures. Now it's gone:

AWT screenshot saved to: C:\Users\genie.releng\workspace\AutomatedTests\ep434I-unit-win32-java17\workarea\I20241007-1800\eclipse-testing\results\ep434I-unit-win32-java17_win32.win32.x86_64_17\org.eclipse.swt.tests.junit.Test_org_eclipse_swt_widgets_Tree.test_Virtual.png
https://download.eclipse.org/eclipse/downloads/drops4/I20241007-1800/testresults/ep434I-unit-win32-java17_win32.win32.x86_64_17/org.eclipse.swt.tests.junit.AllNonBrowserTests.txt

The screenshots of the failed tests are now available for download: https://download.eclipse.org/eclipse/downloads/drops4/I20241007-1800/logs.php#console

However, they do not seem to be helpful (just a black screen).

jukzi avatar Oct 08 '24 06:10 jukzi

A black screen is a symptom that someone has logged in to the host via RDP and then disconnected normally. Ask to reboot the host and not to touch RDP. Also, disable the screensaver, power saving, and auto lock, enable auto login on boot, and make sure the build system agent runs as a logged-in user (not as a system service). But those were probably already done.

I don't have time to look for a better source, so here is a relevant instructional article for a random product, plus a related question:

https://docs.testarchitect.com/user-guide/support/frequently-asked-questions/disconnecting-from-remote-desktop-while-executing-automated-tests/
https://superuser.com/q/80334/28322

Note that locked-out GUI sessions skip (do not emit) a few types of system events like Paint, so tests done on a "black screen" are invalid.

basilevs avatar Oct 08 '24 19:10 basilevs

Could it be that this is our (SDK) issue and not a Windows machine problem?

That would mean that there was a causal change in the period in which the failures first occurred (that is, after 17 September). Some test failures already occur in the SWT tests (i.e., at the beginning of the dependency chain). In SWT itself, there was only a single change in the period in which tests started to fail, and it seems unlikely that this change caused the problems. I am not sure whether changes to the other projects/bundles may also affect the SWT tests in I-builds, since even the SWT tests are run in an Equinox environment against all the SDK dependencies, aren't they?

In https://download.eclipse.org/eclipse/downloads/drops4/I20240926-1800/testresults/consolelogs/ep434I-unit-win32-java17_win32.win32.x86_64_17_consolelog.txt the first exception is:

     ...
     [java] Error while informing user about event loop exception:
     [java] org.eclipse.swt.SWTError: No more handles
     [java] 	at org.eclipse.swt.SWT.error(SWT.java:4948)

While those errors appeared together with the discussed test failures, and thus are probably related, they do not seem to indicate a root cause. Those errors appear in the logs after several of the failing tests (in particular the SWT tests) have already been executed. The SWT tests run without any of those "no more handles" errors.
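The ordering argument above can be checked mechanically against a downloaded console log. The following is a minimal sketch, assuming the log announces each suite with a line of the form `[echo] Running <suite>. Result file: ...` as in the excerpt quoted earlier; the class `ConsoleLogScan` and its method are hypothetical helpers, not part of the Eclipse test framework.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the Eclipse test framework. */
final class ConsoleLogScan {
    // Assumed announcement format: "[echo] Running <suite>. Result file: ..."
    private static final Pattern RUNNING =
            Pattern.compile("\\[echo\\] Running (\\S+?)\\. Result file:");
    private static final String MARKER = "org.eclipse.swt.SWTError: No more handles";

    /**
     * Returns the suites announced before the first "No more handles" line.
     * The last entry is the suite during which the error first appeared.
     * If the marker never occurs, all announced suites are returned.
     */
    static List<String> suitesBeforeFirstHandleError(List<String> logLines) {
        List<String> suites = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains(MARKER)) {
                break; // stop at the first occurrence of the error
            }
            Matcher m = RUNNING.matcher(line);
            if (m.find()) {
                suites.add(m.group(1));
            }
        }
        return suites;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ConsoleLogScan <consolelog.txt>");
            return;
        }
        // ISO-8859-1 accepts any byte sequence, so odd bytes in the log cannot abort the read.
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        suitesBeforeFirstHandleError(lines).forEach(System.out::println);
    }
}
```

If the printed list already contains suites with known failures ahead of its last entry, the "no more handles" error cannot be what caused those earlier failures.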

With respect to a potential cause in the infrastructure, no further action seems to have been taken according to the helpdesk issue https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410, e.g., in response to the comments of Vasili (https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410#note_2832799) or Jörg (https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410#note_2829466).

In order to isolate the root cause (in particular SDK vs. infrastructure): would it be possible (with low effort) to temporarily move the Windows job to a different node for testing purposes? E.g., there is still the Windows 10 node (https://ci.eclipse.org/releng/computer/rs68g%2Dwin10/).

HeikoKlare avatar Oct 17 '24 07:10 HeikoKlare

created https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/5181

jukzi avatar Oct 18 '24 09:10 jukzi

Hurra, we don't have Windows test failures anymore... because tests don't run anymore on Windows... see https://github.com/eclipse-platform/eclipse.platform.releng.aggregator/issues/2468

iloveeclipse avatar Oct 21 '24 19:10 iloveeclipse

Oh dear. That is one way to address the problem.

merks avatar Oct 21 '24 19:10 merks

I addressed the problem with the failing UI tests with Frederic Gurr in person. They promised that the infrastructure team will help to identify the root problem, provided we can configure the failing job to fail much faster, for example by executing the failing test only. However, I only know how we could change the TestSuite, not how to disable the suites of the other bundles. Ideas needed. Is there a parameter to test only SWT or Platform UI during the job?

Jörg Kubitz

jukzi avatar Oct 22 '24 13:10 jukzi

for example by executing the failing test only.

I suggest creating a dedicated test that fails only if no PAINT events occur, and running it in a loop. Verify that it fails on Windows when the console running the test is locked.

checkswtpaintevent.zip contains the sources for a simple Eclipse application that opens and repeatedly resizes a window to receive paint events from the OS. It exits with a non-zero exit code when paint events are not received as expected. So far I have been unable to reproduce the missing events, either on a Windows 11 VM on aarch64 with a locked screen or on a Windows 11 x86_64 VM by disconnecting RDP. I assume that a resize is not a complex enough operation for Windows to optimize out without a graphical context. Further effort is required.

Prebuilt: checkswtpaintevent_dist.zip (6Mb). Start with:
java --add-modules=ALL-SYSTEM -jar .\eclipse\plugins\org.eclipse.equinox.launcher_1.6.900.v20240613-2009.jar -consoleLog
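For readers who do not want to unpack the archive, the pass/fail logic of such a check could look roughly like the sketch below. It is toolkit-independent and is not the code from checkswtpaintevent.zip: the class `PaintCheck` and its method names are hypothetical. With SWT, `onPaint()` would be called from a listener registered for `SWT.Paint`, the trigger would resize the shell, and the pump would call `Display.readAndDispatch()`.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, toolkit-independent sketch; not the code from checkswtpaintevent.zip. */
final class PaintCheck {
    private int paintCount; // only touched from the UI thread

    /** To be called from the toolkit's paint callback (for SWT, a listener for SWT.Paint). */
    void onPaint() {
        paintCount++;
    }

    /**
     * Runs the trigger (e.g. a resize), then pumps the event loop until a new
     * paint event arrives or the timeout expires.
     *
     * @return true if at least one paint event was seen after the trigger
     */
    boolean paintSeenAfter(Runnable trigger, Runnable pumpEvents, long timeoutMillis) {
        int before = paintCount;
        trigger.run();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (paintCount == before && System.nanoTime() - deadline < 0) {
            pumpEvents.run(); // for SWT this would call Display.readAndDispatch()
        }
        return paintCount > before;
    }
}
```

A driver would call `paintSeenAfter` in a loop and exit with a non-zero code on the first `false`, which gives a job that fails within seconds rather than hours.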

basilevs avatar Oct 22 '24 13:10 basilevs

Apart from the recent consistent 38 failures on Windows, there is a new test failure for the test case test_OpenWindowListener_open_ChildPopup, but it is a timeout: https://download.eclipse.org/eclipse/downloads/drops4/I20241024-1800/testresults/html/org.eclipse.swt.tests_ep434I-unit-win32-java17_win32.win32.x86_64_17.html#AllBrowserTests

Test timed out.

java.lang.AssertionError:
Test timed out.
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.test_OpenWindowListener_open_ChildPopup(Test_org_eclipse_swt_browser_Browser.java:814)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at org.eclipse.test.TracingSuite.runChild(TracingSuite.java:287) 

My understanding is that this should pass if rerun. Am I correct?

deepika-u avatar Oct 25 '24 05:10 deepika-u

However, I only know how we could change the TestSuite, not how to disable the suites of the other bundles. Ideas needed. Is there a parameter to test only SWT or Platform UI during the job?

One can replay the latest I-build test job run for Windows, i.e. https://ci.eclipse.org/releng/job/AutomatedTests/job/ep434I-unit-win32-java17/ and set the testSuite from all to ui, i.e. to -DtestSuite=ui. With that set, only org.eclipse.ui.tests is executed: https://github.com/eclipse-platform/eclipse.platform.releng.aggregator/blob/2a86e91f8756bfe0583f8d6362c4773a02efbc16/production/testScripts/configuration/sdk.tests/testScripts/test.xml#L2202-L2206

If one does that, the publication of the test results should ideally be skipped so that the other results are not removed from the overview page of the previous build (i.e. delete the call of Releng/ep-collectResults).

I did all this for https://ci.eclipse.org/releng/job/AutomatedTests/job/ep434I-unit-win32-java17/84/; let's see how long this runs. If that's too long, we have to reduce the content of that test plug-in, but I don't think that's easily possible on a branch only.

HannesWell avatar Oct 26 '24 23:10 HannesWell

Michael Keppler @Bananeweizen, would you mind helping?

https://ci.eclipse.org/releng/job/AutomatedTests/job/ep434I-unit-win32-java17/84/ took 2 hours.

jukzi avatar Oct 28 '24 08:10 jukzi

As pointed out by @jukzi, the problem we observe on Windows is most likely related to https://bugs.openjdk.org/browse/JDK-8336862.

So should we downgrade the JDK on Windows to version 17.0.11 to get the Windows tests working again?

iloveeclipse avatar Oct 28 '24 09:10 iloveeclipse

Since we are somewhat stuck, it may be worth a try. I will create a GitLab ticket.

jukzi avatar Oct 28 '24 09:10 jukzi

@iloveeclipse JDK-8336862 only takes effect if the Jenkins agent runs as a system service and no active console is present (read: auto login is not configured and/or RDP is used). Surely, this can't be the case on test machines? Because all kinds of problems will appear even on older Java if the test host is misconfigured.

SWT tests have to run in an environment equivalent to a usual desktop user. No system services, no strange disconnected displays. Otherwise test results are not representative.

basilevs avatar Oct 28 '24 09:10 basilevs

Surely, this can't be the case on test machines?

I don't have the slightest idea how the Windows test machines are configured or how Jenkins is started, sorry. Better to discuss that with our IT guys on the GitLab ticket.

iloveeclipse avatar Oct 28 '24 09:10 iloveeclipse

I created https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/5213. @basilevs, it sounds like you have ideas about how the VM should be configured. Would you mind having a chat with Frederic Gurr? He would share a remote desktop with us if that helps.

jukzi avatar Oct 28 '24 09:10 jukzi

I created https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/5213. @basilevs, it sounds like you have ideas about how the VM should be configured. Would you mind having a chat with Frederic Gurr? He would share a remote desktop with us if that helps.

I've left instructions in https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410#note_2832799 (2 weeks ago)

I'm open to chat. But if those instructions are already applied, I have no idea what's going on either. And the last time I personally configured a Jenkins agent was on Windows 7 (I've been blessed with nice IT support since), so this configuration would require a lot of searching.

basilevs avatar Oct 28 '24 09:10 basilevs

I'm open to chat. But if those instructions are already applied, I have no idea what's going on either.

Please double-check it: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/5214

jukzi avatar Oct 28 '24 09:10 jukzi

I'm not sure if anyone involved in this lengthy thread dealt with this yet:

org.eclipse.swt.SWTError: No more handles

That's an error thrown by Windows if a process has acquired 10,000 resource handles. This can happen easily when creating icons, mouse cursors etc. and not freeing their related OS resource handles. When dealing with such an error locally, I typically run https://the-sz.com/products/bear/ in parallel to the test execution and check which kind of resource handle is growing and what the lost handles look like. That's often sufficient to identify some piece of code for debugging (e.g. in our company product we had hundreds of unreleased "close" icons when I did this last).
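The "which kind of handle is growing" step boils down to comparing two snapshots of per-kind handle counts. A minimal sketch of that comparison follows; the class `HandleGrowth` and the kind names are hypothetical, and how the snapshots are obtained (a monitoring tool, a debugger, or some tracking facility) is deliberately left open.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: compares two snapshots of handle counts keyed by handle kind. */
final class HandleGrowth {
    /**
     * Returns, per handle kind, how much the count grew between the snapshots.
     * Kinds whose count stayed equal or shrank are omitted.
     */
    static Map<String, Integer> growth(Map<String, Integer> before, Map<String, Integer> after) {
        Map<String, Integer> result = new TreeMap<>(); // sorted for stable output
        after.forEach((kind, count) -> {
            int delta = count - before.getOrDefault(kind, 0);
            if (delta > 0) {
                result.put(kind, delta);
            }
        });
        return result;
    }
}
```

Taking a snapshot before and after each test suite and printing the result narrows a leak down to one suite and one resource kind.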

I have never had to use this with an automated pipeline, so I'm not sure what would be a good setup for doing this in parallel to the automated Eclipse Windows tests.

Bananeweizen avatar Oct 28 '24 13:10 Bananeweizen

No more handles

See https://github.com/eclipse-platform/eclipse.platform.ui/issues/2379. I guess that is only a follow-up error from cleaning up resources after a failing test. But it may also be the other way around. The problem is that I cannot reproduce any of the test failures locally.

jukzi avatar Oct 28 '24 13:10 jukzi