Browser tests: reduce timeout for instantiation to reasonable value
The timeout for browser instantiation in the browser tests has been increased to avoid flaky tests (via #1677). However, browser instantiation taking more than 10 seconds (the previous timeout value) is an indicator for something being wrong with the instantiation. There is no reason why instantiation may take that long during test execution but the same issue should never arise in productive use. Thus, such a case should be captured by tests and, if necessary, be fixed productively.
In order to do so, this change reduces the test timeout value again to a more reasonable value.
Related to https://github.com/eclipse-platform/eclipse.platform.swt/issues/1676
Also see https://github.com/eclipse-platform/eclipse.platform.swt/pull/1677#issuecomment-2567437084
Test Results
545 files + 480 545 suites +480 35m 18s ⏱️ + 35m 17s 4 374 tests + 4 353 4 356 ✅ + 4 335 18 💤 + 18 0 ❌ ±0 16 638 runs +16 533 16 498 ✅ +16 393 140 💤 +140 0 ❌ ±0
Results for commit 05b9b28c. ± Comparison against base commit 8c937588.
:recycle: This comment has been updated with latest results.
Builds sporadically show timeouts, e.g.:
Error: Tests run: 380, Failures: 1, Errors: 0, Skipped: 12, Time elapsed: 193.3 s <<< FAILURE! -- in org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser
Error: org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.test_BrowserFunction_callback[browser flags: 0] -- Time elapsed: 66.79 s <<< FAILURE!
java.lang.AssertionError: creating browser took too long: 65267ms
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.createBrowser(Test_org_eclipse_swt_browser_Browser.java:312)
at org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.setUp(Test_org_eclipse_swt_browser_Browser.java:185)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
Might be related to https://github.com/eclipse-platform/eclipse.platform.ui/issues/2734
I rebased on master and solved the conflicts. Let's see if there are still (too many) test failures.
@HeikoKlare according to your comment (https://github.com/eclipse-platform/eclipse.platform.swt/pull/1752#issuecomment-2623933890) this PR might need to be postponed until #1752 is merged, correct?
This PR should actually not need any other PR at all. The failing tests indicate that there is a bug in either the browser implementation or the test setup that are not aware of yet, so something we need to find out independent from any other PR (as I already mentioned in https://github.com/eclipse-platform/eclipse.platform.swt/pull/1677#issuecomment-2567437084).
What I mean is that "if we merge this PR after the ones that are already open then it is to be expected that there are no/less test failures due to timeouts" . But I see that you want to use this PR as a basis to check if the other PRs actually help (reduce/get rid of the test failures because of timeouts and freezes).
So, shall we merge this one then?
"if we merge this PR after the ones that are already open then it is to be expected that there are no/less test failures due to timeouts"
As we know, that's not the case. We still need to find the issue for the browser instantiation in tests taking so long. All the fixes we currently make regarding timeouts are only supposed to avoid that the actual errors (which we may not know yet) do not lead to a blocking UI. They do not properly solve the underlying issue.
So, shall we merge this one then?
I am not sure if people will be happy about that. Most Windows builds will fail with test failures then. So I wouldn't do that but rather find out the reasons for the failure first.
Note: I just confirmed that it's always about the first browser testing being executed, no matter which test that is. I added an ignore to the two tests that are usually executed first and then the new first executed test fails.
Additional note: first test failing even happens when making the Tycho build on GH Actions only execute AllBrowserTests. So it is not about any leftover from teste executed before the browser tests.
Note: the thread dump written after 60 seconds of test execution time show that the usually the main thread is stuck in display,sleep() / OS.WaitMessage() while waiting for browser instantiation:
[2025-02-01 17:15:07 +0000] org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.test_BrowserFunction_multiprocess[browser flags: 0]() ran for more than 60 seconds
totalMemory: 146800640
freeMemory (before GC): 96621696
freeMemory (after GC): 58666000
"main" prio=5 Id=1 RUNNABLE (in native)
at app//org.eclipse.swt.internal.win32.OS.WaitMessage(Native Method)
at app//org.eclipse.swt.widgets.Display.sleep(Display.java:4803)
at app//org.eclipse.swt.browser.Edge.processOSMessagesUntil(Edge.java:471)
at app//org.eclipse.swt.browser.Edge$WebViewProvider.getWebViewWrapper(Edge.java:370)
at app//org.eclipse.swt.browser.Edge$WebViewProvider.getWebView(Edge.java:381)
at app//org.eclipse.swt.browser.Edge.getUrl(Edge.java:894)
at app//org.eclipse.swt.browser.Browser.getUrl(Browser.java:771)
at app//org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.createBrowser(Test_org_eclipse_swt_browser_Browser.java:308)
One might think that there is some wake message, but the actual issue is that the OS message triggering the WebView instantiation callback did actually not arrive. When adding an explicit timed, repeated awakening, the display still goes into sleep repeatedly since no message arrived from the OS so far. In addition, adding debug output in the loop waiting for the OS message shows that the loop is spinned quite often.
Further finding: Once might also think that there is some message scheduled asynchronously on the display, which is not processed by the processOSMessagesUntil() (as it only processed OS messages) but may need to be processed (for whatever reason) to proceed with the processing the queue. In that case, whoever, display.sleep() would always returning immediately, as the display's synchronizer queue is not empty. Debug output also shows that this is usually not the case: the display properly goes to sleep in the scenarios where browser instantiation takes long, so there is actually nothing in the event queue (neither from the OS nor scheduled at the display by the application).
So the underlying issue is that the OS does not call the WebView instantiation callback for a long time (> 60 seconds). It is also interesting that usually after 65 seconds the callback is finally executed.
I've also been looking at it and all I came up with is:
- It's true that the problem is the callback is simply being called after ~65s (the delay I've seen is pretty consistent, always between 65.2s and 65.7s).
- I've seen the callback take the ~65s delay "only" 1-2 times per CI-run (and if it does it twice, the 2nd time is usually in the next 30 seconds after the 1st one).
Regarding point nr. 1, I used this commit to log the request and the callback and I saw this output:
1st delay: 65.237.957.400 ns
1356349440500 - ICoreWebView2Environment.CreateCoreWebView2Controller()
...
1421587397900 - Edge.createControllerInitializationCallback()
**2nd delay: 65.262.087.700 ns **
1423716947100 - ICoreWebView2Environment.CreateCoreWebView2Controller()
...
1488979034800 - Edge.createControllerInitializationCallback()
A quick internet search revealed that Instantiating Edge can be slow, but what I still don't get is:
- Why is it sometimes slow and not all the time ?
- Why is the delay consistently something ~65s ?
@HeikoKlare let's discuss how to proceed with this PR in today's daily.
FYI there is a 2nd commit in this PR, one is from https://github.com/eclipse-platform/eclipse.platform.swt/pull/1797
Yes, that was intended for testing purposes. We still see issues with WebView intialization sometimes taking long, so this is not ready for being merged anyway. I thus changed it back to "draft".
Do you plan to finish/submit this one?
Unfortunately, we did not find any way to ensure that WebView2 instantiation on Windows always works in a timely manner, especially in test setups. We came across some issue reports that indiciate that this seems to be a common issue with WebView2. Since we did not face any issue with non-timely instantiation of WebView2 in production so far, I don't see any real need and any way forward here. Thus closing it.