tycho
tycho copied to clipboard
Surefire seem to leak one file descriptor per test
Follow up on https://github.com/eclipse-platform/eclipse.platform.swt/issues/108.
I've merged https://github.com/eclipse-platform/eclipse.platform.swt/pull/154 now that beside SWT test fix shows leaked file descriptors per test.
With that I see that the test framework (tycho) itself leaks file descriptors, one for each test, because I don't see original leakage locally in the IDE.
The problem with file descriptors leak is that on systems with limited resources (virtual machines, docker instances etc) the process may be unable to open new files or start new processes anymore because it hits the limit.
The descriptors are counted between setup & teardown, means, they leak after test code is done, inside test framework, here is the example output:
### Descriptors opened BEFORE test_setUrlWithNullArg: 625
### Descriptors opened AFTER test_setUrlWithNullArg: 625
[2022-05-18 13:56:10 +0000] org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser#test_setUrl_local()
Running Test_org_eclipse_swt_browser_Browser#test_setUrl_local
### Descriptors opened BEFORE test_setUrl_local: 626
### Descriptors opened AFTER test_setUrl_local: 626
I assume "stdout" redirection is broken, because we see following descriptors growing (example showing "global" leaks only):
Delta to first test: 100
/tmp/stdout10025304183554339060deferred
/tmp/stdout10101648864789820786deferred
/tmp/stdout104424146349535352deferred
/tmp/stdout10469890275235030494deferred
/tmp/stdout10757497078713430677deferred
/tmp/stdout10768413038937016760deferred
... much more from that crap...
We should compare that with plain Maven to get a sense of where in the stack are those descriptors "leaked". And we should also consider the whole execution of surefire, because if those file handles are used and released later, it's not so much of a leak (it may not be optimal, but maybe not an issue either....)
The code that counts opened descriptors for process is here (works on Linux only): https://github.com/eclipse-platform/eclipse.platform.swt/blob/d09c0a7bd4c2cd3feb018a997826b87d3bc6ed81/tests/org.eclipse.swt.tests/JUnit%20Tests/org/eclipse/swt/tests/junit/Test_org_eclipse_swt_browser_Browser.java#L2491
It simply goes over all entries in /proc/self/fd/
directory, counts them on setup/teardown & compares counts between tests.
We see in SWT that every test adds one extra entry after teardown but before setup if executed with tycho, but no leaks reported if executed test locally from IDE.
For the record swt tests are run with plain maven-surefire (https://github.com/eclipse-platform/eclipse.platform.swt/blob/master/tests/org.eclipse.swt.tests/pom.xml#L43) and tycho-surefire is disabled (https://github.com/eclipse-platform/eclipse.platform.swt/blob/master/tests/org.eclipse.swt.tests/pom.xml#L35) as these tests are plain old junit tests and expect to not run in osgi container. Thus I don't think Tycho plays a role here.
For the record swt tests are run with plain maven-surefire
And actually, Tycho is not involved in the "official" test suite execution as far as I know.
For the record swt tests are run with plain maven-surefire
And actually, Tycho is not involved in the "official" test suite execution as far as I know.
Absolutely!
It might be a good idea then to try newer surefire versions.
And actually, Tycho is not involved in the "official" test suite execution as far as I know.
Leak is seen in Jenkins build, that is using what? https://ci.eclipse.org/platform/job/eclipse.platform.swt/view/change-requests/
It might be a good idea then to try newer surefire versions.
Build already uses 3.0.0-M6 and I'm not aware of any new version. @iloveeclipse would you please try removing https://github.com/eclipse-platform/eclipse.platform.swt/blob/master/tests/org.eclipse.swt.tests/pom.xml#L51 ? We might be leaking in this "test/verification" code (data fetched by g_object_get_qdata might need freeing, just gut feeling ).
And actually, Tycho is not involved in the "official" test suite execution as far as I know.
Leak is seen in Jenkins build, that is using what? https://ci.eclipse.org/platform/job/eclipse.platform.swt/view/change-requests/
Maven Surefire 3.0.0-M6 is running the tests.
If i remember right @mickaelistria has opened a ticket about leaking in M6 and thus we have reverted that version in tycho, so maybe my guess was wrong and we should instead use an older version instead here? see
- https://github.com/eclipse/tycho/pull/890
- https://github.com/eclipse/tycho/issues/879
would you please try removing https://github.com/eclipse-platform/eclipse.platform.swt/blob/master/tests/org.eclipse.swt.tests/pom.xml#L51 ? We might be leaking in this "test/verification" code (data fetched by g_object_get_qdata might need freeing, just gut feeling ).
I've tried locally with the flag enabled, it is irrelevant here.
Just look at the leaked file names: /tmp/stdout10768413038937016760deferred
they are pointing on the test framework that most likely tries to redirect std out to a file and forgets to close the file after test (most likely just adding "delete on exit" flag).
Note: in SWT browser tests we do print to std out for every test.
Looking on the diff in https://issues.apache.org/jira/browse/SUREFIRE-1845 / https://github.com/apache/maven-surefire/commit/32bd56b4ea908147592ef92c71c4e7936e070993#diff-c634b539151b4b76bde50d93d5a6cd2b0f51aa411281398cc295cc721bdfa1d5L500 I see call to utf8RecodingDeferredFileOutputStream.close();
removed, and that matches nicely the temp file names /tmp/stdout10768413038937016760deferred
@iloveeclipse Did you raise a bug in surefire? I didn't find any that relates to your finding. Your mention of the removed line in StatelessXmlReporter.java looks reasonable to me.
No, I haven't, no time. Feel free to open one.
I did open https://issues.apache.org/jira/browse/SUREFIRE-2072 almost 2 months ago.
I did open https://issues.apache.org/jira/browse/SUREFIRE-2072 almost 2 months ago.
But that is a different issue (about OOM). Here we leak file descriptors. Or the issue needs an update and extra point about not only leaking memory but also file resources.
But that is a different issue (about OOM). Here we leak file descriptors.
OK.
I see @Bananeweizen opened https://issues.apache.org/jira/browse/SUREFIRE-2094. Thanks. Interestingly, it was already reported, I've linked to the duplicate.
We have now upgraded to the latest surefire, and as Tycho can not really do much about this if there are issues remaining please report them to the surefire project and ask for a release.