tycho icon indicating copy to clipboard operation
tycho copied to clipboard

Surefire seem to leak one file descriptor per test

Open iloveeclipse opened this issue 2 years ago • 18 comments

Follow up on https://github.com/eclipse-platform/eclipse.platform.swt/issues/108.

I've merged https://github.com/eclipse-platform/eclipse.platform.swt/pull/154 now that beside SWT test fix shows leaked file descriptors per test.

With that I see that the test framework (tycho) itself leaks file descriptors, one for each test, because I don't see original leakage locally in the IDE.

The problem with file descriptors leak is that on systems with limited resources (virtual machines, docker instances etc) the process may be unable to open new files or start new processes anymore because it hits the limit.

The descriptors are counted between setup & teardown, means, they leak after test code is done, inside test framework, here is the example output:

### Descriptors opened BEFORE test_setUrlWithNullArg: 625
### Descriptors opened AFTER test_setUrlWithNullArg: 625
[2022-05-18 13:56:10 +0000] org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser#test_setUrl_local()
Running Test_org_eclipse_swt_browser_Browser#test_setUrl_local

### Descriptors opened BEFORE test_setUrl_local: 626

### Descriptors opened AFTER test_setUrl_local: 626

I assume "stdout" redirection is broken, because we see following descriptors growing (example showing "global" leaks only):

Delta to first test: 100
	/tmp/stdout10025304183554339060deferred
	/tmp/stdout10101648864789820786deferred
	/tmp/stdout104424146349535352deferred
	/tmp/stdout10469890275235030494deferred
	/tmp/stdout10757497078713430677deferred
	/tmp/stdout10768413038937016760deferred

... much more from that crap...

iloveeclipse avatar May 18 '22 14:05 iloveeclipse

We should compare that with plain Maven to get a sense of where in the stack are those descriptors "leaked". And we should also consider the whole execution of surefire, because if those file handles are used and released later, it's not so much of a leak (it may not be optimal, but maybe not an issue either....)

mickaelistria avatar May 18 '22 14:05 mickaelistria

The code that counts opened descriptors for process is here (works on Linux only): https://github.com/eclipse-platform/eclipse.platform.swt/blob/d09c0a7bd4c2cd3feb018a997826b87d3bc6ed81/tests/org.eclipse.swt.tests/JUnit%20Tests/org/eclipse/swt/tests/junit/Test_org_eclipse_swt_browser_Browser.java#L2491

It simply goes over all entries in /proc/self/fd/ directory, counts them on setup/teardown & compares counts between tests. We see in SWT that every test adds one extra entry after teardown but before setup if executed with tycho, but no leaks reported if executed test locally from IDE.

iloveeclipse avatar May 18 '22 14:05 iloveeclipse

For the record swt tests are run with plain maven-surefire (https://github.com/eclipse-platform/eclipse.platform.swt/blob/master/tests/org.eclipse.swt.tests/pom.xml#L43) and tycho-surefire is disabled (https://github.com/eclipse-platform/eclipse.platform.swt/blob/master/tests/org.eclipse.swt.tests/pom.xml#L35) as these tests are plain old junit tests and expect to not run in osgi container. Thus I don't think Tycho plays a role here.

akurtakov avatar May 18 '22 16:05 akurtakov

For the record swt tests are run with plain maven-surefire

And actually, Tycho is not involved in the "official" test suite execution as far as I know.

mickaelistria avatar May 18 '22 16:05 mickaelistria

For the record swt tests are run with plain maven-surefire

And actually, Tycho is not involved in the "official" test suite execution as far as I know.

Absolutely!

akurtakov avatar May 18 '22 16:05 akurtakov

It might be a good idea then to try newer surefire versions.

laeubi avatar May 18 '22 16:05 laeubi

And actually, Tycho is not involved in the "official" test suite execution as far as I know.

Leak is seen in Jenkins build, that is using what? https://ci.eclipse.org/platform/job/eclipse.platform.swt/view/change-requests/

iloveeclipse avatar May 18 '22 16:05 iloveeclipse

It might be a good idea then to try newer surefire versions.

Build already uses 3.0.0-M6 and I'm not aware of any new version. @iloveeclipse would you please try removing https://github.com/eclipse-platform/eclipse.platform.swt/blob/master/tests/org.eclipse.swt.tests/pom.xml#L51 ? We might be leaking in this "test/verification" code (data fetched by g_object_get_qdata might need freeing, just gut feeling ).

akurtakov avatar May 18 '22 16:05 akurtakov

And actually, Tycho is not involved in the "official" test suite execution as far as I know.

Leak is seen in Jenkins build, that is using what? https://ci.eclipse.org/platform/job/eclipse.platform.swt/view/change-requests/

Maven Surefire 3.0.0-M6 is running the tests.

akurtakov avatar May 18 '22 16:05 akurtakov

If i remember right @mickaelistria has opened a ticket about leaking in M6 and thus we have reverted that version in tycho, so maybe my guess was wrong and we should instead use an older version instead here? see

  • https://github.com/eclipse/tycho/pull/890
  • https://github.com/eclipse/tycho/issues/879

laeubi avatar May 18 '22 16:05 laeubi

would you please try removing https://github.com/eclipse-platform/eclipse.platform.swt/blob/master/tests/org.eclipse.swt.tests/pom.xml#L51 ? We might be leaking in this "test/verification" code (data fetched by g_object_get_qdata might need freeing, just gut feeling ).

I've tried locally with the flag enabled, it is irrelevant here.

Just look at the leaked file names: /tmp/stdout10768413038937016760deferred they are pointing on the test framework that most likely tries to redirect std out to a file and forgets to close the file after test (most likely just adding "delete on exit" flag).

Note: in SWT browser tests we do print to std out for every test.

iloveeclipse avatar May 18 '22 16:05 iloveeclipse

Looking on the diff in https://issues.apache.org/jira/browse/SUREFIRE-1845 / https://github.com/apache/maven-surefire/commit/32bd56b4ea908147592ef92c71c4e7936e070993#diff-c634b539151b4b76bde50d93d5a6cd2b0f51aa411281398cc295cc721bdfa1d5L500 I see call to utf8RecodingDeferredFileOutputStream.close(); removed, and that matches nicely the temp file names /tmp/stdout10768413038937016760deferred

iloveeclipse avatar May 18 '22 16:05 iloveeclipse

@iloveeclipse Did you raise a bug in surefire? I didn't find any that relates to your finding. Your mention of the removed line in StatelessXmlReporter.java looks reasonable to me.

Bananeweizen avatar Jun 06 '22 19:06 Bananeweizen

No, I haven't, no time. Feel free to open one.

iloveeclipse avatar Jun 06 '22 19:06 iloveeclipse

I did open https://issues.apache.org/jira/browse/SUREFIRE-2072 almost 2 months ago.

mickaelistria avatar Jun 06 '22 19:06 mickaelistria

I did open https://issues.apache.org/jira/browse/SUREFIRE-2072 almost 2 months ago.

But that is a different issue (about OOM). Here we leak file descriptors. Or the issue needs an update and extra point about not only leaking memory but also file resources.

iloveeclipse avatar Jun 07 '22 16:06 iloveeclipse

But that is a different issue (about OOM). Here we leak file descriptors.

OK.

mickaelistria avatar Jun 07 '22 20:06 mickaelistria

I see @Bananeweizen opened https://issues.apache.org/jira/browse/SUREFIRE-2094. Thanks. Interestingly, it was already reported, I've linked to the duplicate.

iloveeclipse avatar Jun 08 '22 20:06 iloveeclipse

We have now upgraded to the latest surefire, and as Tycho can not really do much about this if there are issues remaining please report them to the surefire project and ask for a release.

laeubi avatar Jan 29 '23 06:01 laeubi