AssertionError: Docker container '...' contained extra running processes after test completed
https://travis-ci.org/dw/mitogen/jobs/653211748?utm_medium=notification&utm_source=github_status
Sometimes tests fail with:
======================================================================
ERROR: tearDownClass (doas_test.DoasTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/dw/mitogen/tests/testlib.py", line 534, in tearDownClass
cls.dockerized_ssh.check_processes()
File "/home/travis/build/dw/mitogen/tests/testlib.py", line 477, in check_processes
counts
AssertionError: Docker container 'mitogen-test-a84b6c7898599fd6' contained extra running processes after test completed: {u'doas <defunct>': 1, u'ps': 1, u'sshd': 1}
======================================================================
ERROR: tearDownClass (sudo_test.NonEnglishPromptTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/dw/mitogen/tests/testlib.py", line 534, in tearDownClass
cls.dockerized_ssh.check_processes()
File "/home/travis/build/dw/mitogen/tests/testlib.py", line 477, in check_processes
counts
AssertionError: Docker container 'mitogen-test-b3754356ae2ff3e' contained extra running processes after test completed: {u'ps': 1, u'sshd': 1, u'sudo <defunct>': 1}
----------------------------------------------------------------------
Ran 670 tests in 103.324s
FAILED (errors=2, skipped=45)
discovered when testing https://github.com/dw/mitogen/pull/658 and https://github.com/dw/mitogen/pull/693
This test also sometimes has issues, maybe 4/10 test runs:
ERROR: tearDownClass (sudo_test.NonEnglishPromptTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/dw/mitogen/tests/testlib.py", line 534, in tearDownClass
cls.dockerized_ssh.check_processes()
File "/home/travis/build/dw/mitogen/tests/testlib.py", line 477, in check_processes
counts
AssertionError: Docker container 'mitogen-test-e82c96aa0f304f6a' contained extra running processes after test completed: {u'ps': 1, u'sshd': 1, u'sudo <defunct>': 1}
https://travis-ci.org/dw/mitogen/jobs/653673605?utm_medium=notification&utm_source=github_status example test fail
This still occurs intermittently on Azure Devops. While experimenting with Github Actions I'm seeing it reliably fail as follows, in py36-mode_mitogen-distro_centos7
test_unknown (utils_test.CastTest) ... ok
test_run_with_broker (utils_test.RunWithRouterTest) ... ok
test_with_broker (utils_test.WithRouterTest) ... ok
test_with_broker_preserves_attributes (utils_test.WithRouterTest) ... ok
======================================================================
ERROR: tearDownClass (ssh_test.BannerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/mitogen/mitogen/tests/testlib.py", line 654, in tearDownClass
cls.dockerized_ssh.check_processes()
File "/home/runner/work/mitogen/mitogen/tests/testlib.py", line 592, in check_processes
processes,
AssertionError: Docker container 'mitogen-test-4fa107141314c30a' contained extra running processes after test completed: [('sshd', '/usr/sbin/sshd -D'), ('sshd', 'sshd: mitogen__has_sudo [priv]'), ('ps', 'ps -w -w -o ucomm= -o args=')]
----------------------------------------------------------------------
Ran 680 tests in 64.319s
FAILED (errors=1, skipped=66)
The following changes to testlib.py and ssh_test.py in isolation each made it go away, I haven't tested them in combination
@@ -609,6 +625,8 @@ def setUp(self):
self.router = self.router_class(self.broker)
def tearDown(self):
+ self.router.disconnect_all()
+ self.router.__exit__(None, None, None)
del self.router
super(RouterMixin, self).tearDown()
@@ -595,6 +614,7 @@ def tearDown(self):
self.broker.shutdown()
self.broker.join()
del self.broker
+ time.sleep(1.0)
super(BrokerMixin, self).tearDown()
def sync_with_broker(self):
@@ -190,6 +190,7 @@ def test_verbose_enabled(self):
self.dockerized_ssh.port,
)
self.assertEqual(name, context.name)
+ context.shutdown(wait=True)
class StubPermissionDeniedTest(StubSshMixin, testlib.TestCase):
My hypothesis is that the ssh connection is being left open by Mitogen, or it is relying on th GC to do cleanup, or we are closing the connection asynchronously and not waiting for it to complete.
The following changes to testlib.py and ssh_test.py in isolation each made it go away, I haven't tested them in combination
Further testing revealed this is false. They might reduce occurence, but they haven't eliminated it.
I'm starting to think this is a combination of
- Tests not closing their context with
context.shutdown() cls.dockerized_ssh.check_processes()inteardownClass()is in a race condition withsshdcleaning up after the client disconnects.
My guesstimate: After the TCP socket is closed (by the client or the server) sshd will take some non-zero amount of time to exit or terminate the session processes. Since there is no communcation channel between the client and server, there is not way for the test suite to wait for the server processes to be gone.
If so, then the least worst option to address item 2 is waiting a short period before calling check_processes(), and or retrying a small number of times.