trio-websocket tests have flakey coverage?

e.g. coverage tool reports regression on unrelated change:

https://coveralls.io/builds/20028763/source?filename=trio_websocket%2F_impl.py#L842

it's the except clause in _write_pending

if there are timing issues in the tests, that could explain intermittent errors like #68

Nov 10 '18 22:11 belm0

Thank you for pointing this out, although I'm not sure how to interpret the coverage report.

screenshot from 2018-11-12 12-15-26

It says "uncov" on the side but the lines are green?

Anyway, you are right that the coverage of these lines depends on a race. Here's a build where the 3.6 tests covered these lines but the 3.7 lines did not. There are other builds where both 3.6 and 3.7 cover.

Nov 12 '18 18:11 mehaase

Here's an example where test_server_does_not_close_handshake coverage is flakey. The first time it covers the lines in question (842-846), then the next time it does not.

(venv) mhaase@MEH-MBP:/V/C/trio-websocket:master$
rm .coverage; and pytest -k test_server_does_not_close_handshake --cov=trio_websocket; and coverage report -m
=============== test session starts ===============
platform darwin -- Python 3.7.0, pytest-3.10.0, py-1.7.0, pluggy-0.8.0
rootdir: /Volumes/Case_Sensitive/trio-websocket, inifile: pytest.ini
plugins: trio-0.5.1, cov-2.6.0
collected 31 items / 30 deselected

tests/test_connection.py .                                                                                   [100%]

---------- coverage: platform darwin, python 3.7.0-final-0 -----------
Name                         Stmts   Miss  Cover
------------------------------------------------
trio_websocket/__init__.py       1      0   100%
trio_websocket/_impl.py        365    160    56%
trio_websocket/version.py        1      0   100%
------------------------------------------------
TOTAL                          367    160    56%


=============== 1 passed, 30 deselected in 0.24 seconds ===============
Name                         Stmts   Miss  Cover   Missing
----------------------------------------------------------
trio_websocket/__init__.py       1      0   100%
trio_websocket/_impl.py        365    160    56%   [SNIP] 808-809, 842-846, 852-854 [SNIP]
trio_websocket/version.py        1      0   100%
----------------------------------------------------------
TOTAL                          367    160    56%
(venv) mhaase@MEH-MBP:/V/C/trio-websocket:master$
rm .coverage; and pytest -k test_server_does_not_close_handshake --cov=trio_websocket; and coverage report -m
=============== test session starts ===============
platform darwin -- Python 3.7.0, pytest-3.10.0, py-1.7.0, pluggy-0.8.0
rootdir: /Volumes/Case_Sensitive/trio-websocket, inifile: pytest.ini
plugins: trio-0.5.1, cov-2.6.0
collected 31 items / 30 deselected

tests/test_connection.py .                                                                                   [100%]

---------- coverage: platform darwin, python 3.7.0-final-0 -----------
Name                         Stmts   Miss  Cover
------------------------------------------------
trio_websocket/__init__.py       1      0   100%
trio_websocket/_impl.py        365    156    57%
trio_websocket/version.py        1      0   100%
------------------------------------------------
TOTAL                          367    156    57%


=============== 1 passed, 30 deselected in 0.23 seconds ===============
Name                         Stmts   Miss  Cover   Missing
----------------------------------------------------------
trio_websocket/__init__.py       1      0   100%
trio_websocket/_impl.py        365    156    57%   [SNIP] 808-809, 846, 852-854 [SNIP]
trio_websocket/version.py        1      0   100%
----------------------------------------------------------
TOTAL                          367    156    57%

I checked to see if there's any other difference in coverage between these two runs:

trio_websocket/_impl.py        365    160    56%   605-606, 842-846
trio_websocket/_impl.py        365    156    57%   604, 846

Notice that when the _write_pending() exception lines are skipped, so are 605 and 606. Here are the relevant source lines.

595: async def send_message(self, message):
596-602: <docstring>
603:     if self._close_reason:
604:         raise ConnectionClosed(self._close_reason)
605:     self._wsproto.send_data(message)
606:     await self._write_pending()

So the race seems to be related to sending at the same time a connection is closed. Sometimes it raise in line 604, and sometimes it raises inside _write_pending(). I don't necessarily think either code path is wrong, but each test needs to have a deterministic shutdown.

Nov 12 '18 19:11 mehaase

I messed around with this a bit on its own, and also in conjunction with implementing timeouts. I have a few ideas about how to improve this situation:

Many of the WebSocketConnection methods have a guard condition like if self._close_reason: raise ConnectionClosed. This guard condition is unnecessary and just adds a second code path that is prone to race conditions, as seen in the example at the end of my previous post. Most of these guards can safely be removed.
A lot of the tests have race conditions themselves, e.g. a test creates a client and a server and relies on the automatically closing behavior, so both ends race to initiating the closing handshake. Each test needs to have a single, deterministic order for tearing down connections.
- As an example, we can force the client to initiate closing by having the server call get_message() when no message is pending, which will raise ConnectionClosed when the client has sent the closing handshake.
- Alternatively, tests can use Trio events to force an intended ordering of events.
I don't think it's possible to make network tests 100% determinstic, so a last resort we can add tests that force cover of problematic areas like lines 842-844 shown above. I.e. some tests may still have slightly different execution paths based on races, but at least one test is guaranteed to deterministically cover those lines. This doesn't prevent intermittent test failures, but it does ensure consistent coverage.

Nov 14 '18 21:11 mehaase

trio.testing may help

https://trio.readthedocs.io/en/latest/reference-testing.html#inter-task-ordering

Essentially the same as using events.

Nov 14 '18 23:11 belm0

I don't think it's possible to make network tests 100% determinstic, so a last resort we can add tests that force cover of problematic areas like lines 842-844 shown above.

This is what I usually do...

Nov 14 '18 23:11 njsmith

trio-websocket trio-websocket copied to clipboard

tests have flakey coverage?

trio-websocket
trio-websocket copied to clipboard