Handle PONG messages before closing connection due to heartbeat timeout
What do these changes do?
Currently, if receive is never called (or not in time), PONG messages aren't handled and aiohttp closes the connection for missing PONG after half of the heartbeat time has passed. This commit changes the code that closes the connection for missing PONG so that instead of immediately closing the connection, it checks whether a PONG has been sent. If a PONG has been sent, the connection doesn't get closed.
Are there changes in behavior for the user?
The heartbeat mechanism is supposed to help identify if a websocket connection is still alive. When a PING message isn't responded to by a PONG message, the connection is closed. Websocket clients such as Web Browsers implement the heartbeat mechanism as well and for that reason, enabling heartbeat when using aiohttp can also be used to keep a websocket connection alive.
The current implementation of the heartbeat mechanism in aiohttp relies on the user to regularly call the receive() function in order to handle incoming PONG messages. If the user fails to call receive() within the given time frame, the connection gets closed by aiohttp. So when heartbeat in aiohttp is enabled in order to keep a websocket alive, it can actually cause the opposite, i.e. closing the websocket. Specifically, this happens for unidirectional communication via websockets (in our case a video stream).
The user will not notice any different behavior for "regular" use of websockets which used to work without issues. The user will only notice a change in behavior if originally their websocket connection was closed by aiohttp prematurely due to missing PONG even though as PONG has been sent by the other side.
If the user is calling receive() too late, e.g., a long running task prevents receive() from being called in time, the original implementation closed the websocket. The new implementation from this PR won't close the websocket immediately but will log a warning that a non-PONG message was received.
Is it a substantial burden for the maintainers to support this?
It's not a very big change so the burden should be fairly small.
Related issue number
There's no specific issue for this PR but the following issues are somewhat related: #7508 #7320
Checklist
- [ ] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [x] If you provide code modification, please add yourself to
CONTRIBUTORS.txt- The format is <Name> <Surname>.
- Please keep alphabetical order, the file is sorted by names.
- [x] Add a new news fragment into the
CHANGES/folder-
name it
<issue_or_pr_num>.<type>.rst(e.g.588.bugfix.rst) -
if you don't have an issue number, change it to the pull request number after creating the PR
.bugfix: A bug fix for something the maintainers deemed an improper undesired behavior that got corrected to match pre-agreed expectations..feature: A new behavior, public APIs. That sort of stuff..deprecation: A declaration of future API removals and breaking changes in behavior..breaking: When something public is removed in a breaking way. Could be deprecated in an earlier release..doc: Notable updates to the documentation structure or build process..packaging: Notes for downstreams about unobvious side effects and tooling. Changes in the test invocation considerations and runtime assumptions..contrib: Stuff that affects the contributor experience. e.g. Running tests, building the docs, setting up the development environment..misc: Changes that are hard to assign to any of the above categories.
-
Make sure to use full sentences with correct case and punctuation, for example:
Fixed issue with non-ascii contents in doctest text files -- by :user:`contributor-gh-handle`.Use the past tense or the present tense a non-imperative mood, referring to what's changed compared to the last released version of this project.
-
CodSpeed Performance Report
Merging #10544 will not alter performance
Comparing mstegmaier:master (29a439e) with master (45b861f)
Summary
✅ 46 untouched benchmarks
Codecov Report
Attention: Patch coverage is 54.54545% with 15 lines in your changes missing coverage. Please review.
Project coverage is 98.65%. Comparing base (
6d205da) to head (8f3de6f). Report is 3 commits behind head on master.
:white_check_mark: All tests successful. No failed tests found.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| aiohttp/web_ws.py | 54.54% | 12 Missing and 3 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #10544 +/- ##
==========================================
- Coverage 98.70% 98.65% -0.05%
==========================================
Files 122 122
Lines 37230 37262 +32
Branches 2064 2068 +4
==========================================
+ Hits 36748 36762 +14
- Misses 335 350 +15
- Partials 147 150 +3
| Flag | Coverage Δ | |
|---|---|---|
| CI-GHA | 98.54% <54.54%> (-0.05%) |
:arrow_down: |
| OS-Linux | 98.20% <54.54%> (-0.05%) |
:arrow_down: |
| OS-Windows | 96.14% <54.54%> (-0.05%) |
:arrow_down: |
| OS-macOS | 97.33% <54.54%> (-0.03%) |
:arrow_down: |
| Py-3.10.11 | 97.22% <54.54%> (-0.05%) |
:arrow_down: |
| Py-3.10.16 | 97.77% <54.54%> (-0.04%) |
:arrow_down: |
| Py-3.11.11 | 97.85% <54.54%> (-0.05%) |
:arrow_down: |
| Py-3.11.9 | 97.31% <54.54%> (-0.04%) |
:arrow_down: |
| Py-3.12.8 | ? |
|
| Py-3.12.9 | 98.31% <51.51%> (-0.05%) |
:arrow_down: |
| Py-3.13.1 | ? |
|
| Py-3.13.2 | 98.29% <51.51%> (-0.05%) |
:arrow_down: |
| Py-3.9.13 | 97.11% <54.54%> (-0.05%) |
:arrow_down: |
| Py-3.9.21 | 97.64% <54.54%> (-0.04%) |
:arrow_down: |
| Py-pypy7.3.16 | 86.63% <51.51%> (-4.98%) |
:arrow_down: |
| VM-macos | 97.33% <54.54%> (-0.03%) |
:arrow_down: |
| VM-ubuntu | 98.20% <54.54%> (-0.05%) |
:arrow_down: |
| VM-windows | 96.14% <54.54%> (-0.05%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I suspect the complexity of this change and the likelihood of it breaking something, combined with it likely only making a very slight (if any) reduction in closes will probably mean this won't get merged.
As mentioned in the linked discussion, a doc PR better explaining these limitations and a code sample to avoid them if memory issues are not a concern would be appreciated instead.
it checks whether a PONG has been sent. If a PONG has been sent, the connection doesn't get closed.
Maybe there's some confusion here, but this description is not correct. I would describe the change as:
If code has failed to call receive() in a timely manner and a PONG message is at the top of the buffer, the connection will no longer be closed.
If code is calling receive() normally, then this change likely has zero impact. Either the receive() call is happening frequently enough that it processes the PONG messages and this code is never run, or the receive() call is too slow in processing lots of messages in which case this code will likely receive another non-PONG message when running and still close the connection (the chances that the next message just happens to be a PONG when overloaded seems extremely slim unless the user is doing something odd).