aiohomekit
aiohomekit copied to clipboard
Reconnect if multiple CoAP PDUs go unanswered
Might help with "unresponsive" accessories, as mentioned in https://github.com/home-assistant/core/issues/80131 and the forums
Codecov Report
Base: 66.74% // Head: 66.65% // Decreases project coverage by -0.08% :warning:
Coverage data is based on head (
7ba5f1d) compared to base (4b02e0d). Patch coverage: 12.50% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## main #247 +/- ##
==========================================
- Coverage 66.74% 66.65% -0.09%
==========================================
Files 71 71
Lines 6697 6703 +6
==========================================
- Hits 4470 4468 -2
- Misses 2227 2235 +8
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 66.65% <12.50%> (-0.09%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Impacted Files | Coverage Δ | |
|---|---|---|
| aiohomekit/controller/coap/connection.py | 29.71% <12.50%> (-0.21%) |
:arrow_down: |
| aiohomekit/controller/ble/controller.py | 33.81% <0.00%> (-2.16%) |
:arrow_down: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Will this look like the device is disconnected when an operation is in flight?
I've seen some accessories go unavailable and never recover. This is a guard against that. The PDU send/receive counters should be in sync except for the window of time between a request and response.
Sure I get what this fixes, I'm just trying to imagine what could happen between the send and receive - could we solve one bug and introduce some races. E.g. if I toggle a button too quickly could the 2nd write to the device happen in parallel, see the device is disconnected, and kill the session, and then we need to re-auth.
Is there a potential for the user to see devices go unavailable randomly for a second, just because there is a request in flight?
I think we already have a timeout on post_bytes - maybe that should explicitly kill the session?
And maybe we should use an operation lock like BLE does to stop concurrent access?
Ahh I see, good point. Hmm. Maybe the post_bytes exception handler would be a better location.
We could add that to be safe. Even if I found that Nanoleaf bulbs could handle >1 outstanding request, that may not be the same case for battery-powered devices.
Approach this a different way. Network errors or timeouts increment a counter which counts towards being considered disconnected. This should solve situations where HAP+Thread accessories end up unavailable and never recover.