aiohomekit icon indicating copy to clipboard operation
aiohomekit copied to clipboard

Reconnect if multiple CoAP PDUs go unanswered

Open roysjosh opened this issue 3 years ago • 7 comments

roysjosh avatar Nov 09 '22 15:11 roysjosh

Might help with "unresponsive" accessories, as mentioned in https://github.com/home-assistant/core/issues/80131 and the forums

roysjosh avatar Nov 09 '22 15:11 roysjosh

Codecov Report

Base: 66.74% // Head: 66.65% // Decreases project coverage by -0.08% :warning:

Coverage data is based on head (7ba5f1d) compared to base (4b02e0d). Patch coverage: 12.50% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #247      +/-   ##
==========================================
- Coverage   66.74%   66.65%   -0.09%     
==========================================
  Files          71       71              
  Lines        6697     6703       +6     
==========================================
- Hits         4470     4468       -2     
- Misses       2227     2235       +8     
Flag Coverage Δ
unittests 66.65% <12.50%> (-0.09%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiohomekit/controller/coap/connection.py 29.71% <12.50%> (-0.21%) :arrow_down:
aiohomekit/controller/ble/controller.py 33.81% <0.00%> (-2.16%) :arrow_down:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Nov 09 '22 15:11 codecov[bot]

Will this look like the device is disconnected when an operation is in flight?

Jc2k avatar Nov 09 '22 15:11 Jc2k

I've seen some accessories go unavailable and never recover. This is a guard against that. The PDU send/receive counters should be in sync except for the window of time between a request and response.

roysjosh avatar Nov 09 '22 15:11 roysjosh

Sure I get what this fixes, I'm just trying to imagine what could happen between the send and receive - could we solve one bug and introduce some races. E.g. if I toggle a button too quickly could the 2nd write to the device happen in parallel, see the device is disconnected, and kill the session, and then we need to re-auth.

Is there a potential for the user to see devices go unavailable randomly for a second, just because there is a request in flight?

I think we already have a timeout on post_bytes - maybe that should explicitly kill the session?

And maybe we should use an operation lock like BLE does to stop concurrent access?

Jc2k avatar Nov 09 '22 15:11 Jc2k

Ahh I see, good point. Hmm. Maybe the post_bytes exception handler would be a better location.

We could add that to be safe. Even if I found that Nanoleaf bulbs could handle >1 outstanding request, that may not be the same case for battery-powered devices.

roysjosh avatar Nov 09 '22 15:11 roysjosh

Approach this a different way. Network errors or timeouts increment a counter which counts towards being considered disconnected. This should solve situations where HAP+Thread accessories end up unavailable and never recover.

roysjosh avatar Dec 21 '22 19:12 roysjosh