aiohomekit icon indicating copy to clipboard operation
aiohomekit copied to clipboard

fix: bind to a specific local IPv6 address during pair verify

Open roysjosh opened this issue 2 years ago • 13 comments

This will prevent future traffic from picking the "best" address to communicate with an accessory. Current products tie sessions to the original IPv6 address and return a CoAP error code 4.04 when a new address attempts to continue the session.

It is likely that this could explode in fun, new ways: the DHCPv6 server returning a new address, the host selecting a new random SLAAC address, etc.

roysjosh avatar Dec 20 '22 20:12 roysjosh

Cross reference aiocoap#298

roysjosh avatar Dec 20 '22 20:12 roysjosh

I'm running this locally now. I've been seeing dozens of 4.04 errors a day. Let's see if this cuts that out.

roysjosh avatar Dec 20 '22 20:12 roysjosh

Codecov Report

Base: 66.74% // Head: 66.59% // Decreases project coverage by -0.14% :warning:

Coverage data is based on head (6901939) compared to base (4b02e0d). Patch coverage: 17.39% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #272      +/-   ##
==========================================
- Coverage   66.74%   66.59%   -0.15%     
==========================================
  Files          71       71              
  Lines        6697     6718      +21     
==========================================
+ Hits         4470     4474       +4     
- Misses       2227     2244      +17     
Flag Coverage Δ
unittests 66.59% <17.39%> (-0.15%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiohomekit/controller/coap/pairing.py 31.85% <0.00%> (-0.48%) :arrow_down:
aiohomekit/controller/coap/connection.py 29.50% <19.04%> (-0.43%) :arrow_down:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Dec 20 '22 20:12 codecov[bot]

Wrap "get local IPv6 address" function in try/except. This prevents network errors from marking the device as not being provided by the integration.

This does seem to have eliminated frequent 404s from my environment.

roysjosh avatar Dec 21 '22 19:12 roysjosh

This seems like a good fix, provided the library can react to local address changes. From the code, _get_local_ipv6_addr lets the kernel pick a source address and interface to connect to the accessory and the library snapshots that and assumes it will be good "forever". I wonder if there's a performant way to monitor that route and reset if it becomes invalid (e.g. SLACC reconfiguration, etc). Though in general with ULA addresses that should not happen.

I wonder if the logic could be made to prioritize ULAs or link-local versus say an ISP publicly routable address that is more subject to change (especially when using privacy-preserving randomized addresses).

jfroy avatar Dec 26 '22 18:12 jfroy

The ip module has various attributes that could help with that: https://docs.python.org/3/library/ipaddress.html#ipaddress.IPv6Address.is_private

Jc2k avatar Dec 26 '22 19:12 Jc2k

In terms of detecting failed connections, hoping we can get around to testing if the coap transport publishes a sane s# value in mdns. That number should go up if there is a state change that no one observed. So if we see it go up, we know we should restart the session.

Jc2k avatar Dec 26 '22 19:12 Jc2k

It's nice that we get a new source address every time we pair verify.

My instinct is that it'd be nice if we could have the events service bind to * but set the source address on outgoing traffic. But I don't think that helps much in practice - if the local ip changes the device won't know it and if the remote ip changes we handle it through mdns. So let's not bother.

In terms of detecting our own ip changing, ideally I'd want to defer that to HA. Presumably they have to handle that to update their mdns record.

Jc2k avatar Dec 26 '22 19:12 Jc2k

At the moment I'm tempted to merge the 4.04 handling and recovering only, as I'd like to try and understand the situations where the source address is wrong before deploying this work around. And I just don't think I'm going to get in front of a computer before the Jan release is out.

I'm struggling to think of scenarios where the source address flips if we have picked the thread ULA of the device as the destination address, the best one i came up with is if your HA loses its thread route but your home router doesn't. Default route would get it from HA to your main router, then your router delivers it to the thread router. But home routers don't generally accept thread routes out of the box, and would rely on your thread devices doing DHCP6, not just your thread router. That would be surprising as they are different L2 segments aren't they?

If we do go ahead with this work around I want to test what does happen if the ipv6 did address of HA did change - hopeful any pending recvs would get cancelled, and bubble up asyncio into our code?

Jc2k avatar Dec 28 '22 11:12 Jc2k

2.4.2 handles Code.NOT_FOUND but doesn't contain the bind fix, going to put that into HA now. It also lowers the severity of a NOT_FOUND - something that can happen for valid reasons (like a session expiring while your border router is offline, battery changes, mains power cycling, etc) that we can automatically recover from shouldn't show up in the error log.

Jc2k avatar Dec 28 '22 14:12 Jc2k

And PR to land it in HA - https://github.com/home-assistant/core/pull/84700.

Hopefully I can find time with all my devices (test trashing the network etc ti test recovery) to test the full PR but can't say it's likely before mid jan.

Jc2k avatar Dec 28 '22 14:12 Jc2k

I upgraded HA which wiped out this full by-hand change. I don't "see" the reconnects any more from the perspective of the frontend but they are still happening w/o the full fix.

roysjosh avatar Feb 07 '23 14:02 roysjosh

Yep, still on my radar to loop back round to here and test out the full fix when I've got more time to try and break things.

Jc2k avatar Feb 07 '23 14:02 Jc2k