aiohomekit icon indicating copy to clipboard operation
aiohomekit copied to clipboard

Add Thread provisioning to BLE

Open roysjosh opened this issue 3 years ago • 16 comments
trafficstars

roysjosh avatar Aug 13 '22 02:08 roysjosh

First pass. AID probably shouldn't be hard coded. I don't like forcing the Connection to CoAP and would rather rely on ZC finding the same HKID via mDNS after provisioning completes.

roysjosh avatar Aug 13 '22 02:08 roysjosh

I think it's fine to hardcode AID - HAP BLE spec mandates there is only one accessory and that it's always aid 1. BUT I think @bdraco added a constant for it recently so we should probably use that.

Jc2k avatar Aug 13 '22 06:08 Jc2k

In your testing does BLE still work after switching on CoAP?

Jc2k avatar Aug 13 '22 07:08 Jc2k

RE: Using zc to find the CoAP version of the pairing:

I think the next thing I want to sort is to implement the ZC ideas i've been waffling on about since the start of the year:

The 2 zeroconf based hap controllers should add listeners to zc. This should use this API (as suggested by @bdraco):

zc.async_add_listener(self, None)
....

    def async_update_records(self, zc: 'Zeroconf', now: float, records: List[RecordUpdate]) -> None:
        """Updates service information from a DNS record.

        This method will be run in the event loop.
        """
        pass
...
zc.async_remove_listener(self)  

Either this data needs parsing into the same format that async_find returns, or maybe we can use the callbacks as triggers to re-read from the cache using the existing code.

Then I want to notify any active pairings directly from aiohomekit:

  • If c# changes
  • If ip/port changes

And ultimately stop HA needing to tell us that. And we'd actually call back to HA when the characteristics cache has changed, so it can re-enumerate and create new entities.

And then why i bring this up here is.... right now async_find only returns stuff in the cache. in this case i either want async_find or a new variant of it to be able to wait until there is new hit from zc.

Jc2k avatar Aug 13 '22 07:08 Jc2k

Right now we hard-code the backend a pairing is supposed to use, but maybe now we can lift that restriction. And maybe async_find can take a list of allowed transports for your use case?

Jc2k avatar Aug 13 '22 08:08 Jc2k

In your testing does BLE still work after switching on CoAP?

For Nanoleaf Essentials, no, they disappear from BLE while attempting to connect and while connected to Thread.

roysjosh avatar Aug 13 '22 13:08 roysjosh

https://github.com/Jc2k/aiohomekit/pull/151 is a step towards allowing the meta-controller to find this first backend with a given hkid.

Jc2k avatar Aug 14 '22 09:08 Jc2k

I've got some thread dongles arriving tomorrow so I should be able to spin up a border router with one and then try and get some devices to connect to it. Failng that I should at least be able to sniff for an unused network id and channel to setup the nanoleaf border router.

Jc2k avatar Aug 24 '22 15:08 Jc2k

Well, this is interesting. I did get this to work through HA once but in general it fails. aiohomekitctl always succeeds AFAICT, using the original PR with everything in aiohomekit and nothing in HA. So I'm missing something.

btmon -t log of HA's failure:

< ACL Data TX: Handle 64 flags 0x00 dlen 27           #1 [hci0] 12:19:33.383333
< ACL Data TX: Handle 64 flags 0x01 dlen 11           #2 [hci0] 12:19:33.383347
      ATT: Write Request (0x12) len 33
        Handle: 0x0080
          Data: 966f1159bde22ca0358eff5b1a6fbb0c6082bc865cb051ba4c934b099178b6
> HCI Event: Number of Completed Pac.. (0x13) plen 5  #3 [hci0] 12:19:33.521823
        Num handles: 1
        Handle: 64
        Count: 2
> ACL Data RX: Handle 64 flags 0x02 dlen 5            #4 [hci0] 12:19:33.569315
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 64 flags 0x00 dlen 7            #5 [hci0] 12:19:33.571014
      ATT: Read Request (0x0a) len 2
        Handle: 0x0080
> ACL Data RX: Handle 64 flags 0x02 dlen 27           #6 [hci0] 12:19:33.764500
> ACL Data RX: Handle 64 flags 0x01 dlen 1            #7 [hci0] 12:19:33.764985
      ATT: Read Response (0x0b) len 23
        Value: 320ae24e423ec9155430fbd83dc112e8ebca8c01151555
< ACL Data TX: Handle 64 flags 0x00 dlen 27           #8 [hci0] 12:19:33.769754
< ACL Data TX: Handle 64 flags 0x01 dlen 27           #9 [hci0] 12:19:33.769785
< ACL Data TX: Handle 64 flags 0x01 dlen 27          #10 [hci0] 12:19:33.769792
< ACL Data TX: Handle 64 flags 0x01 dlen 15          #11 [hci0] 12:19:33.769798
      ATT: Write Request (0x12) len 91
        Handle: 0x0080
          Data: 8a97f0712725b9c597fc9317de3200e12b44087ee59eeb1788139626144dec598d8fa4fbfe6b449ded868502f1ba718f4e9263362458b90887febf47f0a5c43c4d4ec28fc95e4766d00943cf2335cd7e1c7bd12cbf9d0612c9
> HCI Event: Number of Completed Pa.. (0x13) plen 5  #12 [hci0] 12:19:33.797800
        Num handles: 1
        Handle: 64
        Count: 1
> HCI Event: Number of Completed Pa.. (0x13) plen 5  #13 [hci0] 12:19:33.813804
        Num handles: 1
        Handle: 64
        Count: 2
> HCI Event: Number of Completed Pa.. (0x13) plen 5  #14 [hci0] 12:19:33.815805
        Num handles: 1
        Handle: 64
        Count: 2
> ACL Data RX: Handle 64 flags 0x02 dlen 5           #15 [hci0] 12:19:33.861806
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 64 flags 0x00 dlen 7           #16 [hci0] 12:19:33.862978
      ATT: Read Request (0x0a) len 2
        Handle: 0x0080
> HCI Event: Number of Completed Pa.. (0x13) plen 5  #17 [hci0] 12:19:34.047809
        Num handles: 1
        Handle: 64
        Count: 1
> HCI Event: Disconnect Complete (0x05) plen 4       #18 [hci0] 12:19:34.349817

btmon -t log of aiohomekitctl succeeding:

< ACL Data TX: Handle 64 flags 0x00 dlen 27         #857 [hci0] 13:28:10.517949
< ACL Data TX: Handle 64 flags 0x01 dlen 11         #858 [hci0] 13:28:10.517975
      ATT: Write Request (0x12) len 33
        Handle: 0x0080 Type: Vendor specific (915276bb-2600-0080-0010-000004070000)
          Data: a88757c96d85db17551e8cd982a6ecd6545655e2185e6d3c373e8fcf258516
> HCI Event: Number of Completed P.. (0x13) plen 5  #859 [hci0] 13:28:10.562224
        Num handles: 1
        Handle: 64 Address: F8:F5:5A:XX:XX:XX (Static)
        Count: 2
> ACL Data RX: Handle 64 flags 0x02 dlen 5          #860 [hci0] 13:28:10.609671
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 64 flags 0x00 dlen 7          #861 [hci0] 13:28:10.610742
      ATT: Read Request (0x0a) len 2
        Handle: 0x0080 Type: Vendor specific (915276bb-2600-0080-0010-000004070000)
> ACL Data RX: Handle 64 flags 0x02 dlen 27         #862 [hci0] 13:28:10.804763
> ACL Data RX: Handle 64 flags 0x01 dlen 1          #863 [hci0] 13:28:10.805368
      ATT: Read Response (0x0b) len 23
        Value: e360227a6674e05f5e21d3c7cb9ab029332c50deb23314
< ACL Data TX: Handle 64 flags 0x00 dlen 27         #864 [hci0] 13:28:10.806882
< ACL Data TX: Handle 64 flags 0x01 dlen 27         #865 [hci0] 13:28:10.806896
< ACL Data TX: Handle 64 flags 0x01 dlen 27         #866 [hci0] 13:28:10.806898
< ACL Data TX: Handle 64 flags 0x01 dlen 15         #867 [hci0] 13:28:10.806902
      ATT: Write Request (0x12) len 91
        Handle: 0x0080 Type: Vendor specific (915276bb-2600-0080-0010-000004070000)
          Data: 08930e45b1aa8c6e67dcc3da922f55a823ccb412f30f2c1ae289aa9a3f9f6e8a265769eb64459339ed102aa1ca4f337d3eb51339c0ddecb9024f4affad75ff160c8ed08bc3912b10674cd8ae6f90ae9fe105925b4751a30bd0
> HCI Event: Number of Completed P.. (0x13) plen 5  #868 [hci0] 13:28:10.854250
        Num handles: 1
        Handle: 64 Address: F8:F5:5A:XX:XX:XX (Static)
        Count: 2
> HCI Event: Number of Completed P.. (0x13) plen 5  #869 [hci0] 13:28:10.855210
        Num handles: 1
        Handle: 64 Address: F8:F5:5A:XX:XX:XX (Static)
        Count: 2
> ACL Data RX: Handle 64 flags 0x02 dlen 5          #870 [hci0] 13:28:10.902260
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 64 flags 0x00 dlen 7          #871 [hci0] 13:28:10.903395
      ATT: Read Request (0x0a) len 2
        Handle: 0x0080 Type: Vendor specific (915276bb-2600-0080-0010-000004070000)
> HCI Event: Number of Completed P.. (0x13) plen 5  #872 [hci0] 13:28:10.951248
        Num handles: 1
        Handle: 64 Address: F8:F5:5A:XX:XX:XX (Static)
        Count: 2
> HCI Event: Disconnect Complete (0x05) plen 4      #873 [hci0] 13:28:11.390254

roysjosh avatar Aug 25 '22 18:08 roysjosh

What does a failure look like?

I'm wondering if its just that there are live entities connected to it, so there will be reconnection attempts, advertisements getting handled, etc in parallel to the unpair attempt. We were actually just talking about a case earlier where we unpair a device but we see an s# bump so try to poll after unpair.

Jc2k avatar Aug 25 '22 18:08 Jc2k

When HA provisions, the bulb flashes after a second and then HA reconnects to it a few seconds later. When aiohomekitctl provisions, the bulb does NOT flash and a few seconds later it is advertised up over zeroconf. Both get a disconnect notification over bluez/dbus similar to: Operation failed with ATT error: 0x0e (Connection Rejected Due To Security Reasons)

roysjosh avatar Aug 25 '22 19:08 roysjosh

Operation failed with ATT error: 0x0e (Connection Rejected Due To Security Reasons) usually means you tried to do something when it was already disconnected. Its a pretty misleading error

bdraco avatar Aug 25 '22 19:08 bdraco

Yeah, I'm not worried about that as it shows up for success and failure. I think the bulb disconnects from BLE to scan for the provided Thread network. I don't actually have much to go on hence the btmon -t logs. There are a few differences there but I don't know if they are important nor how to influence the BLE comms at that level in any case.

roysjosh avatar Aug 25 '22 19:08 roysjosh

Can we unload the config entry, move transports, then load the config entry. Unloading it should stop us doing anything in the background in parallel.

(We sort of do this when deleting a pairing - it is unloaded by HA then we have to manually load a pairing to remove the pairing on the device).

Jc2k avatar Aug 25 '22 19:08 Jc2k

Also wonder if HA doing active scanning might be a factor.

Jc2k avatar Aug 25 '22 19:08 Jc2k

Unsuccessful shots in the dark:

  • HA: unsubscribed from everything before connection.pairing.thread_provision(...)
  • HA: called connection.async_unload() before connection.pairing.thread_provision(...)
  • aiohomekit: took the self._connection_lock before the second _async_request with the Thread TLVs and then did await asyncio.sleep(5.0) in the except block

roysjosh avatar Aug 26 '22 01:08 roysjosh

https://github.com/home-assistant/core/pull/78629 might help. With that (and https://github.com/Jc2k/aiohomekit/pull/178) we now won't try to reconnect immediately after closing a connection. I think without that, the BLE connection is firing up in the background and re-polling the device state.

I think config_entry.async_unload(hass) might be the best way to "unload" things, rather than connection.async_unload.

Jc2k avatar Sep 17 '22 18:09 Jc2k

Well, some progress. A rebase to the latest aiohomekit + HA core led to a successful provisioning of a Nanoleaf Essentials A19 to a HomePod mini's Thread network. I'll have to run this a few times to see how repeatable it is & then try with the Nanoleaf Elements BR.

roysjosh avatar Nov 01 '22 03:11 roysjosh

core changes since I don't think I linked them earlier: https://github.com/roysjosh/core/tree/hap-thread-provision

roysjosh avatar Nov 01 '22 03:11 roysjosh

Okay, more successes... I've added some code to my core fork to force the ZC discovery to remap an entry from BLE to CoAP/IP. It's gross. And it requires a reload of the integration after the ZC discovery completes to get comms to the device. But it works. Oh, I also had to change the unique ID for BLE from the accessory address to to HKID. I'm sure that will break things.

roysjosh avatar Nov 01 '22 22:11 roysjosh

Oh, I also had to change the unique ID for BLE from the accessory address to to HKID

We should probably do that in general and migrate all of them first

bdraco avatar Nov 01 '22 22:11 bdraco

https://github.com/home-assistant/core/pull/87550 and https://github.com/Jc2k/aiohomekit/pull/282 should fix the BLE config entries to be migratable with less gross-ness.

Jc2k avatar Feb 06 '23 15:02 Jc2k

Both of those are now merged.

Hoping to get my yellow set up next. Then:

Rebase https://github.com/roysjosh/core/commit/34158d6c5db548f3cf24844448e1d3e602d943a9 and then use the new thread datastore (should take a key into the store, and fetch the network details from there). This might be tricky without the UI or app changes to go with it, so I'll probably do it last.

The service call should manipulate the config entry itself - something like:

hass.config_entries.async_update_entry(
    config_entry, data={**config_entry.data, "Connection": "CoAP"}
)

Which should get rid of hack in the config flow.

Then it should force the integration to reload.

On the aiohomekit side, I want to rebase and add @bdraco's changes. In particular, I want to make sure that when the provision has happened the connection is killed, and there isn't a race where we could start a BLE session while the device is migrating to thread.

Jc2k avatar Feb 08 '23 07:02 Jc2k

Codecov Report

Base: 67.05% // Head: 67.03% // Decreases project coverage by -0.02% :warning:

Coverage data is based on head (6baf3e5) compared to base (dd34ac6). Patch coverage: 74.52% of modified lines in pull request are covered.

:exclamation: Current head 6baf3e5 differs from pull request most recent head b83d03d. Consider uploading reports for the commit b83d03d to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #148      +/-   ##
==========================================
- Coverage   67.05%   67.03%   -0.02%     
==========================================
  Files          71       72       +1     
  Lines        6690     6787      +97     
==========================================
+ Hits         4486     4550      +64     
- Misses       2204     2237      +33     
Flag Coverage Δ
unittests 67.03% <74.52%> (-0.02%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiohomekit/zeroconf.py 70.96% <0.00%> (-0.39%) :arrow_down:
aiohomekit/controller/ble/pairing.py 22.39% <31.03%> (+0.33%) :arrow_up:
aiohomekit/testing.py 83.01% <50.00%> (-0.32%) :arrow_down:
aiohomekit/controller/controller.py 66.36% <66.66%> (ø)
aiohomekit/tlv8.py 92.65% <83.33%> (-0.33%) :arrow_down:
aiohomekit/controller/abstract.py 81.46% <90.00%> (+0.38%) :arrow_up:
aiohomekit/controller/__init__.py 100.00% <100.00%> (ø)
aiohomekit/controller/ble/controller.py 31.85% <100.00%> (-8.33%) :arrow_down:
aiohomekit/controller/coap/controller.py 58.33% <100.00%> (+3.78%) :arrow_up:
aiohomekit/controller/coap/pairing.py 32.83% <100.00%> (+0.50%) :arrow_up:
... and 5 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Feb 09 '23 14:02 codecov[bot]

I've changed the controller interface a little so we can do coap specific async_find inside the service call:

    async def thread_provision(service: ServiceCall) -> None:
        hkid: str = service.data[ATTR_HKID]
        network_name: str = service.data[ATTR_THREAD_NETWORK_NAME]
        channel: int = service.data[ATTR_THREAD_CHANNEL]
        pan_id: str = service.data[ATTR_THREAD_PAN_ID]
        extended_pan_id: str = service.data[ATTR_THREAD_EXTENDED_PAN_ID]
        network_key: str = service.data[ATTR_THREAD_NETWORK_KEY]
        unknown: int = service.data[ATTR_THREAD_UNKNOWN_FLAG]
        _LOGGER.warning("Provisioning Thread credentials: %s=%s, %s=%s, %s=%s, %s=%s, %s=%s, %s=%s, %s=%s" % (
            ATTR_HKID, hkid,
            ATTR_THREAD_NETWORK_NAME, network_name,
            ATTR_THREAD_CHANNEL, channel,
            ATTR_THREAD_PAN_ID, pan_id,
            ATTR_THREAD_EXTENDED_PAN_ID, extended_pan_id,
            ATTR_THREAD_NETWORK_KEY, 'REDACTED',
            ATTR_THREAD_UNKNOWN_FLAG, unknown,
        ))

        if hkid not in hass.data[KNOWN_DEVICES]:
            _LOGGER.warning("Unknown HKID")
            return

        connection: HKDevice = hass.data[KNOWN_DEVICES][hkid]
        await connection.pairing.thread_provision(network_name, channel, pan_id, extended_pan_id, network_key, unknown)

        from aiohomekit.controller import TransportType
        from homeassistant.exceptions import HomeAssistantError

        try:
            discovery = await hass.data[CONTROLLER].transports[TransportType.COAP].async_find(connection.unique_id, timeout=30)
            hass.config_entries.async_update_entry(
                connection.config_entry,
                data={
                    **connection.config_entry.data,
                    "Connection": "CoAP",
                    "AccessoryIP": discovery.description.address,
                    "AccessoryPort": discovery.description.port,
                }
            )
            _LOGGER.debug("%s: Found device on local network, migrating integration to Thread", hkid)

        except AccessoryNotFoundError:
            _LOGGER.debug("%s: Failed to appear on local network as a Thread device, reverting to BLE", hkid)
            raise HomeAssistantError("Could not migrate device to Thread")

        finally:
            await hass.config_entries.async_reload(connection.config_entry.entry_id)

    async_register_admin_service(
        hass,
        DOMAIN,
        SERVICE_THREAD_PROVISION,
        thread_provision,
        schema=vol.Schema(
            {
                vol.Required(ATTR_HKID): cv.string,
                vol.Required(ATTR_THREAD_NETWORK_NAME): vol.All(cv.string, vol.Length(max=16)),
                vol.Required(ATTR_THREAD_CHANNEL): vol.All(cv.positive_int, vol.Range(min=11, max=26)),
                vol.Required(ATTR_THREAD_PAN_ID): vol.All(cv.string, vol.Length(min=1, max=4)),
                vol.Required(ATTR_THREAD_EXTENDED_PAN_ID): vol.All(cv.string, vol.Length(min=1, max=16)),
                vol.Required(ATTR_THREAD_NETWORK_KEY): vol.All(cv.string, vol.Length(min=1, max=32)),
                vol.Required(ATTR_THREAD_UNKNOWN_FLAG): vol.All(cv.positive_int, vol.Range(min=0, max=255)),
            }
        )
    )

    return True

Because we shut down down the BLE pairing while the migration is in flight, i boune the config entry regardless so BLE should be functional still after a failed migration (assuming the device does return to BLE).

Next step is to port it to use the thread integration.

Jc2k avatar Feb 09 '23 15:02 Jc2k

Latest iteration adds a button to the config panel of supported devices that writes the TLV of the current "preferred" thread network. The user doesn't have to know the hkid or any of the network details (if their app has correctly synced to HA).

(Not tested - there is no UI for the new thread datastore yet, so I'm going to have to write a script to populate it tomorrow).

Jc2k avatar Feb 09 '23 22:02 Jc2k

Screenshot 2023-02-10 at 14 56 59

"Provision" button tested and working.

The TLV format used by HA (which is what meshcop uses) is different. Not just different keys, but big/little endian fun too.

Still some glitches from Nanoleaf, but if transition fails it should remain in BLE mode and functional.

Jc2k avatar Feb 10 '23 15:02 Jc2k

Latest version closes existing connection before starting, so its always running in a clean session. With this in place i've got a 10/10 success rate with Nanoleaf. Before it pretty much always failed immediately after pairing, then worked on the 2nd attempt.

Jc2k avatar Feb 14 '23 15:02 Jc2k

Did a couple of Eve Energy test provisions - looks good.

I'm pretty happy with where this is, and i'm ready to merge it I think.

Jc2k avatar Feb 14 '23 15:02 Jc2k