aiohomekit icon indicating copy to clipboard operation
aiohomekit copied to clipboard

Make zeroconf availablity drive the decision to poll or not

Open bdraco opened this issue 3 years ago • 10 comments

    Maybe the zeroconf recording dropping drives unavailability?

Originally posted by @Jc2k in https://github.com/Jc2k/aiohomekit/issues/237#issuecomment-1302579181

This would allow us to get rid of the polling and only make unavailable when the PTR expires.


I think the implementation is:

Polling only if device is not present in zeroconf

bdraco avatar Nov 06 '22 22:11 bdraco

If a crash + recovery happens quickly (before records expire) would we detect that connection might be broken?

Jc2k avatar Nov 06 '22 23:11 Jc2k

Maybe a is_advertising property on the pairing

BLE -> -- we have a description -- effectively always true if we have ever seen the device IP -> -- zeroconf record is present --- COAP -> -- zeroconf record is present ---

bdraco avatar Nov 06 '22 23:11 bdraco

If a crash + recovery happens quickly (before records expire) would we detect that connection might be broken?

If it hard crashed the zeroconf record wouldn't get withdrawn (requires an announce with a TTL of 0) and we can't tell the difference between the accessory sending an unsolicited broadcast at startup and a normal response so we wouldn't know its crashed from zeroconf alone

bdraco avatar Nov 06 '22 23:11 bdraco

If it does drop and come back we will get a TCP RST though so we know we need to poll if the connections gets reset by peer

bdraco avatar Nov 06 '22 23:11 bdraco

I'm not sure we'll always get an RST.

If connection is established and idle and there's nothing like NAT or keep alives, and then there is a hard crash, the device will just vanish without sending one.

Jc2k avatar Nov 06 '22 23:11 Jc2k

When the device reconnects to the network aiohomekit's stale connection will keep trying to talk to the device, and when the device's TCP stack sees it trying to talk over a connection that isn't established its going to send back an RST because aiohomekit's connection isn't known.... at least if it implements TCP/IP correctly 🙃

bdraco avatar Nov 06 '22 23:11 bdraco

We would also want to poll again for any connection drop, the RST case is just the obvious case of the accessory rebooting

bdraco avatar Nov 06 '22 23:11 bdraco

If connection is established and idle and there's nothing like NAT or keep alives, and then there is a hard crash, the device will just vanish without sending one.

If its a hard crash and its not coming back it would wait for the zeroconf record to expire before it would poll, fail, and mark it unavailable.

bdraco avatar Nov 06 '22 23:11 bdraco

Its always going to be a trade off though.

The other option is to change the poll interval to be higher when the zeroconf record is present so we reduce the traffic at the cost of waiting a bit longer to be unavailable.

I tend to thing traffic reduction is more important than quick unavailability reporting given how frequently we get wifi and network related issues but not sold either way.

bdraco avatar Nov 06 '22 23:11 bdraco

I don't disagree getting rid of polling would be better.

We could back off the poll interval and see if anyone notices.

Something else to think about... The docs we have access to AND the open ADK mandates that s# be 1. Yet most of my certified IP devices care to disagree:

Name: Hive Bridge (QYQ-313)
State Number (s#): 91
...
Name: Philips hue - 482544
State Number (s#): 183
...
Name: Elements F3C8
State Number (s#): 8
...
Name: eufy HomeBase2-0CEB
State Number (s#): 615
...
Name: Aqara-Hub-E1-91C7
State Number (s#): 915

The ones that seem to be at 1 are an Arlo baby camera and an Eve Extend.

I found some logs on the ADK issue tracker (issue 43) that suggest MFI devices might republish their zeroconf record when an event happens and no one is connected. The log strings don't exist in the open ADK though (I think someone might have raised a ticket in the wrong place - i.e. in a public tracker instead of a private tracker).

I also found this in the HTTP event code path of the esp HomeKit sdk:

    /* If no controller was connected and no disconnected event was sent,
     * reannaounce mDNS. That will increment state number as required
     * by HAP Spec R15.
     */
    if (!ctrl_connected && !hap_priv.disconnected_event_sent) {
        hap_mdns_announce(false);
        hap_priv.disconnected_event_sent = true;
    }

(Via https://github.com/espressif/esp-homekit-sdk/blob/eade505ade6c148056044307ce7b779bbc8e218b/components/homekit/esp_hap_core/src/esp_hap_ip_services.c#L1484-L1491)

Perhaps we should track the s# for changes? As if s# is only incremented (as suggested in above code) when no one is connected, it might mean we think we are connected but aren't?

Jc2k avatar Nov 07 '22 04:11 Jc2k