aiohomekit
aiohomekit copied to clipboard
Sleepy Thread devices
https://github.com/chrysn/aiocoap/issues/175#issuecomment-1310752470
I think a forum user is hitting this aiocoap
issue. Average ping RTT to the sleepy accessory is ~2600ms, over the 2.0 aiocoap
ACK_TIMEOUT
.
2022-11-09 19:08:17.637 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:d350:15d9:7a39:fc10]:5683/2
2022-11-09 19:08:17.638 DEBUG (MainThread) [aiohomekit.protocol.tlv] sending [
6 (State): (1 bytes/<class 'bytearray'>) 0x01
3 (PublicKey): (32 bytes/<class 'bytes'>) b'\xed\xc4\n\xeb\xf2E\xbe\r\x19P\xf3\x912\xf4\x1c\x0cq\xc8U\x9c\xdfP\xb1\x80\\\xc6\xcb\xc6\xcd~\x01+'
]
2022-11-09 19:08:20.009 INFO (MainThread) [coap-server] Retransmission, Message ID: 37481.
2022-11-09 19:08:20.745 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable
Note that the retransmit triggers something (the BR?) to entirely kill the connection, probably since the first is still outstanding.
I'll put together a PR for aiocoap
with a knob we'll have to adjust the CoAP controller here.
Well, changing ACK_TIMEOUT
from 2.0 to 12.0 didn't seem to help this particular user. I need to order a few sleepy Thread devices so I can get this sorted out.
Wemo Stage and Eve Thermo are the sleepy HAP devices I'm testing with FWIW.
As I said in ^, bumping ACK_TIMEOUT to 10 has got rid of thousands of "Received Type.ACK from <UDP6EndpointAddress> but could not match it to a running exchange" warnings in my logs.
Looking at https://github.com/chrysn/aiocoap/blob/419c88cb7bec693f3c33b188a8e394e6666983c5/aiocoap/messagemanager.py#L215, they were caused by original transmission and the retransmission sharing an entry in self._active_exchanges
. If both are ACK'd, original will clear the entire in active exchanges first, then the retransmission won't be able to find itself in active exchanges any more, causing that log spam.
So I think even if its not the root cause for your user, its something we need to resolve as that error message is very common and comes up a lot in HA tickets.
I need to read the Thread spec to see how much buffering is allowed for sleepy devices. Bumping the timeout for connections to sleepy Thread devices would probably be an OK fix. Handling re-transmit ACKs per the COAP protocol spec would also be a good fix.
I think we want both - it seems daft (and wasteful) to retransmit when we know sleepy thread devices will "timeout" half the time.
But equally if it's valid to ACK a retransmission and the original then aiocoap should be silently handling that.
(But yes, I plucked 10s out of the air, we should use something closer to the spec - I thought it was 5s but haven't checked)
This has been said in other tickets, but duping here.
Latest aiocoap should no longer RST in violation of spec and makes re-transmit dupe ACK's less noisy in the logs. They do however still generate noise, so we need to handle the timeouts and make them longer than the sleep interval of the device.
Sleepy devices I have do sleep for 5s, however it's set by a characteristic called "Sleep Interval". So we need to tune the retry interval per endpoint, respecting that char.
https://github.com/chrysn/aiocoap/pull/294 will let us avoid re-transmits during a sleepy devices sleep interval. We'll probably want a TransportTuning
per connection, and we should use the "Sleep Interval" characteristic value if its available to configure it.