aiohomekit icon indicating copy to clipboard operation
aiohomekit copied to clipboard

Sleepy Thread devices

Open roysjosh opened this issue 2 years ago • 8 comments

https://github.com/chrysn/aiocoap/issues/175#issuecomment-1310752470

I think a forum user is hitting this aiocoap issue. Average ping RTT to the sleepy accessory is ~2600ms, over the 2.0 aiocoap ACK_TIMEOUT.

2022-11-09 19:08:17.637 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:d350:15d9:7a39:fc10]:5683/2
2022-11-09 19:08:17.638 DEBUG (MainThread) [aiohomekit.protocol.tlv] sending [
6 (State): (1 bytes/<class 'bytearray'>) 0x01
3 (PublicKey): (32 bytes/<class 'bytes'>) b'\xed\xc4\n\xeb\xf2E\xbe\r\x19P\xf3\x912\xf4\x1c\x0cq\xc8U\x9c\xdfP\xb1\x80\\\xc6\xcb\xc6\xcd~\x01+'
]

2022-11-09 19:08:20.009 INFO (MainThread) [coap-server] Retransmission, Message ID: 37481.

2022-11-09 19:08:20.745 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable

Note that the retransmit triggers something (the BR?) to entirely kill the connection, probably since the first is still outstanding.

I'll put together a PR for aiocoap with a knob we'll have to adjust the CoAP controller here.

roysjosh avatar Nov 10 '22 18:11 roysjosh

Well, changing ACK_TIMEOUT from 2.0 to 12.0 didn't seem to help this particular user. I need to order a few sleepy Thread devices so I can get this sorted out.

roysjosh avatar Nov 12 '22 15:11 roysjosh

Wemo Stage and Eve Thermo are the sleepy HAP devices I'm testing with FWIW.

Jc2k avatar Nov 12 '22 15:11 Jc2k

As I said in ^, bumping ACK_TIMEOUT to 10 has got rid of thousands of "Received Type.ACK from <UDP6EndpointAddress> but could not match it to a running exchange" warnings in my logs.

Looking at https://github.com/chrysn/aiocoap/blob/419c88cb7bec693f3c33b188a8e394e6666983c5/aiocoap/messagemanager.py#L215, they were caused by original transmission and the retransmission sharing an entry in self._active_exchanges. If both are ACK'd, original will clear the entire in active exchanges first, then the retransmission won't be able to find itself in active exchanges any more, causing that log spam.

So I think even if its not the root cause for your user, its something we need to resolve as that error message is very common and comes up a lot in HA tickets.

Jc2k avatar Nov 15 '22 22:11 Jc2k

I need to read the Thread spec to see how much buffering is allowed for sleepy devices. Bumping the timeout for connections to sleepy Thread devices would probably be an OK fix. Handling re-transmit ACKs per the COAP protocol spec would also be a good fix.

jfroy avatar Nov 15 '22 22:11 jfroy

I think we want both - it seems daft (and wasteful) to retransmit when we know sleepy thread devices will "timeout" half the time.

But equally if it's valid to ACK a retransmission and the original then aiocoap should be silently handling that.

Jc2k avatar Nov 15 '22 23:11 Jc2k

(But yes, I plucked 10s out of the air, we should use something closer to the spec - I thought it was 5s but haven't checked)

Jc2k avatar Nov 15 '22 23:11 Jc2k

This has been said in other tickets, but duping here.

Latest aiocoap should no longer RST in violation of spec and makes re-transmit dupe ACK's less noisy in the logs. They do however still generate noise, so we need to handle the timeouts and make them longer than the sleep interval of the device.

Sleepy devices I have do sleep for 5s, however it's set by a characteristic called "Sleep Interval". So we need to tune the retry interval per endpoint, respecting that char.

Jc2k avatar Nov 22 '22 22:11 Jc2k

https://github.com/chrysn/aiocoap/pull/294 will let us avoid re-transmits during a sleepy devices sleep interval. We'll probably want a TransportTuning per connection, and we should use the "Sleep Interval" characteristic value if its available to configure it.

Jc2k avatar Dec 01 '22 07:12 Jc2k