Heatmiser-for-home-assistant
Updating heatmiserneo climate took longer than the scheduled update interval
After upgrading the Heatmiser integration to the latest version (with neohubapi now being a separate library), it seems to occasionally get "stuck", with status not being updated and commands not being passed to the Neohub.
When this happened, I observed the following symptoms:
- An "Updating heatmiserneo climate took longer than the scheduled update interval" error message is logged every minute:
Feb 6 11:09:24 lizard hass[9304]: 2021-02-06 11:09:24 WARNING (MainThread) [homeassistant.components.climate] Updating heatmiserneo climate took longer than the scheduled update interval 0:01:00
Feb 6 11:10:24 lizard hass[9304]: 2021-02-06 11:10:24 WARNING (MainThread) [homeassistant.components.climate] Updating heatmiserneo climate took longer than the scheduled update interval 0:01:00
Feb 6 11:11:24 lizard hass[9304]: 2021-02-06 11:11:24 WARNING (MainThread) [homeassistant.components.climate] Updating heatmiserneo climate took longer than the scheduled update interval 0:01:00
- No network traffic is being sent from Home Assistant to the Neohub (checked using `tcpdump`).
- `netstat` shows a few established TCP connections from Home Assistant to the Neohub:
netstat -n | grep 10.11.12.67
tcp 0 0 10.11.12.2:44380 10.11.12.67:4242 ESTABLISHED
tcp 0 0 10.11.12.2:44378 10.11.12.67:4242 ESTABLISHED
tcp 0 0 10.11.12.2:57284 10.11.12.67:4242 TIME_WAIT
tcp 0 0 10.11.12.2:44382 10.11.12.67:4242 ESTABLISHED
tcp 0 0 10.11.12.2:44384 10.11.12.67:4242 ESTABLISHED
- Stopping Home Assistant shows that these TCP connections were not actually active on the Neohub side (the hub sends back RST in response to our FIN):
12:19:02.932468 IP 10.11.12.2.44378 > 10.11.12.67.4242: Flags [F.], seq 999241392, ack 2491317285, win 502, options [nop,nop,TS val 4228902972 ecr 6392437], length 0
12:19:02.932671 IP 10.11.12.2.44380 > 10.11.12.67.4242: Flags [F.], seq 2142873974, ack 1140983017, win 502, options [nop,nop,TS val 4228902972 ecr 6392437], length 0
12:19:02.932740 IP 10.11.12.2.44382 > 10.11.12.67.4242: Flags [F.], seq 3942683128, ack 60268326, win 502, options [nop,nop,TS val 4228902972 ecr 6392437], length 0
12:19:02.932811 IP 10.11.12.2.44384 > 10.11.12.67.4242: Flags [F.], seq 3296834747, ack 1860058818, win 502, options [nop,nop,TS val 4228902972 ecr 6392437], length 0
12:19:02.932807 IP 10.11.12.67.4242 > 10.11.12.2.44378: Flags [R], seq 2491317285, win 0, length 0
12:19:02.932866 IP 10.11.12.67.4242 > 10.11.12.2.44380: Flags [R], seq 1140983017, win 0, length 0
12:19:02.932917 IP 10.11.12.67.4242 > 10.11.12.2.44382: Flags [R], seq 60268326, win 0, length 0
12:19:02.945824 IP 10.11.12.2.44378 > 10.11.12.67.4242: Flags [F.], seq 0, ack 1, win 502, options [nop,nop,TS val 4228902972 ecr 6392437], length 0
12:19:02.945927 IP 10.11.12.2.44380 > 10.11.12.67.4242: Flags [F.], seq 0, ack 1, win 502, options [nop,nop,TS val 4228902972 ecr 6392437], length 0
12:19:02.945989 IP 10.11.12.2.44382 > 10.11.12.67.4242: Flags [F.], seq 0, ack 1, win 502, options [nop,nop,TS val 4228902972 ecr 6392437], length 0
12:19:02.946050 IP 10.11.12.2.44384 > 10.11.12.67.4242: Flags [F.], seq 0, ack 1, win 502, options [nop,nop,TS val 4228902972 ecr 6392437], length 0
12:19:02.946135 IP 10.11.12.67.4242 > 10.11.12.2.44378: Flags [R], seq 2491317285, win 0, length 0
12:19:02.946211 IP 10.11.12.67.4242 > 10.11.12.2.44380: Flags [R], seq 1140983017, win 0, length 0
12:19:02.946384 IP 10.11.12.67.4242 > 10.11.12.2.44382: Flags [R], seq 60268326, win 0, length 0
12:19:02.946417 IP 10.11.12.67.4242 > 10.11.12.2.44384: Flags [R], seq 1860058818, win 0, length 0
- The Neostat itself is working fine when this happens (checked by manually sending a `{"INFO": 0}` command).
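For reference, a minimal sketch of such a manual check, assuming the hub speaks the legacy JSON-over-TCP protocol on port 4242 with `\0\r`-terminated commands and NUL-terminated responses (the framing is an assumption based on Heatmiser's legacy API, not taken from this issue):

```python
import asyncio
import json

async def query_neohub(host: str, port: int = 4242) -> dict:
    # Open a fresh TCP connection for a single command, mirroring how
    # the integration talks to the hub.
    reader, writer = await asyncio.open_connection(host, port)
    try:
        # Assumption: legacy NeoHub commands are terminated with \0\r.
        writer.write(json.dumps({"INFO": 0}).encode() + b"\0\r")
        await writer.drain()
        # Assumption: the response is NUL-terminated JSON.
        raw = await reader.readuntil(b"\0")
        return json.loads(raw[:-1].decode())
    finally:
        writer.close()
        await writer.wait_closed()

if __name__ == "__main__":
    print(asyncio.run(query_neohub("10.11.12.67")))
```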
I took a quick look at the `neohubapi` source code and observed this integration during normal operation, and I believe it usually establishes a new TCP connection for every command. This suggests that the "stuck" state arose while one of the commands was actively running (e.g. in the middle of `NeoHub._send`): perhaps the hub froze while processing the command and restarted, but the integration was still waiting for a read or write operation to complete.
A common way to handle such cases is to abort an operation if it takes longer than a given timeout. I am not super familiar with asyncio, but it seems that `wait_for` can be used to implement such a timeout (see the sketch below the list). However, I am not sure what the best place to add such timeouts would be:
1. This integration can have a timeout while calling neohubapi (wrap all calls to `neohubapi` in `wait_for`).
2. neohubapi can have a timeout for all requests (wrap all calls to `_send` in `wait_for`).
3. `neohubapi._send` can have a timeout for connection opening and read/write operations.
Perhaps doing this as high in the stack as possible (option 1) would make more sense, but I am not sure whether asyncio would cleanly close the open connections in such a case.
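To illustrate option (1), a minimal sketch of an integration-side guard; `hub.get_live_data()` stands in for whatever call the integration actually makes, and the timeout value is arbitrary. Note that on timeout `wait_for` cancels the inner task, but any connection opened inside it still needs to be closed by the library's own cleanup, which is exactly the concern above:

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)

UPDATE_TIMEOUT = 30  # seconds; arbitrary value for illustration

async def async_update(self):
    # Option (1): guard the whole neohubapi call from the integration side,
    # so a frozen hub cannot block the update loop indefinitely.
    try:
        self._data = await asyncio.wait_for(
            self._hub.get_live_data(), timeout=UPDATE_TIMEOUT
        )
    except asyncio.TimeoutError:
        _LOGGER.warning("NeoHub did not respond within %s seconds", UPDATE_TIMEOUT)
```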
I would tentatively concur that (1) is most correct. Can you open an issue over at neohubapi and point here?
@stikonas FYI
We have also just released neohubapi 0.5, which includes some fixes and a command-line client. (Maybe the command-line client can be used to reproduce this problem more easily.)
And I think I observed that the Neohub was more stable when opening a new connection for each operation instead of keeping the connection open.
But please open an issue on the neohubapi project; it will be easier to track it there.
Sorry, after re-reading and parsing your issue, I think (3) is most correct :-)
And thank you so much for the detailed report. I have opened a neohubapi issue: https://gitlab.com/neohubapi/neohubapi/-/issues/6
Yes, I agree. (2) doesn't really make sense, because it would mean more duplicated code. And it's better to solve this in the underlying library than to add workarounds in every library user.
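For illustration, a rough sketch of what option (3) could look like inside `_send`, assuming one connection per request; the attribute names (`_host`, `_port`), the default timeout, and the message framing are assumptions, not neohubapi's actual code:

```python
import asyncio
import json

async def _send(self, message, timeout=60):
    # Option (3): time-limit connect, write and read individually, so a hub
    # that freezes and restarts mid-command cannot hang the request forever.
    reader, writer = await asyncio.wait_for(
        asyncio.open_connection(self._host, self._port), timeout
    )
    try:
        writer.write(json.dumps(message).encode() + b"\0\r")
        await asyncio.wait_for(writer.drain(), timeout)
        raw = await asyncio.wait_for(reader.readuntil(b"\0"), timeout)
    finally:
        # Close the socket even when a timeout cancels us mid-operation,
        # so no half-open connections are left behind.
        writer.close()
        await writer.wait_closed()
    return json.loads(raw[:-1].decode())
```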
I've released version 0.6 of https://pypi.org/project/neohubapi/. Can anybody affected by this bug test it? If it helps, we can bump the minimum version requirement again so that everybody gets the update.
I have been using the `master` branch of `neohubapi` and it mitigated the issue for me.
What I wanted to explore before we switch to 0.6 is whether we can make retries configurable in the custom component. It's now just a matter of passing `request_attempts` to `NeoHub()`, but I am not sure about the best way to let users set it. Any thoughts?