Introduce a delay between update entity calls

Breaking change

Any skipped updates in the zwave_js update entities will be cleared when you first upgrade your HA instance to this version. This is a one-time occurrence per update entity; the integration will persist the state going forward.

Proposed change

We discovered that, in some cases, the way we handled update entity updates caused floods of network traffic. Even though we used a semaphore to limit parallel requests, the next call started as soon as the previous one finished. For large networks, we would generate a lot of traffic at startup and every 24 hours after, which ended up causing bit flips.

In this PR, I used balloob's idea of introducing a 5-minute delay before releasing the lock (we now limit it to a single update at a time), which spaces out the network requests and the subsequently scheduled updates.
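As a rough illustration of the delayed-release idea, here is a minimal asyncio sketch. It is not the integration's code: `check_for_update`, `run_demo`, and `UPDATE_DELAY_SECONDS` are hypothetical names, and a list append stands in for the real firmware request.

```python
import asyncio
import time

# Assumption: illustrative constant matching the 5 minutes mentioned in this PR.
UPDATE_DELAY_SECONDS = 5 * 60


async def check_for_update(
    node_id: int,
    lock: asyncio.Lock,
    log: list,
    delay: float = UPDATE_DELAY_SECONDS,
) -> None:
    """Run one (hypothetical) update check, then keep the lock for `delay` seconds."""
    async with lock:
        # Stand-in for the real network request.
        log.append((node_id, time.monotonic()))
        # Sleeping while still holding the lock keeps the next waiter
        # from starting immediately after this call finishes.
        await asyncio.sleep(delay)


async def run_demo(node_ids: list, delay: float) -> list:
    """Start one check per node and return the (node_id, timestamp) log."""
    lock = asyncio.Lock()
    log: list = []
    await asyncio.gather(
        *(check_for_update(node_id, lock, log, delay) for node_id in node_ids)
    )
    return log
```

Calling `asyncio.run(run_demo([1, 2, 3], 0.05))` should record three checks spaced roughly 0.05 seconds apart. One drawback of this shape is that every waiting check is a pending task for the whole wait.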

I also fixed a bug where we weren't properly unsubscribing from the callback.

CC @AlCalzone @kpine

Type of change

[ ] Dependency upgrade
[x] Bugfix (non-breaking change which fixes an issue)
[ ] New integration (thank you!)
[ ] New feature (which adds functionality to an existing integration)
[ ] Deprecation (breaking change to happen in the future)
[ ] Breaking change (fix/feature causing existing functionality to break)
[ ] Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request:

Checklist

[ ] The code change is tested and works locally.
[ ] Local tests pass. Your PR cannot be merged unless tests pass
[ ] There is no commented out code in this PR.
[ ] I have followed the development checklist
[ ] I have followed the perfect PR recommendations
[ ] The code has been formatted using Black (black --fast homeassistant tests)
[ ] Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

[ ] Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

[ ] The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
[ ] New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
[ ] For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
[ ] Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

[ ] I have reviewed two other open pull requests in this repository.

Mar 15 '23 10:03 raman325

Hey there @home-assistant/z-wave, mind taking a look at this pull request as it has been labeled with an integration (zwave_js) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zwave_js can trigger bot actions by commenting:

@home-assistant close: Closes the pull request.
@home-assistant rename Awesome new title: Renames the pull request.
@home-assistant reopen: Reopens the pull request.
@home-assistant unassign zwave_js: Removes the current integration label and assignees on the pull request; add the integration domain after the command.

Mar 15 '23 10:03 home-assistant[bot]

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks :+1:

Learn more about our pull request process.

Mar 15 '23 13:03 home-assistant[bot]

We should test this or write a test for it. Not sure how to do the latter though as we don't use the Home Assistant event helpers for the update delay.

Mar 16 '23 03:03 MartinHjelmare

We should test this or write a test for it. Not sure how to do the latter though as we don't use the Home Assistant event helpers for the update delay.

Do we want to switch to using the event helpers? I thought about it. It adds a little complexity, but it would make writing a test easier, which I think is preferred.

Mar 16 '23 04:03 raman325

It would be good to write a test, yes. With the event helper approach, will you calculate the schedule time depending on the number of nodes and schedule all updates in one go?

Mar 16 '23 09:03 MartinHjelmare

No, I was planning to use the call later helper as an exact replacement for asyncio.sleep. The complexity I was referring to was just having another callback to manage, unsubscribe, etc. The code is easier to read in its current form, but for this little added complexity I can add a test that verifies only one update happens in the first 5 minutes after start.
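To make the trade-off concrete, here is a small sketch of the "scheduled callback plus cancel handle" pattern. It uses plain asyncio `loop.call_later` as a stand-in for the Home Assistant helper, and `DelayedUpdate` and `demo` are hypothetical names.

```python
import asyncio


class DelayedUpdate:
    """Hypothetical holder for a scheduled callback and its cancel handle."""

    def __init__(self) -> None:
        self.calls = 0
        self._handle = None

    def schedule(self, delay: float) -> None:
        """Schedule the callback; must be called from a running event loop."""
        loop = asyncio.get_running_loop()
        self._handle = loop.call_later(delay, self._run)

    def _run(self) -> None:
        # The handle has fired, so there is nothing left to cancel.
        self._handle = None
        self.calls += 1

    def cancel(self) -> None:
        """The extra bookkeeping: cancel the pending callback if there is one."""
        if self._handle is not None:
            self._handle.cancel()
            self._handle = None


async def demo() -> tuple:
    fired, cancelled = DelayedUpdate(), DelayedUpdate()
    fired.schedule(0.01)
    cancelled.schedule(0.01)
    cancelled.cancel()
    await asyncio.sleep(0.05)
    return fired.calls, cancelled.calls
```

Unlike a bare `asyncio.sleep`, nothing is suspended while the timer is pending, and a test can advance time and check whether the callback fired.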

Mar 16 '23 21:03 raman325

Ok. I don't understand how the lock will work with that approach, but I'll take a look when you push.

Mar 17 '23 06:03 MartinHjelmare

OK so switched to the helper. Two problems:

1. I can't figure out how to test this in tests. I tested it on my instance and it successfully staggered the firmware updates 5 minutes at a time.
2. Because most of the updates are waiting to acquire the lock, the task never gets canceled. Not sure how to address this, but this would have been a problem in either instance:

   2023-03-17 22:41:48.116 WARNING (MainThread) [homeassistant.core] Task <Task pending name='Task-1762' coro=<ZWaveNodeFirmwareUpdate._async_update() running at ./homeassistant/components/zwave_js/update.py:174> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[set.remove()]> was still running after stage 2 shutdown; Integrations should cancel non-critical tasks when receiving the stop event to prevent delaying shutdown

Mar 18 '23 02:03 raman325

maybe I need to create the HassJob myself for the update and then add a listener for a stop event to cancel the job?
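For reference, the general shape of "cancel pending work on stop" can be sketched with plain asyncio tasks. This is only an illustration under assumed names (`wait_for_turn`, `demo`); it does not use HassJob or the real stop event.

```python
import asyncio


async def wait_for_turn(lock: asyncio.Lock) -> None:
    """Hypothetical update coroutine: take the lock, then hold it for a long delay."""
    async with lock:
        await asyncio.sleep(3600)


async def demo() -> list:
    lock = asyncio.Lock()
    tasks = [asyncio.create_task(wait_for_turn(lock)) for _ in range(3)]
    # Let the tasks start: one holds the lock, the others wait to acquire it.
    await asyncio.sleep(0)

    # Stand-in for a stop-event listener: cancel everything still pending.
    for task in tasks:
        task.cancel()
    # Wait for the cancellations to be processed without raising here.
    await asyncio.gather(*tasks, return_exceptions=True)
    return [task.cancelled() for task in tasks]
```

Both the task holding the lock and the tasks waiting on it end up cancelled, so nothing is left running at shutdown.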

Mar 18 '23 02:03 raman325

OK, so I think my new solution avoids the task problem and removes the need for a lock entirely. Basically, for every entity we add, we increment a counter which we use to determine the initial delay. Because we can't guarantee that hass is running during the first run, we just push the run to 24 hours later so that we preserve the 5-minute delays.
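A minimal sketch of how such a counter could map to delays. All names here (`UpdateDelayCounter`, `plan_run`, the two constants) are hypothetical, and starting the first entity at zero delay is an assumption, not something stated in this thread.

```python
from datetime import timedelta

# Assumed names; the values come from the numbers mentioned in this thread.
UPDATE_DELAY = timedelta(minutes=5)
POLL_INTERVAL = timedelta(hours=24)


class UpdateDelayCounter:
    """Hypothetical counter shared by all update entities as they are added."""

    def __init__(self) -> None:
        self._count = 0

    def next_initial_delay(self) -> timedelta:
        """Return the delay for the next entity added, then bump the counter."""
        delay = UPDATE_DELAY * self._count
        self._count += 1
        return delay


def plan_run(initial_delay: timedelta, hass_is_running: bool) -> timedelta:
    """Return how long to wait before the check actually runs.

    If the instance is not running when the first run comes due, the run is
    pushed out by one poll interval, which keeps the per-entity offsets intact.
    """
    if hass_is_running:
        return initial_delay
    return initial_delay + POLL_INTERVAL
```

With three entities, the counter hands out delays of 0, 5, and 10 minutes. No task sits waiting on a lock, because each entity only holds a scheduled callback.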

Mar 18 '23 03:03 raman325

It's not a breaking change anymore, right?

Mar 20 '23 21:03 MartinHjelmare

It's not a breaking change anymore, right?

nope, fixing that

Mar 20 '23 21:03 raman325

Maybe also update the PR description to reflect the latest iteration of the approach.

Mar 20 '23 21:03 MartinHjelmare
