core icon indicating copy to clipboard operation
core copied to clipboard

Rainbird failing

Open mikesalz opened this issue 1 year ago • 20 comments

The problem

For at least the last several days, the Rainbird integration has been failing. Possibly since 2023.1.4, but maybe earlier

What version of Home Assistant Core has the issue?

2023.1.5

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

rainbird

Link to integration documentation on our website

https://www.home-assistant.io/integrations/rainbird/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2023-01-17 17:29:54.398 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Error fetching Rainbird Zones data: Error communicating with API: Error from API: 503, message='Service Unavailable', url=URL('http://192.168.1.26/stick')
2023-01-17 18:00:53.300 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Error fetching Rainbird Zones data: Error communicating with API: Error from API: 503, message='Service Unavailable', url=URL('http://192.168.1.26/stick')
2023-01-17 18:00:54.392 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Error fetching Rainbird Zones data: Error communicating with API: Error from API: 503, message='Service Unavailable', url=URL('http://192.168.1.26/stick')
2023-01-17 18:01:05.166 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Error fetching Rainbird Zones data: Error communicating with API: Error from API: 503, message='Service Unavailable', url=URL('http://192.168.1.26/stick')
2023-01-17 18:12:05.307 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Error fetching Rainbird Zones data: Error communicating with API: Error from API: 503, message='Service Unavailable', url=URL('http://192.168.1.26/stick')
2023-01-17 18:12:14.354 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Timeout fetching Rainbird Zones data
2023-01-17 18:17:34.355 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Timeout fetching Rainbird Zones data
2023-01-17 18:49:53.257 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Error fetching Rainbird Zones data: Error communicating with API: Error from API: 503, message='Service Unavailable', url=URL('http://192.168.1.26/stick')
2023-01-17 18:49:54.355 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Timeout fetching Rainbird Zones data
2023-01-17 18:50:05.169 ERROR (MainThread) [homeassistant.components.rainbird.coordinator] Error fetching Rainbird Zones data: Error communicating with API: Error from API: 503, message='Service Unavailable', url=URL('http://192.168.1.26/stick')

Additional information

No response

mikesalz avatar Jan 18 '23 01:01 mikesalz

Hey there @konikvranik, @allenporter, mind taking a look at this issue as it has been labeled with an integration (rainbird) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of rainbird can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Change the title of the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign rainbird Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


rainbird documentation rainbird source (message by IssueLinks)

home-assistant[bot] avatar Jan 18 '23 01:01 home-assistant[bot]

I definitely did rewrite the connection logic, however for me it had the opposite effect of making everything more robust and quieter to have a single polling happening.

One difference is a change in behavior in retirees, and another is synchronizing requests to be sent at the same time though they are now in parallel which could be a problem.

To confirm; you aren't using the rainbird app at the same time right? I'm assuming not.

allenporter avatar Jan 18 '23 03:01 allenporter

Correct. I rarely use the app. I did open it today to make sure the wifi is working, after I saw the error in the HA log. Is there anything else I can send you to help debug?

mikesalz avatar Jan 18 '23 03:01 mikesalz

If you get debug logs from the client then it may inform. Using logger:

logger:
  default: info
  logs:
    homeassistant.components.rainbird.coordinator: debug
    pyrainbird.async_client: debug

My generally impression is that the device returns 503 Service Unavailable when it is busy.

One thing I am confused by is why you're seeing this so often. My impression was that you should only see this if the device ping pongs between healthy and not-healthy, so it implies some requests are succeeding in between. I am confused by that because the polling time is every minute.

The debug info may give us insights to be able to answer those questions.

allenporter avatar Jan 18 '23 04:01 allenporter

I added the logging you recommended and restarted HA. Please see the attached. I tried to strip out everything unrelated to the Rainbird integration. I'm not sure that I see anything in there that will be helpful, but maybe you'll find something I didn't. Thanks for your help!

Log.txt

mikesalz avatar Jan 18 '23 22:01 mikesalz

OK so I realize now my changes from https://github.com/home-assistant/core/pull/85271 have not yet made it into the release. I forgot that I did some of the rewrites in stages.

That log shows it fails once at 17:06:51 then succeeds at 17:06:51 right away, then 17:07:14, 17:07:51, 17:07:51, 17:08:14it fails once and then succeeds a bunch so not super actionable.

But its running all these separate requests, once per platform. It queries separately for the switch platform, binary sensor platform, and sensor platform. In the new PR, it queries them all with a single coordinator and the requests either succeed or fail.

allenporter avatar Jan 19 '23 02:01 allenporter

I think it maybe would be interesting to let the logging run a big longer and capture more of the failure problems.

allenporter avatar Jan 19 '23 02:01 allenporter

Hi @allenporter - I am including a bigger chunk of the log now. Again, I tried to chop out the non-Rainbird stuff. But it was becoming too tedious. So what is attached is not the entire log, but it is a big chunk of it. It looks like the same thing over and over. Log 2.txt

mikesalz avatar Jan 19 '23 17:01 mikesalz

I think what we should do is see if the 2023.2 release improves things and if not I will submit a patch to serialize the rpcs or add more retries. Apologies for the disturbance but I'll help improve the reliability as much as I can.

allenporter avatar Jan 21 '23 04:01 allenporter

I took a risk on removing the retry behavior thinking it was masking another fundamental issue, so hoping we can work through that.

allenporter avatar Jan 21 '23 04:01 allenporter

No worries. I appreciate all of your help! I usually don't install the x.0 releases, but I will update to 2023.2.1 when it is released and let you know how everything goes.

mikesalz avatar Jan 21 '23 16:01 mikesalz

Do you know if with the new version that will able to use the HA integration and the Rainbird IOS app in the same time or in //?

plouf34 avatar Feb 02 '23 05:02 plouf34

The rain bird device returns a 503 error when multiple apps talking to it. It doesn't seem to matter who the requester is.

However, in practice, I don't find I have a problem when occasionally opening the rainbird app and using home assistant.

One change I made was to send all the home assistant rainbird requests in sync together, rather than having 4 different polling requests happening at different times. I am hoping this will help.

One other thing is the rainbird poll time is once every minute. (This seems fairly high, but makes sense while you are in the middle of controlling the device)

allenporter avatar Feb 02 '23 06:02 allenporter

Hi @allenporter - Just FYI, I am now on Home Assistant 2023.2.1 and am still getting this error quite frequently. 10 times since yesterday afternoon. It's probably not a big deal, but just wanted to let you know.

mikesalz avatar Feb 04 '23 13:02 mikesalz

Hi, ok. Do you mind sending an updated log? It would be helpful to under the request pattern and the specific details.

Just to clarify again, there are no other devices connecting to rainbird at the same time?

allenporter avatar Feb 04 '23 23:02 allenporter

To elaborate: I used to regularly see tons of error messages talking to the rainbird device. Now I never see any errors anymore since the integration was changed synchronize its calls to the device using home assistant best practices for sending RPCs, rather than sending from different sensors at random times. I'm hoping we can improve this and get everyone into a similar state. Having the lower level details from logs would really help understand why that isn't working for others.

allenporter avatar Feb 05 '23 15:02 allenporter

Hi @allenporter I appreciate your dedication to this project! I will re-enable fuller logging and then let it run for a while so I have something worth sending. Should I remove the non-rainbird related logs? I feel a little weirded out by sharing the entire thing, but if there is nothing personal or identifiable in there, I can send it.

mikesalz avatar Feb 05 '23 16:02 mikesalz

Yes filtering by rainbird would be helpful.

allenporter avatar Feb 05 '23 16:02 allenporter

Good day.

I have a rain bird that has 24 channels. Apparently this integration only supports the first 16 channels.

Since upgrading to the newest release I get this error.

Retrying setup: tuple index out of range

Previously I was unable to control any channel above 16 as well.

Gert

Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 239, in _async_refresh self.data = await self._async_update_data() File "/usr/src/homeassistant/homeassistant/components/rainbird/coordinator.py", line 82, in _async_update_data return await self._fetch_data() File "/usr/src/homeassistant/homeassistant/components/rainbird/coordinator.py", line 96, in _fetch_data active_zones={zone for zone in zones if states.active(zone)}, File "/usr/src/homeassistant/homeassistant/components/rainbird/coordinator.py", line 96, in active_zones={zone for zone in zones if states.active(zone)}, File "/usr/local/lib/python3.10/site-packages/pyrainbird/data.py", line 114, in active return self.states[number - 1] IndexError: tuple index out of range

DamHuisGithub avatar Feb 06 '23 12:02 DamHuisGithub

Hi, can you file a separate issue? We'll need to discuss that separately. The intent was to get all channels working, so we can fix that.

allenporter avatar Feb 06 '23 15:02 allenporter

@DamHuisGithub I believe this may have fixed that error since it was rewritten, however I would still like to get details about improving support for 22 zones. I don't have a device with that many zones so we can get your debug data and improve it.

allenporter avatar Feb 07 '23 04:02 allenporter

Hi Allen I will support you in any possible way to get this working. Please let me know what you require for testing purposes and I'll executed on my side and send you the results. I still get the following error: Thank you for this integration. Gert

Unexpected error fetching 10.10.51.247 data: tuple index out of range Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 239, in _async_refresh self.data = await self._async_update_data() File "/usr/src/homeassistant/homeassistant/components/rainbird/coordinator.py", line 82, in _async_update_data return await self._fetch_data() File "/usr/src/homeassistant/homeassistant/components/rainbird/coordinator.py", line 96, in _fetch_data active_zones={zone for zone in zones if states.active(zone)}, File "/usr/src/homeassistant/homeassistant/components/rainbird/coordinator.py", line 96, in active_zones={zone for zone in zones if states.active(zone)}, File "/usr/local/lib/python3.10/site-packages/pyrainbird/data.py", line 114, in active return self.states[number - 1] IndexError: tuple index out of range

DamHuisGithub avatar Feb 07 '23 10:02 DamHuisGithub

@DamHuisGithub sorry to clarify I was saying we should discuss in a separate issue not this one. (The error message is not the same).

The next patch release should fix the error but I want to still followup and confirm the stations can work and discuss more debugging data.

allenporter avatar Feb 07 '23 14:02 allenporter

@allenporter I know I promised you logs, but it looks like 2023.2.3 fixed the issues. I installed this morning and have not had a single error since. So a HUGE thank you for your help!

mikesalz avatar Feb 08 '23 19:02 mikesalz

No problem, someone else gave me some. Glad it's rock solid now!

allenporter avatar Feb 08 '23 20:02 allenporter