core icon indicating copy to clipboard operation
core copied to clipboard

Home Assistant boot process gets stuck while loading Rainforest RAVEn integration.

Open rehanson opened this issue 1 year ago • 22 comments

The problem

Home Assistant got stuck in the startup process while loading the Rainforest RAVEn integration. While it was stuck, I wasn't able to access the Overview dashboard because Home Assistant said it was still booting up (I could still navigate to menus and custom dashboards).

Sequence of events:

  • Updated Core, then rebooted
  • Updated some addons (Samba, ssh, whisper) and Supervisor, then rebooted
  • Was unable to navigate to my Home Assistant interface through my web browser. So I rebooted the HA VM through Proxmox
  • Home Assistant interface was reachable (I could navigate the menus and custom dashboards, but not the Overview dashboard) but the following message popup in the UI would not go away:
    Starting Rainforest RAVEn, not everything will be available until it is finished.
  • Unplugged my Rainforest EMU-2 USB cable, waited a few seconds, plugged it in again.
  • The "Starting Rainforest RAVEn" message went away and Home Assistant appeared to be running normally

What version of Home Assistant Core has the issue?

core-2024.3.1

What was the last working version of Home Assistant Core?

core-2024.3.0

What type of installation are you running?

Home Assistant OS

Integration causing the issue

rainforest_raven

Link to integration documentation on our website

https://www.home-assistant.io/integrations/rainforest_raven/

Diagnostics information

config_entry-rainforest_raven-2a01a9ec98f5f6849b766186260f7c24(1).json

Example YAML snippet

No response

Anything in the logs that might be useful for us?

While bootup was "stuck" there were many repeats of this message:
...
2024-03-14 16:55:11.210 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: rainforest_raven
...

------log below occurs around the time I unplugged the EMU-2 USB cable and bootup became "unstuck"------

2024-03-14 16:58:11.355 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: rainforest_raven
2024-03-14 16:58:18.859 WARNING (MainThread) [homeassistant.components.androidtv_remote] Disconnected from Living Room TV at 192.168.1.45
2024-03-14 16:58:18.993 ERROR (MainThread) [androidtvremote2] Device doesn't support sending keys. Try clearing the storage of the Android TV Remove Service system app. On the Android TV device, go to Settings > Apps > See all apps > Show system apps. Then, select Android TV Remote Service > Storage > Clear data/storage.
2024-03-14 16:58:22.254 ERROR (MainThread) [homeassistant] Error doing job: Fatal write error on serial transport
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/serial/serialposix.py", line 621, in write
    n = os.write(self.fd, d)
        ^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 5] I/O error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/serial_asyncio/__init__.py", line 261, in _write_ready
    n = self._serial.write(data)
        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/serial/serialposix.py", line 655, in write
    raise SerialException('write failed: {}'.format(e))
serial.serialutil.SerialException: write failed: [Errno 5] I/O error
2024-03-14 16:58:22.292 WARNING (MainThread) [homeassistant.helpers.system_info] No Home Assistant Supervisor info available
2024-03-14 16:58:34.998 WARNING (MainThread) [homeassistant.components.androidtv_remote] Disconnected from Living Room TV at 192.168.1.45
2024-03-14 16:58:35.134 ERROR (MainThread) [androidtvremote2] Device doesn't support sending keys. Try clearing the storage of the Android TV Remove Service system app. On the Android TV device, go to Settings > Apps > See all apps > Show system apps. Then, select Android TV Remote Service > Storage > Clear data/storage.
2024-03-14 16:58:51.141 WARNING (MainThread) [homeassistant.components.androidtv_remote] Disconnected from Living Room TV at 192.168.1.45
2024-03-14 16:58:51.277 ERROR (MainThread) [androidtvremote2] Device doesn't support sending keys. Try clearing the storage of the Android TV Remove Service system app. On the Android TV device, go to Settings > Apps > See all apps > Show system apps. Then, select Android TV Remote Service > Storage > Clear data/storage.
2024-03-14 16:59:07.285 WARNING (MainThread) [homeassistant.components.androidtv_remote] Disconnected from Living Room TV at 192.168.1.45
2024-03-14 16:59:07.415 ERROR (MainThread) [androidtvremote2] Device doesn't support sending keys. Try clearing the storage of the Android TV Remove Service system app. On the Android TV device, go to Settings > Apps > See all apps > Show system apps. Then, select Android TV Remote Service > Storage > Clear data/storage.
2024-03-14 16:59:23.420 WARNING (MainThread) [homeassistant.components.androidtv_remote] Disconnected from Living Room TV at 192.168.1.45
2024-03-14 16:59:23.553 ERROR (MainThread) [androidtvremote2] Device doesn't support sending keys. Try clearing the storage of the Android TV Remove Service system app. On the Android TV device, go to Settings > Apps > See all apps > Show system apps. Then, select Android TV Remote Service > Storage > Clear data/storage.

Additional information

Even while the Rainforest RAVEn integration is running normally, there are frequent occurrences of "Unavailable" states. This causes the history graph to have many gaps, and dashboards showing the EMU-2 status will often show "Unavailable" instead of the last known value. image

rehanson avatar Mar 15 '24 01:03 rehanson

Hey there @cottsay, mind taking a look at this issue as it has been labeled with an integration (rainforest_raven) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of rainforest_raven can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign rainforest_raven Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


rainforest_raven documentation rainforest_raven source (message by IssueLinks)

home-assistant[bot] avatar Mar 15 '24 01:03 home-assistant[bot]

Thanks for the report, @rehanson. You mentioned that you're using a VM. How is the EMU-2 forwarded from the host to the VM? As a USB device or a serial device?

cottsay avatar Mar 15 '24 01:03 cottsay

Ah, I can see from the diagnostics that it appears in the VM as a USB device.

There were some handle leaks in 2024.2.0-2024.3.0 that I believe were causing a multitude of problems, including the gaps in the history. From what I can tell, the leaked handles would accumulate to the point where Home Assistant was no longer able to maintain a connection at all, resulting in bugs like #111886.

I also observed some problems in my VM after letting 2024.3.0 run to the point where the handle leaks saturated and needed to disconnect and reconnect the device, though I haven't been able to reproduce the problem now that the handle leaks are fixed. Please keep me updated with any changes in behavior.

cottsay avatar Mar 15 '24 01:03 cottsay

@cottsay I just updated to Home Assistant Core 2024.3.3 (from 2024.3.1). This update included the aioraven bump to 0.5.2. I had no issues getting stuck during the boot process after the update. And I haven't had any gaps in the history like I used to get. I've been running for ~3hrs with no issues yet. Before the update, I'd have about one "Unavailable" gap per minute.

rehanson avatar Mar 26 '24 00:03 rehanson

Thank you for reporting back. Glad to hear things are looking better.

We can keep this open for another update cycle or so, but if nobody is seeing it any longer then I'm willing to bet that my initial instinct was right and that qemu was misbehaving after the guest leaked all those unclosed handles. Seems a full VM shutdown or device disconnect are the only ways to get back to a good state when that happens.

It's been a day now. Any errors in the logs? Any chance #113539 stopped occurring too?

cottsay avatar Mar 27 '24 01:03 cottsay

In almost two days of running the latest update, I haven't had any gaps in my history! https://github.com/home-assistant/core/issues/113539 is still occurring however. I will add more comments over on that issue.

EDIT: There were actually four gaps. See a couple of comments down.

rehanson avatar Mar 27 '24 19:03 rehanson

Oh, and I haven't really rebooted enough times to see if the boot process still gets stuck, but based on your explanation, I doubt I will encounter it again.

rehanson avatar Mar 27 '24 19:03 rehanson

Now that I have looked closer at the logs, it did timeout four times (much less of an issue than before!) in the last two days. And I can see 30 second gaps in the history graph at those four points in time.

Logger: homeassistant.components.rainforest_raven.coordinator
Source: helpers/update_coordinator.py:324
integration: Rainforest RAVEn (documentation, issues)
First occurred: March 26, 2024 at 6:48:18 AM (4 occurrences)
Last logged: 2:07:33 AM

Timeout fetching rainforest_raven data

Here are those four occurrences:

2024-03-26 06:48:18.388 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-26 09:59:53.361 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-27 02:05:28.408 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data
2024-03-27 02:07:33.359 ERROR (MainThread) [homeassistant.components.rainforest_raven.coordinator] Timeout fetching rainforest_raven data

rehanson avatar Mar 27 '24 19:03 rehanson

Those are probably legitimate timeouts now. There could be dropped or corrupted messages over the serial link or the EMU-2's wireless link to the meter. There's a lot that can go wrong here.

It's too bad that the data reflects the timeout with a gap like that. We don't want to present stale data, so the only thing I can think to do is to retry the query once or twice. Would it be worth it to add that logic? How much of a problem are the gaps in the data?

cottsay avatar Mar 27 '24 19:03 cottsay

With how rare the timeouts are now, it's not really a problem for me.

rehanson avatar Mar 27 '24 19:03 rehanson

I just ran into the stuck bootup issue again today during a reboot of my Proxmox server (Home Assistant runs as a VM on the server). I am running Core 2024.3.3

Logger: homeassistant.bootstrap
Source: bootstrap.py:601
First occurred: 1:59:34 PM (3 occurrences)
Last logged: 2:01:34 PM

Waiting on integrations to complete setup: rainforest_raven

I was only able to complete HA startup by unplugging the Rainforest EMU-2 meter.

rehanson avatar Apr 03 '24 21:04 rehanson

I just ran into the stuck bootup issue again today...

Rats, thanks for the feedback. I assume this was an update from 2024.3.3 to 2024.4.0?

cottsay avatar Apr 03 '24 21:04 cottsay

Nope, I hadn't yet updated to 2024.4.0 when it happened. (I am currently updating now)

rehanson avatar Apr 03 '24 21:04 rehanson

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

My system still hangs up occasionally when trying to boot. It shows a message about waiting for the RAVEn integration to start.

It most recently happened on 2024.6.4

rehanson avatar Jul 02 '24 22:07 rehanson

FYI @rehanson, 2024.7.3 included changes to the way rainforest_raven handles initialization errors, and the underlying serial library was updated with changes to device closure that are also in this space.

Since I still can't reproduce these problems locally I can't confirm that they will solve or mitigate this problem, but either of those changes could have changed the behavior. Let me know if you see anything different.

cottsay avatar Jul 30 '24 19:07 cottsay

Thanks @cottsay, I'll report back if I see the issue again. (And I'll try to report back in a month or so if I don't see it again)

rehanson avatar Jul 30 '24 21:07 rehanson

I saw this issue again today after upgrading to hassos 13.0, I had reverted the previous update (I think to 12.3 from 12.4) due to this integration not starting. The integration ran fine with the older hassos. After I saw the hang on 13.0 I tried unplugging the rainforest device and re-plugging and that seemed to work, startup completed quickly and I was getting power data. I did a second full reboot to test and had the same symptoms. Restarting just HA after this (no reboot) and everything came up normally so this seems to only occur on full OS reboots for me.

hassos: "13.0" homeassistant: 2024.8.1 machine: raspberrypi4-64

tsaro avatar Aug 15 '24 17:08 tsaro

I'm still seeing the same behavior after upgrading hassos 13.1, unplugging the rainforest device after a full OS reboot restored data, no issues with subquent HA restarts

tsaro avatar Aug 26 '24 16:08 tsaro

I just ran into this again after upgrading to HA OS 13.1. The initial reboot after upgrading stalled without me being able to get to the Web UI, so I rebooted again, was able to get to the Web UI, but with the dreaded message again: Starting Rainforest RAVEn, not everything will be available until it is finished.

It finished booting when I unplugged the EMU-2.

Core: 2024.8.2

rehanson avatar Aug 29 '24 19:08 rehanson

I have also hit this, on Home Assistant 2024.10.1, running in docker, with the EMU-2 exposed into the docker container as /dev/ttyACM0. The EMU-2 is actually connected to another raspberry pi, forwarding it over the network via usbip. I went to test the power loss recovery on the remote device, and when it came back online, understandably, RAVEn wasn't connected, so I restarted the HASS container and hit this issue on startup.

rossica avatar Oct 08 '24 06:10 rossica

just upgraded to hassos 13.2 and same issue/workaround, hung at the Rainforest integration trying to start (waited 10 minutes before unplugging and re-plugging USB, then data started flowing again)

tsaro avatar Oct 15 '24 21:10 tsaro

I started running into this as well. I had to remove the device from the VM, add it again, and restart the RAVEn integration.

mdeneen avatar Oct 17 '24 00:10 mdeneen

Is there a forum that we can ask question about problems we see with this integration? I don't want to open a bug just becasue I am the only one seeing it.

alexsoh avatar Dec 25 '24 08:12 alexsoh

I see this issue also; running HAOS inside a Proxmox VM with a Rainforest EMU-2 that's passed to the VM using it's USB vendor/device ID. I'm happy to help troubleshoot if that's helpful..

shrug1 avatar Dec 30 '24 23:12 shrug1

I see this issue also; running HAOS inside a Proxmox VM with a Rainforest EMU-2 that's passed to the VM using it's USB vendor/device ID. I'm happy to help troubleshoot if that's helpful..

I have the same setup. I'd be happy to test as well.

geoffpasley avatar Mar 19 '25 17:03 geoffpasley

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

this still occasionally happens to me

PhantomPhoton avatar Jun 17 '25 18:06 PhantomPhoton

this still occasionally happens to me

same for me

geoffpasley avatar Jun 17 '25 19:06 geoffpasley

Joining in the replies to keep the issue open. This is still a 100% fail item for me unless I unplug the Rainforest device then reboot the entire Proxmox node (not just the VM containing Home Assistant).

shrug1 avatar Jun 17 '25 19:06 shrug1