
Restart issues on 8.0.rc1-4 - System seems to hang

Open bschatzow opened this issue 3 years ago • 20 comments

Describe the issue you are experiencing

I am having restart issues after upgrading from 7.6. At first I thought there was an issue with the upgrade. Now it seems that, for some reason, the Pi is not restarting properly. After the update, the screen attached to the Pi 4 is blank (and I also can't access it over SSH). If I remove power for a few seconds and then plug it back in, the Pi starts with no issue. It is not an update issue, because a host restart hangs as well.

Never saw this before 8.0.

What operating system image do you use?

rpi4-64 (Raspberry Pi 4/400 64-bit OS)

What version of Home Assistant Operating System is installed?

8.0 RC4

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. Update from 7.6 to OS 8.0 RC 1, 2, 3, 4
  2. Or after the update do a system restart
  3. The PI shuts down but it does not restart. The attached screen goes blank.

...

Anything in the Supervisor logs that might be useful for us?

N/A

Anything in the Host logs that might be useful for us?

N/A

System Health information

N/A

Additional information

Worked with @agners on 5/9/22 and made changes to the config.txt file. Found out that if power is removed from the Pi 4 for 10 seconds the issue is gone, but I still can't do a restart without removing power.

bschatzow avatar May 06 '22 18:05 bschatzow

I believe I might be suffering from the same issue. After the upgrade, my RPi 4 doesn't seem to start again. What changes were made to config.txt?

ggravlingen avatar May 08 '22 11:05 ggravlingen

The change was to uncomment the statement that deals with uart=1. I don't think this is the fix, as Stephan had me do this in preparation for hooking it up to a PC with a reader attached to the Pi. I found that removing power for 10 seconds has always allowed it to start. Do you have a monitor attached? Mine would be blank when it was not booting.

bschatzow avatar May 08 '22 11:05 bschatzow

I'm running headless and using a PoE HAT. I don't have a power adapter, so I think I'll just do a reinstall of the OS. It is, after all, an RC, so I can't expect it to be super stable yet.

ggravlingen avatar May 08 '22 12:05 ggravlingen

Did you try removing power?

bschatzow avatar May 08 '22 12:05 bschatzow

Did you try removing power?

Yes, I pulled power for 10+ seconds, but there was no difference, I'm afraid.

ggravlingen avatar May 08 '22 12:05 ggravlingen

Mine has always come back. Restore is now easy, so you can go that route. You could also try the update from the command line next time.

bschatzow avatar May 08 '22 12:05 bschatzow

@agners - I just updated to the released 8.0 and had the same issue as with the RCs. I monitored my display as well as SSH and ping. The install worked fine, and I saw the display do the reboot, which shut down SSH and ping. When it came back up, the display stayed blank but ping came back (no SSH). I removed power for 10 seconds, restored power, and all is working again.
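For anyone who wants to capture the "ping came back (no SSH)" state during a reboot, here is a minimal sketch of a reachability check. It is an illustration only: the function names are made up for this example, the `ping` flags assume a Linux-style `ping`, and the host name and port in the usage comment are assumptions to replace with your own.

```python
import socket
import subprocess


def ping_ok(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request via the system `ping` (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def tcp_port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def classify(ping: bool, ssh: bool) -> str:
    """Map the two probe results to a short state label (hypothetical labels)."""
    if ssh:
        return "up"
    if ping:
        return "network up, SSH down"
    return "down"


# Example usage (host name and port are assumptions, adjust to your setup):
#   host = "homeassistant.local"
#   print(classify(ping_ok(host), tcp_port_open(host, 22)))
```

Running this in a loop with timestamps while the update reboots the Pi would show how long the system sits in the "network up, SSH down" state before power is removed.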

bschatzow avatar May 16 '22 09:05 bschatzow

but ping came back (no SSH)

Uh, that is very special. It sounds like the OS is booting (which would mean the Raspberry Pi firmware and bootloader are fine), but then gets stuck booting for some reason. It is weird that the display doesn't come up either, so maybe the display is involved in this (maybe the graphics driver crashes?).

A few things would be interesting:

  • Can you reproduce this when installing 8.0.rc4 without restoring your data, then upgrading to 8.0?
  • Can you reproduce this when installing 8.0.rc4 without connecting the monitor, then upgrading to 8.0?

I did some testing with an SSD before releasing 8.0, and at least the combination of a StarTech USB3S2SAT3CB (PID:VID 14b0:0206) connected to an Intel SSD 520 Series 240GB looked fine.

agners avatar May 16 '22 11:05 agners

@agners another user above had the same issue. What is strange is that on a restart (without unplugging) the system does not show anything: no Pi info, etc. I don't think it has anything to do with what I have loaded, as this happens between the restart and the start. The monitor should be initialized early in the process. This issue first started for me with 8.0 RC1. Maybe a better test would be to go back to 7.6; if that works, it points to an issue with something that changed between the two. I can try this again if you think there is a benefit. Let me know.

bschatzow avatar May 16 '22 14:05 bschatzow

@agners I did not close it

bschatzow avatar May 16 '22 14:05 bschatzow

@agners - Just tried 8.1: same issue, but this time I needed to turn off my SSD as well as the Pi to restart. Maybe the issue is not with the Pi but with the port hanging? I have a powered hub that I never turn off. Is there something in the log files that would help figure out if this is my issue?

bschatzow avatar May 20 '22 18:05 bschatzow

Maybe a better test would be go back to 7.6 and if this works it points to an issue with something that changed between the two.

Pretty much everything changed between the two: new kernel, new Raspberry Pi firmware, etc.

I don't think it has anything to do with what I have loaded as this happens between the restart and start.

Maybe it doesn't crash at startup, but very late during shutdown (e.g. it turns off the display etc., and then "hangs" for some reason). That might be influenced by something running on the OS.

Just tried 8.1 same issue but needed to turn off my SSD as well as the pi to restart.

Is your SSD separately powered? Through separate power supplies? I'd always turn off everything, otherwise the SSD might be in some intermediate state. Maybe that is the problem, that the new Raspberry Pi firmware does not reset the SSD properly. But that should be reproducible with Raspberry Pi OS, and then reported to the Raspberry Pi firmware team.

agners avatar May 23 '22 08:05 agners

@agners Is your SSD separately powered? Through separate power supplies?

I have the powered hub because it was recommended during the 1119 issue to eliminate any power supply issues. On the next update I will power down the hub prior to the restart. Not sure how to address this with the Raspberry Pi community?

bschatzow avatar May 23 '22 16:05 bschatzow

Yeah, right, that setup should indeed be working. I forgot about the powered USB hub option.

USB has its own reset signal, so I'd assume that should be enough to get the SSD into a safe state.

agners avatar May 23 '22 20:05 agners

Seems others are having boot issues with the powered hub. https://raspberrypi.stackexchange.com/questions/100582/raspberry-pi-does-not-reboot-with-powered-external-hdd-attached I had no issues prior to 8.

bschatzow avatar May 24 '22 22:05 bschatzow

I think I have the same issue. My setup is similar to the one @bschatzow has, but my SSD is connected directly to a USB port of the Pi 4, without any separately powered hub.

TuomasPakkanen avatar May 25 '22 08:05 TuomasPakkanen

@agners another link

https://forums..com/viewtopic.php?t=245218&sid=67484e3aefa1d1d929d601b82372a433 

Seems many have issues with the powered hub. Some of the comments seem to conclude that it's related to updated firmware.

bschatzow avatar May 26 '22 00:05 bschatzow

Just updated to 8.2. Still an issue. The easiest restart for me was to unplug the SSD, restart the Pi, shut down the Pi, plug in the SSD and turn everything on.

bschatzow avatar Jun 09 '22 11:06 bschatzow

Just updated to 8.3. To restart I needed to unplug the SSD, restart the pi, power off the pi, plug in the ssd and then restart all.

bschatzow avatar Jul 08 '22 00:07 bschatzow

@agners I just updated to 8.4 and the restart worked with no issue. Was anything changed that fixed this?

bschatzow avatar Jul 24 '22 01:07 bschatzow

Just updated to 8.5 and restart hung. Restarted a couple of times and it is very slow with a lot of:

22-08-16 16:35:00 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:

errors.

bschatzow avatar Aug 16 '22 20:08 bschatzow

@bschatzow your issue seems very random. Any data from a single update/restart is not representative.

My best bet is still some race condition in USB SSD enumeration in the Raspberry Pi firmware. Unfortunately not much I can do about it.

agners avatar Aug 17 '22 07:08 agners

@agners was something changed from 8.3 to 8.4 and then changed back in 8.5? 8.4 worked with no issue for me. The other two failed. The last one, 8.5, left my system corrupt and I needed to do a restore.

bschatzow avatar Aug 17 '22 10:08 bschatzow

8.3, 8.4 and 8.5 use the very same Raspberry Pi kernel and the very same Raspberry Pi firmware. There are some common changes, like new security updates introduced by a Buildroot update. But those are on all machines, and didn't show problems on other systems.

What you are seeing is some local inconsistency, which seems to be caused by reboots.

System temperature can affect silicon in subtle ways: PLLs might take longer to lock, which can change timings, etc. E.g. I could imagine that an OS update causes some CPU/SSD controller traffic, which causes the chips to warm up ever so slightly (e.g. 5-10°C). This then might affect the behavior of your next reboot...

In the end, I don't know. I don't see a lot of RPi SSD issues these days, so I think it's really your specific hardware setup which causes trouble.

agners avatar Aug 17 '22 11:08 agners

@agners I agree that all of these can cause an issue, but there was no issue for me rebooting prior to 8.0 RC1. 8.5 for some reason corrupted my system and I had to restore a backup. This may have been operator error: I may have restarted before it completed. I'll try again and take better notes. Also, I believe a shutdown followed by a restart works. I'll try to confirm this as well.

bschatzow avatar Aug 17 '22 18:08 bschatzow

@agners for some reason I only received a notification this morning that 8.5 was available to try again. I noticed several things. Once the installation is started, the desktop disconnects, which seems normal. Next, the SSH connection shows that HA is shutting down on any command issued. The monitor shows no change. Then the SSH window loses its connection and the monitor goes blank. I assume this is when it reboots? No message. It would be helpful if something showed what is going on. I removed power and all seemed to come back up correctly.

The first error I see is

22-08-23 06:05:17 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 435, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/aiohttp.py", line 121, in sentry_app_handle
    reraise(*_capture_exception(hub))
  File "/usr/local/lib/python3.9/site-packages/sentry_sdk/_compat.py", line 54, in reraise
    raise value
  File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/aiohttp.py", line 111, in sentry_app_handle
    response = await old_handle(self, request)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_app.py", line 504, in _handle
    resp = await handler(request)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_middlewares.py", line 117, in impl
    return await handler(request)
  File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 138, in system_validation
    return await handler(request)
  File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 204, in token_validation
    return await handler(request)
  File "/usr/src/supervisor/supervisor/api/utils.py", line 60, in wrap_api
    answer = await method(api, *args, **kwargs)
  File "/usr/src/supervisor/supervisor/api/store.py", line 177, in store_info
    ATTR_ADDONS: [
  File "/usr/src/supervisor/supervisor/api/store.py", line 178, in <listcomp>
    self._generate_addon_information(self.sys_addons.store[addon])
  File "/usr/src/supervisor/supervisor/api/store.py", line 113, in _generate_addon_information
    ATTR_ADVANCED: addon.advanced,
  File "/usr/src/supervisor/supervisor/addons/model.py", line 222, in advanced
    return self.data[ATTR_ADVANCED]
  File "/usr/src/supervisor/supervisor/store/addon.py", line 19, in data
    return self.sys_store.data.addons[self.slug]
KeyError: 'df843657_spotify'

Then I see a lot of "is taking over 10 seconds" warnings:

2022-08-23 06:10:46.872 WARNING (MainThread) [homeassistant.components.binary_sensor] Setup of binary_sensor platform synology_dsm is taking over 10 seconds.

Also

22-08-23 06:05:58 ERROR (MainThread) [supervisor.homeassistant.api] Error on call https://172.30.32.1:8123/api/config:

None of these were here on 8.4. To me it looks like the 8.5 update corrupted my system. Any ideas?
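To compare logs between versions (e.g. 8.4 vs 8.5), a rough sketch like the following could count warnings and errors per logger. The regular expression is only inferred from the log lines quoted above, not from any documented log format, and `summarize` is a name made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the quoted lines: date, time, level, (thread), [logger], message.
LOG_LINE = re.compile(
    r"^(?P<date>\d{2,4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}(?:\.\d+)?) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]+)\) \[(?P<logger>[^\]]+)\] (?P<message>.*)$"
)


def summarize(lines):
    """Count WARNING/ERROR/CRITICAL entries per (level, logger) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match["level"] in {"WARNING", "ERROR", "CRITICAL"}:
            counts[(match["level"], match["logger"])] += 1
    return counts


sample = [
    "22-08-23 06:05:17 ERROR (MainThread) [aiohttp.server] Error handling request",
    "Traceback (most recent call last):",
    "2022-08-23 06:10:46.872 WARNING (MainThread) [homeassistant.components.binary_sensor] "
    "Setup of binary_sensor platform synology_dsm is taking over 10 seconds.",
    "22-08-23 06:05:58 ERROR (MainThread) [supervisor.homeassistant.api] "
    "Error on call https://172.30.32.1:8123/api/config:",
]

for (level, logger), count in sorted(summarize(sample).items()):
    print(f"{level:8} {logger}: {count}")
```

Traceback lines and other continuation lines do not match the pattern and are skipped, so each traceback counts once through its leading ERROR line.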

bschatzow avatar Aug 23 '22 10:08 bschatzow

@agners 9.0 RC1 works for me with no issue. It is the first one that has rebooted properly since 8.0 RC1.

bschatzow avatar Sep 03 '22 10:09 bschatzow

@agners Not sure what has changed, but both RC1 and now RC2 work for me. I am going to close the issue as I no longer have a problem. It would be nice to know what changed to get it working.

bschatzow avatar Sep 09 '22 00:09 bschatzow

None of the critical components changed in 9.0.rc1 (it had the same Raspberry Pi firmware and Linux kernel). However, in 9.0.rc2 the kernel and Raspberry Pi firmware got upgraded.

That said, I think your issue is intermittent. It is quite possible that it comes back suddenly. We'll see.

agners avatar Sep 13 '22 13:09 agners

@agners I'll let you know. I have had some strange issues that few others have seen. I'll let you know if this problem comes back. It is strange that a host shutdown or reboot never had this issue. Only an OS update. The released 9.0 also worked with no issue.

bschatzow avatar Sep 13 '22 16:09 bschatzow