deconz-rest-plugin
Repeated core dumps from deCONZ as Home Assistant add-on
Describe the bug
I am running the deCONZ software as an add-on on Home Assistant. Once in a while, that add-on enters a period of repeated crashes (core dumps). I have logged a ticket on the add-on (home-assistant/addons#2442), but given that the add-on just wraps the deCONZ software, I've been referred to this project.
Once in a while, the deCONZ add-on enters a period where it constantly dumps core and gets restarted by the supervisor. During that period, the Zigbee devices cannot be controlled. The problem is intermittent. Over the past 90 days, the deCONZ add-on core dumped 687 times in 7 periods.
Steps to reproduce the behavior
At the moment, I do not see a pattern and have no way to trigger this behavior. I can gather extra data when needed. I'm running Prometheus Node Exporter, so I have OS-level metrics, but these do not provide any clues to me.
Expected behavior
No crashes
Screenshots
Not applicable
Environment
- Host system: Raspberry Pi
- Running method: Home Assistant deCONZ Add-on
- Firmware version: 26660700
- deCONZ version: 2.14.1
- Device: ConBee II
- Do you use an USB extension cable: Yes
- Are there any other USB or serial devices connected to the host system? Yes, an Aeon Labs USB Z-Wave Plus Controller
deCONZ Logs
No logs available at the moment
Additional context
Not sure what is relevant in this case.
Hi,
We don't maintain the addon. However, it shouldn't crash. Are you able to provide a core dump?
I think this should be fixed with the upcoming v2.15.3 version, there were two fixes after v2.14.1 related to crashes under certain conditions.
That would be great. I'd love to provide the core dump but have no clue how to extract that from the add-on Docker container.
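As a starting point for locating the dumps, here is a minimal Python sketch that interprets the Linux `kernel.core_pattern` setting, which decides where the kernel sends a core dump. It assumes a Linux host where `/proc/sys/kernel/core_pattern` is readable; the helper name `describe_core_pattern` is made up for illustration, and how the Home Assistant add-on container maps its filesystem is not covered here.

```python
from pathlib import Path
from typing import Tuple

# Standard Linux location of the core dump pattern (see core(5)).
CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def describe_core_pattern(pattern: str) -> Tuple[str, str]:
    """Classify a core_pattern value (hypothetical helper, not part of deCONZ).

    Returns a (kind, detail) tuple:
      ("pipe", command)     -> the dump is piped to a helper program
      ("absolute", pattern) -> the dump is written to an absolute path template
      ("relative", pattern) -> the dump is written relative to the working
                               directory of the crashing process
    """
    pattern = pattern.strip()
    if pattern.startswith("|"):
        return ("pipe", pattern[1:].strip())
    if pattern.startswith("/"):
        return ("absolute", pattern)
    return ("relative", pattern)


def read_core_pattern() -> Tuple[str, str]:
    """Read and classify the core_pattern of the running kernel."""
    return describe_core_pattern(CORE_PATTERN_FILE.read_text())
```

Calling `read_core_pattern()` on the host shows whether dumps go to a helper program (for example a crash collector) or to a file path. Note that the core file size limit of the process (`ulimit -c`) also has to be non-zero for a file to be written.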
The Home Assistant community has released an update of the add-on based on v2.15.3 and I've upgraded to that version. I've set an alert to get notified if it crashes again, so I'll keep a close watch. Thanks!
Unfortunately, the issue is not resolved. Yesterday between 14:10 and 14:33, the deCONZ add-on had 40 core dumps. The problem is very intermittent: 33 days had passed since the previous set of crashes. What can I do to help analyze this issue?
Today, it started crashing at 7:47. So far, it created 100 core dumps in 90 minutes, and counting.
Same here. Maybe it is related to the internet connection? I don't understand why it works for weeks and then suddenly fails for several people around the world at the same time!
The Phoscon server is down. It probably has to do with that. We have seen it in the past; Manuel wasn't able to figure out what happened.
You can disable discovery with the REST API, then it should stop.
Thanks @Mimiix, could you please point me to how I can disable this via the REST API? It could be useful for everybody, thanks!
It got resolved now. The last crash was 10 minutes ago, so it crashed between 7:47 and 10:48 CEST.
OK, I guess I did it now from the settings in the Phoscon page on HASS (advanced settings, last setting).
> It got resolved now. The last crash was 10 minutes ago, so it crashed between 7:47 and 10:48 CEST.
Yes, same for me.
Phoscon is online again.
https://dresden-elektronik.github.io/deconz-rest-doc/endpoints/configuration/#modify-configuration
That's with the REST API.
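For anyone looking for a concrete call, here is a minimal Python sketch of that request, based on the "Modify configuration" endpoint linked above (`PUT /api/<apikey>/config`) with its `discovery` parameter. The host, port, and API key are placeholders you have to fill in for your own gateway, and the function names are made up for illustration.

```python
import json
import urllib.request


def build_disable_discovery_request(host: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a PUT request that sets "discovery" to false.

    host    -- gateway address, e.g. "192.168.1.10:80" (placeholder)
    api_key -- an API key already authorized on the gateway (placeholder)
    """
    url = f"http://{host}/api/{api_key}/config"
    body = json.dumps({"discovery": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Usage (performs a real network call, so it is left commented out):
# print(send(build_disable_discovery_request("192.168.1.10:80", "YOUR_API_KEY")))
```

Sending the same body with `"discovery": true` should turn discovery back on.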
As there has not been any response in 21 days, this issue has been automatically marked as stale. At OP: Please either close this issue or keep it active. It will be closed in 7 days if no further activity occurs.
As far as I know, it's just coincidental that this issue didn't occur again in the past 21 days, so please keep this on the backlog and apply a structural fix if possible.
> As far as I know, it's just coincidental that this issue didn't occur again in the past 21 days, so please keep this on the backlog and apply a structural fix if possible.
I am not able to put something on the backlog if there's no clear pointer to what goes wrong 😅.
Is there anything I can do to gather more information for analysis? I've asked the maintainer of the Home Assistant deCONZ plug-in for the location of the core dump files.
I understand the crashes are caused by unavailability of the Phoscon server. Would it be possible to add some extra logging in that area, to see what happens in case of a connection failure?
The coredumps would probably help. Logging, I am not sure but why not?
The odd thing is that not everyone is affected.
As there has not been any response in 21 days, this issue has been automatically marked as stale. At OP: Please either close this issue or keep it active. It will be closed in 7 days if no further activity occurs.
This issue is not solved. On July 3rd, it crashed again, this time only once.
Can you share the core dumps?
Same thing was happening to me with the Community Docker container. Got so bad and unreliable I had to take my stick out of use and move the few devices on it over to my other Zigbee hub.
> Same thing was happening to me with the Community Docker container. Got so bad and unreliable I had to take my stick out of use and move the few devices on it over to my other Zigbee hub.
Not sure what your comment is contributing here 😅.
I thought maybe someone would want to address that problem and maybe fix it.
We can't without any core dumps themselves to give some pointers 😅
@Mimiix the issue is just about the Phoscon servers. When they go offline, we have the issue. You can probably simulate the same thing by replacing the Phoscon server URL with a fake, invalid one, and you will see the same behavior.
> @Mimiix the issue is just about the Phoscon servers. When they go offline, we have the issue. You can probably simulate the same thing by replacing the Phoscon server URL with a fake, invalid one, and you will see the same behavior.
Which wrong fake one? I never had this issue in my environment. Manuel can't replicate it either.
I mean, since the issue occurs when the Phoscon servers are offline, simply debug it by putting in a fake URL like "Phoscon.server" instead of the original one. That will cause the connection to fail each time, so you should be able to see the issue every time.
> I mean, since the issue occurs when the Phoscon servers are offline, simply debug it by putting in a fake URL like "Phoscon.server" instead of the original one. That will cause the connection to fail each time, so you should be able to see the issue every time.
This isn't causing a core dump on my side. I use a native deCONZ install. It simply can't reach the server, but that is the same as when I block it on my router's firewall. I never get a crash. Additionally, you can disable the pinging of the discovery server with the REST API.
So again: we can't seem to replicate it and that's why we need the core dumps. We really need user input here in the form of a core dump, otherwise we can't solve it.
As there has not been any response in 21 days, this issue has been automatically marked as stale. At OP: Please either close this issue or keep it active. It will be closed in 7 days if no further activity occurs.