zigbee2mqtt
Mass device disconnection and network collapse
What happened?
In the last month I noticed that about once a week my network (45 devices) starts to have some offline devices. Usually, some days or hours after this issue, the whole network collapses, with most of the devices going offline.
Right now I usually manage to fix this issue by restarting Z2M, however sometimes not all the devices come back online and I have to power-cycle them.
What I've noticed from the web ui is that usually the first devices to go offline are the same. First a specific MS-104BZ Moes Relay, then other devices that are nearby, then the whole network.
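To pin down which devices drop first, the order of offline transitions can be reconstructed from MQTT availability messages rather than by watching the web UI. Below is a minimal Python sketch, assuming that availability is enabled in Zigbee2MQTT, that the base topic is the default `zigbee2mqtt`, and that the messages have already been captured as (timestamp, topic, payload) tuples. The function names and the device names in the usage note are hypothetical. It accepts both the plain `online`/`offline` payload and the JSON `{"state": ...}` form.

```python
import json


def parse_availability(topic, payload, base_topic="zigbee2mqtt"):
    """Return (device_name, state) for an availability message, else None."""
    prefix = base_topic + "/"
    suffix = "/availability"
    if not (topic.startswith(prefix) and topic.endswith(suffix)):
        return None
    name = topic[len(prefix):-len(suffix)]
    if not name:
        return None
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8", errors="replace")
    text = payload.strip()
    try:
        data = json.loads(text)
    except ValueError:
        data = text  # plain-string payload such as "offline"
    state = data.get("state") if isinstance(data, dict) else data
    if state not in ("online", "offline"):
        return None
    return name, state


def offline_order(messages):
    """Device names ordered by their first 'offline' message.

    `messages` is an iterable of (timestamp, topic, payload) tuples,
    e.g. collected by any MQTT client subscribed to the base topic.
    """
    order = []
    for _, topic, payload in sorted(messages, key=lambda m: m[0]):
        parsed = parse_availability(topic, payload)
        if parsed and parsed[1] == "offline" and parsed[0] not in order:
            order.append(parsed[0])
    return order
```

Feeding it a capture in which a device named `moes_relay` goes offline before one named `kitchen_light` returns `["moes_relay", "kitchen_light"]`, which makes it easier to compare several collapses and check whether the same device really fails first each time.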
I checked the debug log, however I didn't find anything that suggests some particular issue.
I'm in the process of setting up a CC2531 sniffer to provide more detail (I'm waiting for the adapter to arrive), so I'll update this issue as soon as I have data.
What did you expect to happen?
The network should remain stable, with no individual devices dropping off and no mass disconnections.
How to reproduce it (minimal and precise)
No response
Zigbee2MQTT version
1.32.2-dev commit: 1439f4c
Adapter firmware version
20220726
Adapter
Sonoff ZBdongle-P
Debug log
Why not try removing the Moes relay and see whether your network becomes more stable?
I've followed your suggestion, however today I saw three other router devices disconnected in the UI (and they did not come back online after hours). Luckily I was sniffing all the traffic overnight, so I managed to get some data, if it can be of any use.
I have a similar problem. From time to time, some devices disappear from the network. Restarting Z2M helps. Version: 1.32.2-1 Coordinator revision 20210708
I updated the coordinator firmware to 20230507; I also moved to 1.33-dev, however these issues still arise.
I suspect it might be some sort of network congestion caused by a few devices being the only route to the coordinator. I'm not sure about this, since the map shows an LQI of 60-90 on average, however I will try to place some CC2652 repeaters to improve routing.
@Koenkk is there any way to get a map of how the traffic is routed through the network? For example, seeing which device is being used as a parent? Thanks!
Sounds like a routing collapse in the mesh. If you have OSRAM routers, put them in the black box of bad Zigbee devices. Most devices with Silabs chips have a bug that crashes the Zigbee stack when they receive many parent announcements, after which they need to be power-cycled to work properly again. Most IKEA, Tuya, and newer Hue devices have this bug, as do many others that use Silabs chips and have not received a firmware update that fixes it.
Sounds like a routing collapse in the mesh. If you have OSRAM routers, put them in the black box of bad Zigbee devices.
Luckily I don't have any osram devices :)
Most devices with Silabs chips have a bug that crashes the Zigbee stack when they receive many parent announcements, after which they need to be power-cycled to work properly again. Most IKEA, Tuya, and newer Hue devices have this bug, as do many others that use Silabs chips and have not received a firmware update that fixes it.
I'm checking the devices I'm using. So far I have opened or looked at:
- Vimar 03981: uses the nRF52840. Don't know if they're doing something wrong but I removed a couple of them
- Moes Smart relay 2 gang: uses a ZS2S module from Tuya, which has an EFR32MG21A020F768IM32 from SiLabs. May be one of the causes; I removed both of them from the network (and powered them off)
- NodOn SIN-4-20: not sure if I can open this without destroying it... Anyway I am using them a lot
I added three repeaters in the house using Sonoff Dongle-P sticks; let's see if they help.
@stefa168 Can you post the IEEE address of one NodOn SIN-4-20 (or its first 6 digits) so I can look up the vendor of the chip? MG21 and nRF chips are normally good ones. Also check whether there are updates for the P routers, since some old firmware versions have problems managing their children.
@MattWestb sure thing, here are all the SIN-4-20 that I'm using right now:
- 0x540f57fffe44a2d9
- 0x9035eafffec889a0
- 0x9035eafffec88da3
- 0x540f57fffe47ea82
- 0x540f57fffe44ae79
- 0x540f57fffe47ebf6
- 0x5c0272fffe7d1a5a
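The "first 6 digits" asked for above are the OUI, the first three bytes of the 64-bit IEEE address, which identifies the organization that registered the address block. As a rough sketch (the helper names are made up for illustration, and no vendor lookup is performed here), the addresses in this list can be grouped by that prefix before checking each prefix against the public IEEE registry:

```python
from collections import defaultdict

HEX_DIGITS = set("0123456789abcdef")


def oui_prefix(ieee_address):
    """Return the first 6 hex digits (the OUI) of a 64-bit IEEE address."""
    digits = ieee_address.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) != 16 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a 64-bit IEEE address: {ieee_address!r}")
    return digits[:6]


def group_by_oui(addresses):
    """Map each OUI prefix to the addresses that share it."""
    groups = defaultdict(list)
    for address in addresses:
        groups[oui_prefix(address)].append(address)
    return dict(groups)


# The SIN-4-20 addresses listed above.
sin_4_20 = [
    "0x540f57fffe44a2d9",
    "0x9035eafffec889a0",
    "0x9035eafffec88da3",
    "0x540f57fffe47ea82",
    "0x540f57fffe44ae79",
    "0x540f57fffe47ebf6",
    "0x5c0272fffe7d1a5a",
]

for prefix, members in group_by_oui(sin_4_20).items():
    print(prefix, len(members))
```

For this list the grouping yields three distinct prefixes (`540f57`, `9035ea`, `5c0272`), so only three registry lookups are needed instead of seven.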
Regarding the P routers, I flashed them last week and introduced them to the network, so I still can't determine whether this is an LQI issue or something else.
I added the repeaters since the building where I live is very old and has 60cm (full brick) thick walls... however it is very strange to me that the devices can't communicate through one of these walls...
They are all Silabs chips, which normally work well as long as the developers have not done something strange in the firmware.
RF (radio frequency) signals can penetrate a material, be absorbed by it, or be reflected off it. How well they do so depends on the material (density, electrical conductivity, and magnetic shielding) and on the frequency. It is too long to explain here, but getting it to work well is trial and error, and sometimes a reflected signal can be better than a direct one if it is "clean".
Got it, thanks! With the repeaters it seemed to work better. I reintroduced the two Moes relays some days ago, however a couple of minutes ago I checked the status of the network and it was all down again. The only devices that were online were the ones that usually connect directly to the coordinator, and only the battery-powered ones.
This time a restart did the trick for all of them except two of the NodOn devices that have a newer firmware. I contacted the manufacturer some days ago and they said they would run some checks.
For now not a lot has happened; the repeaters seem to have done the trick. I still get a couple of disconnections with the updated NodOn relays and sometimes with the two Miboxer FUT039Z that I have. The Miboxer devices reconnect easily after a power cycle, but the NodOn relays don't. I am still waiting for an answer from the manufacturer.
All my devices disappeared today.
I had to restore the database.db from a backup.
- Zigbee2MQTT version: 1.33.2
- Coordinator type: EZSP v8
- Coordinator version: 6.7.10.0 build 423
- Coordinator IEEE address: 0x60a423fff....
- Zigbee-herdsman-converters version: 15.106.0
- Zigbee-herdsman version: 0.21.0