deconz-rest-plugin

deCONZ no longer recognizes when a router is offline (IKEA bulbs in this case)

Open LeoeLeoeL opened this issue 2 years ago • 11 comments

Describe the bug

If a bulb is switched off by a wall switch, deCONZ continues to report it as online.

Steps to reproduce the behavior

Switch off a bulb with a physical wall switch and wait.

Expected behavior

I expect to see the device offline as before

Screenshots

[Screenshot] Luce Bagno -1 was shut down 12 hours ago. Luce Cucina D1 & D1 were shut down 4 hours ago.

Environment

  • Host system: Raspberry Pi
  • Running method: Raspbian
  • Firmware version: 0x26780700
  • deCONZ version: 2.20.1
  • Device: ConBee II
  • Do you use an USB extension cable: yes
  • Is there any other USB or serial devices connected to the host system? If so: Which? APC Smart-UPS 1500

deCONZ Logs

Additional context

LeoeLeoeL, Jan 30 '23 10:01

What types of lights are they (brand)?

Mimiix, Jan 30 '23 10:01

IKEA, but Sonoff ZBMini devices show the same behaviour when I use them for testing purposes and then put them in a drawer.

LeoeLeoeL, Jan 30 '23 10:01

It's funny, I'm testing now. I see a blue dot whenever I make any changes to it. After a while (now) it's going red but remains available.

  • Testing now the INNR SP120 plug. That one seems to go "offline" after 2 minutes.
  • Hue bulb took 2 minutes (ish)
  • Osram bulbs took 3 minutes (ish)

Seems to be isolated to IKEA lights at the moment.

Mimiix, Jan 30 '23 11:01

Can you check the DDF and see if it's gold? (for the bulbs?)

Mine was "bronze"; I changed it to gold, did a hot reload, and now it seems instantaneous.

Mimiix, Jan 30 '23 11:01

Status is "draft" for both.

LeoeLeoeL, Jan 30 '23 11:01

That's probably the issue. I've asked the devs to check, but it seems to be isolated to bulbs not having a DDF present.

Mimiix, Jan 30 '23 11:01

Brand wouldn’t be a determining factor here. Would need to know:

  • Whether light is exposed through DDF or legacy code;
  • Whether light supports attribute reporting.

Unless you try and control a light over the API, the plugin wouldn’t mark reachable false until it has failed to poll the light. For lights with (periodic) attribute reporting, polling would only occur after a periodic report has been missed. I’m not quite sure if the legacy code handles this correctly, but the code for DDF should handle this. With typical periodic reporting every 5 minutes, I would expect up to 6 minutes delay after cutting power to a light. For lights being polled, it would depend on the number of lights in your network.
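To make the arithmetic behind the "up to 6 minutes" estimate explicit, here is a minimal sketch. It assumes the delay is simply the maximum reporting interval plus a grace period for the follow-up poll to fail; the 60-second grace value and the function name are illustrative assumptions inferred from the numbers above, not values taken from the plugin.

```python
def worst_case_detection_delay(report_max_s: int, poll_grace_s: int = 60) -> int:
    """Rough upper bound (seconds) between cutting power and reachable going false.

    report_max_s: maximum periodic reporting interval configured on the light.
    poll_grace_s: ASSUMED time for the plugin to notice the missed report and
                  fail a poll; hypothetical value, not read from deCONZ.
    """
    if report_max_s < 0 or poll_grace_s < 0:
        raise ValueError("intervals must be non-negative")
    return report_max_s + poll_grace_s


# Typical periodic reporting every 5 minutes (300 s):
delay_s = worst_case_detection_delay(300)
print(f"worst case: about {delay_s // 60} minutes")
```

For polled lights this model does not apply, since the delay there depends on how many lights share the polling queue.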

If memory serves, IKEA lights would be set up using attribute reporting in legacy code. ZHA Hue lights don’t support attribute reporting, and deCONZ only configures attribute reporting (through DDF) for ZB3 Hue lights since v2.20. That would be consistent with the observations above. It would also mean that this will get solved as we move to DDFs and clean up the legacy code.

Note: using 20th century wall switches for your Zigbee lights is a bad idea. Any logic depending on reachable is a bad idea. See also https://github.com/dresden-elektronik/deconz-rest-plugin/issues/2590.

I’m almost scared to suggest this, but if you (or rather your spouse) insist on using wall switches, you might try and increase the rate of periodic reporting, at least for state/on. I think we use a 5s refresh interval for state/on in most DDFs, but the code handling the DDFs will poll less frequently. Depending on the number of lights, setting periodic reporting every 5 seconds could be a bit much.
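As a rough illustration of what "increase the rate of periodic reporting for state/on" could look like, the sketch below lowers the maximum reporting interval of the On/Off attribute in a DDF loaded as a Python dict. The field names (`bindings`, `cl`, `report`, `at`, `min`, `max`) reflect the DDF layout as I understand it and should be checked against the actual DDF file for your light; the helper name and the example DDF fragment are hypothetical.

```python
import copy
import json

ONOFF_CLUSTER = "0x0006"  # Zigbee On/Off cluster
ONOFF_ATTR = "0x0000"     # OnOff attribute


def set_onoff_report_max(ddf: dict, max_s: int) -> dict:
    """Return a copy of `ddf` with the On/Off max reporting interval set to max_s."""
    if max_s < 1:
        raise ValueError("max_s must be at least 1 second")
    out = copy.deepcopy(ddf)  # leave the caller's DDF untouched
    for binding in out.get("bindings", []):
        if str(binding.get("cl", "")).lower() != ONOFF_CLUSTER:
            continue
        for report in binding.get("report", []):
            if str(report.get("at", "")).lower() == ONOFF_ATTR:
                report["max"] = max_s
                # Keep the minimum interval no larger than the new maximum.
                report["min"] = min(report.get("min", 1), max_s)
    return out


# Hypothetical DDF fragment, for illustration only.
example_ddf = {
    "bindings": [
        {
            "bind": "unicast",
            "src.ep": 1,
            "cl": "0x0006",
            "report": [{"at": "0x0000", "dt": "0x10", "min": 1, "max": 300}],
        }
    ]
}

faster = set_onoff_report_max(example_ddf, 60)
print(json.dumps(faster["bindings"][0]["report"], indent=2))
```

The example uses 60 seconds rather than 5, since very short intervals multiply traffic by the number of lights in the network.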

ebaauw, Jan 30 '23 12:01

  • Whether light is exposed through DDF or legacy code;

In the case of TS: Via legacy code as the DDF was on draft.

In my case: after changing Bronze to Gold (so it used DDF) it was "fixed". So looks like legacy only.

Note: using 20th century wall switches for your Zigbee lights is a bad idea. Any logic depending on reachable is a bad idea. See also #2590.

Also not in favor of "deprecating" as that breaks issues. In my experience so far, there is good reasoning to use it as it is. Mainly because I haven't seen proper and affordable replacements for wall switches in the Netherlands, nor proper documentation for group usage. Nevertheless, I believe that discussion is not related to the issue at hand. Apparently we have 2 different "ways" on when a device is marked "unreachable".

Mimiix, Jan 30 '23 12:01

The situation now. [Screenshot] Many connections are gone.

LeoeLeoeL, Jan 30 '23 14:01

Many connections are gone.

Please remember that Zigbee doesn't do connections (if it did, it would be easy to clear reachable). The lines represent neighbour table entries of adjacent Zigbee routers. As the device is no longer powered, the routers will expire the entries, and the GUI removes the line when it next queries the neighbour table.

Also not in favor of "deprecating" as that breaks issues.

Deprecating doesn't break anything, as it's purely documentation. Removing reachable would indeed break stuff.

there is good reasoning to use it as it is.

I only linked the issue, because it explains how reachable doesn't reflect whether the device is actually reachable. "As it is" means you'll have to wait several minutes to hours for reachable to be cleared after powering down a device.
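To see how stale `reachable` can be, it may help to print it next to `lastseen` for every light. The sketch below assumes the usual `GET /api/<apikey>/lights` response shape, with `state.reachable` per light and a `lastseen` timestamp that may be missing on older versions. The host, API key, and sample data are placeholders, and `fetch_lights` is not called here, so the example runs without a gateway.

```python
import json
import urllib.request


def fetch_lights(host: str, api_key: str) -> dict:
    """Fetch all lights from a deCONZ gateway (host and api_key are placeholders)."""
    url = f"http://{host}/api/{api_key}/lights"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_reachability(lights: dict) -> list[tuple[str, bool, str]]:
    """Return (name, reachable, lastseen) rows, ordered by light id."""
    rows = []
    for light_id, light in sorted(lights.items()):
        state = light.get("state", {})
        rows.append((
            light.get("name", light_id),
            bool(state.get("reachable", False)),
            light.get("lastseen") or "unknown",  # may be absent or null
        ))
    return rows


# Made-up sample in the assumed response shape, for illustration only.
sample = {
    "1": {"name": "Bathroom", "state": {"on": False, "reachable": True},
          "lastseen": "2023-01-29T22:00Z"},
    "2": {"name": "Kitchen", "state": {"on": True, "reachable": False}},
}

for name, reachable, lastseen in summarize_reachability(sample):
    print(f"{name}: reachable={reachable}, lastseen={lastseen}")
```

A light that still reports `reachable=True` while its `lastseen` is hours old is a candidate for having been powered off at the wall.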

Apparently we have 2 different "ways" on when a device is marked "unreachable".

Afaik, there is only one way, see linked issue. The variations in DDF vs legacy and reporting vs polling merely change how quickly deCONZ sends a unicast message to the device and can notice the missing response.

ebaauw, Jan 30 '23 15:01

I only linked the issue, because it explains how reachable doesn't reflect whether the device is actually reachable. "As it is" means you'll have to wait several minutes to hours for reachable to be cleared after powering down a device.

Hi

I think this is a massive issue for "normal" users, because I expect the Phoscon App also uses "reachable" to decide whether to show a device greyed out. But in fact it does not show the truth at all, because the device still works, for example if you use the API to turn it on.

I've been using deCONZ for years now, and the most annoying thing is that there is no way a "normal" user can find out or see whether a device is currently connected.

Maybe I'm completely wrong about this, but it really is annoying to see devices greyed out in the Phoscon App when they are working absolutely fine! See screenshot:

image

I hope this will be solved at some point and that there is an explanation for the reason.

Kind regards Beat

easybeat, Nov 30 '23 17:11

Going through some older bug reports for cleanup.

I've just tested how the IKEA GU10, which uses a DDF, behaves when the device is physically powered off.

  • All routers are periodically queried for neighbor table entries (DDF and legacy)
  • The GU10 DDF polls attributes if no reports come in within 30 minutes as per reporting configuration

The reachable attribute was set to false after roughly 30 minutes, and the node in deCONZ as well as the light in the Phoscon App are shown as offline.


Another test with a Philips LWB004 E27 light running on legacy code without a DDF behaves the same way, with the difference that attribute polling is controlled by legacy code, which focuses on the on/off attribute with a hard-coded interval. Detecting that it was offline also took roughly 30 minutes.

Regardless of DDF or legacy code, there is an important "lazy" ramp-up to detect reachable devices. If the responses to neighbor table requests aren't received, other commands will be sent with APS ACKs enabled automatically for better detection. Lazy here means that this takes time, since the periodic neighbor table requests may take 10, 20, 30 minutes... This is mainly to avoid being tricked by temporary network hiccups and to work well in larger networks.

Long story short, the detection is in place but can take a while.


The DDFs provide the best option to control the intervals. While we can tweak the "consider offline after x failed responses" down for quicker detection, this always carries the danger of false positives, especially in larger networks.
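The trade-off behind "consider offline after x failed responses" can be illustrated with a small counter. This is only a sketch of the general idea, not the plugin's actual implementation; the class, function, and parameter names are hypothetical.

```python
class LazyReachability:
    """Mark a device unreachable only after several consecutive failures."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0
        self.reachable = True

    def record(self, responded: bool) -> bool:
        """Feed the result of one check and return the current reachable flag."""
        if responded:
            self.failures = 0
            self.reachable = True
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.reachable = False
        return self.reachable


def checks_until_offline(responses, max_failures):
    """Return the 1-based check at which the device is marked offline, or None."""
    tracker = LazyReachability(max_failures)
    for i, ok in enumerate(responses, start=1):
        if not tracker.record(ok):
            return i
    return None


hiccup = [True, False, True, True]        # one lost response, device is fine
powered_off = [True, False, False, False]  # device really lost power

print(checks_until_offline(hiccup, 1))       # threshold 1: false positive
print(checks_until_offline(hiccup, 3))       # threshold 3: hiccup is ignored
print(checks_until_offline(powered_off, 3))  # real outage is detected, but later
```

A threshold of 1 reacts immediately but flags the hiccup as an outage; a threshold of 3 ignores the hiccup at the cost of needing three failed checks, which matters when each check is many minutes apart.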

Closing the issue for now, since I consider the lazy detection to be the best trade-off for a broad range of networks.

manup, Jul 10 '24 09:07