node-zwave-js icon indicating copy to clipboard operation
node-zwave-js copied to clipboard

META Issue: Communication problems on v10.x

Open AlCalzone opened this issue 3 years ago • 69 comments

It looks like there were some changes in v10, which break communication with some devices. We're still in the process of figuring things out and need some information to do so.

If devices don't work (fully or partially) after upgrading, first re-interview them.

If they still don't work, please share the following information (attached as files):

  1. a node dump
  2. a driver log (loglevel "silly") of a re-interview
  3. a driver log (loglevel "silly") of the actions that don't work
  4. What does not work?

Comments that just say "me too" or similar and don't provide information will be deleted.

AlCalzone avatar Sep 01 '22 07:09 AlCalzone

Hi, I have many devices not working after this update and attached is log No. 2 and 3 requested above. Where do i create a node dump? Logs above is specifically one device, Node 012 in my setup.

  1. zwave_js-silly-Node012_re-interview.log

  2. zwave_js-silly-Node012_lightsOn_NotWorking.log

  3. The status of my dimmers updates in HA when I press the switches on the wall, but nothing happens with the lamps if I change the state of the switches on my phone or laptop. The switches in HA does not change either.

All of my devices that have lost some funtionality is Fibaro branded, and it is mostly Dimmer 2, FGD212 Firmware 3.5. I have a few other Fibaro devices working strange also...

bckmrk avatar Sep 01 '22 13:09 bckmrk

Your log shows that the dimmer accepted the command and sent an update after roughly one second.

# Turn on command sent:
2022-09-01T13:23:32.106Z DRIVER » [Node 012] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      50
                                  └─[SecurityCCCommandEncapsulation]
                                    │ nonce id: 238
                                    └─[MultiChannelCCCommandEncapsulation]
                                      │ source:      0
                                      │ destination: 1
                                      └─[MultilevelSwitchCCSet]
                                          target value: 255
                                          duration:     default

...

# Value changed to 99
2022-09-01T13:23:33.266Z CNTRLR   [Node 012] [~] [Multilevel Switch] currentValue: 0 => 99          [Endpoint 1]
2022-09-01T13:23:33.271Z CNTRLR   [Node 012] Scheduled poll canceled because expected value was received
2022-09-01T13:23:33.273Z DRIVER « [Node 012] [REQ] [ApplicationCommand]
                                  └─[SecurityCCCommandEncapsulation]
                                    │ sequenced: false
                                    └─[MultiChannelCCCommandEncapsulation]
                                      │ source:      1
                                      │ destination: 1
                                      └─[MultilevelSwitchCCReport]
                                          current value: 99

(some power usage updates are there aswell).

Are you sure this is the correct log?

AlCalzone avatar Sep 01 '22 14:09 AlCalzone

Qubino Flush 1D relay, node dead and not responding. Interviewing fail. zwave_js.log zwave_js-6e431c384e81d642f9ea95e419f54ab6-Flush 1D relay-fbf0b7ebeb27bbac5eec9a483e422e80.json.txt

mirkochip88 avatar Sep 01 '22 20:09 mirkochip88

Qubino Flush 1D relay, node dead and not responding

This claims it supports Multi Channel CC but does not respond to it at all. Interview gets aborted because this is considered a critical interview part. Because it's Qubino, try excluding without reset and then include again. Their devices sometimes behave strangely when you change a config parameter that changes endpoints, but then don't re-include like the manual says.

AlCalzone avatar Sep 02 '22 06:09 AlCalzone

Your log shows that the dimmer accepted the command and sent an update after roughly one second.

# Turn on command sent:
2022-09-01T13:23:32.106Z DRIVER » [Node 012] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      50
                                  └─[SecurityCCCommandEncapsulation]
                                    │ nonce id: 238
                                    └─[MultiChannelCCCommandEncapsulation]
                                      │ source:      0
                                      │ destination: 1
                                      └─[MultilevelSwitchCCSet]
                                          target value: 255
                                          duration:     default

...

# Value changed to 99
2022-09-01T13:23:33.266Z CNTRLR   [Node 012] [~] [Multilevel Switch] currentValue: 0 => 99          [Endpoint 1]
2022-09-01T13:23:33.271Z CNTRLR   [Node 012] Scheduled poll canceled because expected value was received
2022-09-01T13:23:33.273Z DRIVER « [Node 012] [REQ] [ApplicationCommand]
                                  └─[SecurityCCCommandEncapsulation]
                                    │ sequenced: false
                                    └─[MultiChannelCCCommandEncapsulation]
                                      │ source:      1
                                      │ destination: 1
                                      └─[MultilevelSwitchCCReport]
                                          current value: 99

(some power usage updates are there aswell).

Are you sure this is the correct log?

Yes I'm sure it's the correct log. However I'm not sure the lamp didn't respond as intended... I wasn't at home doing this log so couldn't check... will redo log's later today.

bckmrk avatar Sep 02 '22 10:09 bckmrk

Hmm, it's a little bit awkward to confess this, but re-interviewing every node that failed after the update helped with the problem. Now they seem to work perfectly again...

bckmrk avatar Sep 02 '22 16:09 bckmrk

Thanks for the feedback though.

AlCalzone avatar Sep 02 '22 16:09 AlCalzone

Qubino Flush 1D relay, node dead and not responding

This claims it supports Multi Channel CC but does not respond to it at all. Interview gets aborted because this is considered a critical interview part. Because it's Qubino, try excluding without reset and then include again. Their devices sometimes behave strangely when you change a config parameter that changes endpoints, but then don't re-include like the manual says.

It works, thanks!

mirkochip88 avatar Sep 02 '22 18:09 mirkochip88

Hi there, I'm experiencing an issue where my Zooz ZSE19 siren will no longer play sound clips ever since I updated to the latest version of zwavejs2mqtt. Sound clips for this device are played through a configuration parameter being set to an integer. The sounds won't play anymore and it looks like the config parameter is basically stuck; it won't change no matter what I do.

Here are my node debug info (is that a node dump?), as well as Silly-level logs of the interview and the command that doesn't work.

Versions:

  • zwavejs2mqtt 7.0.0
  • zwavejs 10.0.3

Things I've tried:

  1. Healing
  2. Reinterviewing
  3. Restarting my zwavejs2mqtt docker container
  4. Excluding / Reincluding
  5. Reinterviewing again
  6. Restarting again

Other things to note:

  • I can see in the node's properties that 112-0-3 (the sound effect config parameter) is supposedly writeable, so I don't know why this is happening.
  • The actual security siren functionality (i.e. the loud piercing wailing noise) and temperature/humidity reporting works just fine, i.e. I can turn the siren on and off and see values for the other stuff, so the node is responsive. It's just setting the config parameter that doesn't seem to want to work.
  • I haven't changed any of my settings or security keys, and my other S2-included devices (including other parts of this one) are still functional after the update.
  • zwavejs2mqtt reports that the api call is successful although it's clear from the zwavejs driver logs that it actually isn't, but that's a different problem altogether :P

Things I'm about to try:

  • Excluding / Reincluding without security (although ideally I wouldn't have to do this...) -- edit: nope, didn't work

Appreciate any help or advice. Thank you!

ekzor avatar Sep 02 '22 21:09 ekzor

I have mostly fibaro dimmer and sunricher dimmer

Going back to a 0.1.4 backup and unplugg "me zwave" stick removed from vm and add to vm rebooted host machine then all my zwave router was available and responded

skynet-network avatar Sep 03 '22 08:09 skynet-network

My happiness was short-lived... now almost all of my z-wave devices are not working. Some commands pass through, but it takes a long, long time...

zwave_js_ReInterview.log zwave_js_OnOff.log zwave_js-nodedump.json.txt

Tried to heal the network and reinterview nodes again without any change. Suggestions?

Rolled back to 9.6.2 backup and everything works again.

bckmrk avatar Sep 03 '22 17:09 bckmrk

I'm having issues with Zooz Open Close XS Sensor (ZSE41). I was doing the same test using latest HA addon 0.1.68 (zwave-js v10) and 0.1.65 (last zwave-js v9). In both tests I started with factory reset of the sensor and then tried to add it. The interview finished when using v9 but not in case of v10.

zwave_js-37b187458f723b1a394110dcea8ea050-Node 20-ab82391c377dceca05b7f0bc778af64a.json.txt log.txt

gralin avatar Sep 03 '22 21:09 gralin

I'm struggling with a Zooz Zen16 configured as a garage door opener. Both relays 1 and 2 are configured for garage door mode (parameters 16-112-0-2 and 16-112-0-3 set to a value of 3). Working just fine before upgrading to ZwaveJS2mqtt version 7.0.0. Now, when I send a command to open or close either garage door, I hear a quick click from the relays, but it's too fast for the opener to register it as a physical button press and the doors do not open / close. Looking at the relay states in the ZwaveJS2mqtt GUI and watching topics using MQTT Explorer, it looks as though there's some serious funkiness going on.

Manually changing the individual relay states from true to false or vice versa shows up in the targetValue and currentValue subtopics for the appropriate relays, however no action was actually taken by the device, aside from the quick click I hear. However, if I toggle the master device state for the whole Zen16 from false to true, both relays 1 and 2 fire simultaneously, for the correct momentary contact duration, and the garage doors are toggled. I'd prefer to have control over each door individually, as I did with the 6.X.X branch of ZwaveJS2mqtt and 9.X.X branch of Z-wave JS.

Current versions are ZwaveJS2mqtt 7.0.0, Z-wave JS 10.0.3. Here are the logs interview log.log driver level log.log

Edit: downgraded my install to ZwaveJS2Mqtt 6.16.0 and Z-wave JS to 9.6.2, and everything is back to normal. Will wait until the 10.X code branch has more bugs ironed out before trying it again.

epower53 avatar Sep 04 '22 03:09 epower53

@ekzor the ZSE19 doesn't seem to accept supervised commands correctly. The v10 version started using those extensively to increase reliability through S2. I've notified Zooz and will disable Supervision for that device as a short term fix.

@zwave-js-bot add compat flag to 0x027a:0x000c:0x0003:0.0

commandClasses: {
  remove: {
    // Supervised commands always result in a status of "Fail"
    "0x6c": {
      endpoints: "*"
    }
  }
}

AlCalzone avatar Sep 05 '22 11:09 AlCalzone

🔨 I created a PR at #5008 - check it out!

zwave-js-bot avatar Sep 05 '22 11:09 zwave-js-bot

@bckmrk Your stick is hanging and does not respond to commands at all. It might need to be re-plugged.

Error during node interview: Timeout while waiting for an ACK from the controller (ZW0200) 2022-09-03T17:21:48.947Z CNTRLR Failed to execute controller command after 2/3 attempts. Scheduling next try i n 1100 ms.

AlCalzone avatar Sep 05 '22 11:09 AlCalzone

@gralin that log is not on loglevel "debug". Please make another one.

AlCalzone avatar Sep 05 '22 11:09 AlCalzone

@AlCalzone I'm sorry, I have selected silly level in the Z-Wave JS addon configuration in HA but now when you pointed this out I noticed that after restarting, the log level of the addon reverted to info. Now when I selected debug instead of silly the setting survived the restart so I will perform the tests again and submit the proper logs. Is is enough when I copy the logs from the Log tab of the addon page? The start of the log seems to be truncated but I don't know where to get the complete ones.

gralin avatar Sep 05 '22 12:09 gralin

@gralin This works: https://www.home-assistant.io/integrations/zwave_js#how-to-access-the-z-wave-logs

AlCalzone avatar Sep 05 '22 12:09 AlCalzone

@bckmrk Your stick is hanging and does not respond to commands at all. It might need to be re-plugged.

Error during node interview: Timeout while waiting for an ACK from the controller (ZW0200) 2022-09-03T17:21:48.947Z CNTRLR Failed to execute controller command after 2/3 attempts. Scheduling next try i n 1100 ms.

Is that usually needed after an update? Never done that before. Will test it.

Everything worked well after I restored a backup, 0.1.65 (9.6.2). Did not re-plug anything.

bckmrk avatar Sep 05 '22 12:09 bckmrk

Is that usually needed after an update?

It isn't usually, but it happens sometimes. Maybe the stick was still blocked by something else.

AlCalzone avatar Sep 05 '22 13:09 AlCalzone

@epower53 I'm seeing lots of connectivity issues in your log. 48 duplicate incoming commands during the interview, RSSI values close to -90 dBm (which isn't much more than the background noise), several instances where routing suddenly uses 3 hops instead of 0, timed out responses.

This hints at a bad connection between the node and the controller, in both directions. I suggest you check out and follow this:

  • https://zwave-js.github.io/node-zwave-js/#/troubleshooting/connectivity-issues?id=general-troubleshooting
  • https://zwave-js.github.io/node-zwave-js/#/troubleshooting/network-health?id=testing-the-connection-strength (in zwavejs2mqtt on the network map after clicking a node)

The previous version probably works because all the information is already there, but the issue here is primarily in the real world.

AlCalzone avatar Sep 05 '22 15:09 AlCalzone

Sorry to disagree with you @AlCalzone, but I'm going to have to. This device has been installed and working for over 2 years on both Hubitat and then Z-Wave JS 9.X.X. It's unsurprising the direct connection statistics are poor - there's a great deal of distance and a significant amount of wood, plaster, and concrete between the device and my controller. Usually this device runs through at least one relay, sometimes two, after my mesh has settled in. I know you can only base an opinion on what you see in the logs, so I'm providing this extra info so you understand that this device NEVER misbehaved until I tried 10.0.3. My guess is that it's another S2 issue. It's paired S2 unauthenticated, and you mentioned that the ZSE19 was not playing nice with supervision packets. It's something in the code (either Zooz's or Z-wave JS's).

epower53 avatar Sep 05 '22 15:09 epower53

Can you share an interview log from 9.x then, so I can compare where the differences are?

Anyways, those are definitely not caused by Z-Wave JS:

2022-09-04T03:39:44.761Z DRIVER   Dropping message with invalid payload
2022-09-04T03:39:44.763Z DRIVER « [Node 016] [REQ] [BridgeApplicationCommand]
                                  │ RSSI: -75 dBm
                                  └─[Security2CCMessageEncapsulation] [INVALID]
                                      error: Duplicate command
2022-09-04T03:39:44.763Z SERIAL » [ACK]                                                                   (0x06)
2022-09-04T03:39:44.789Z SERIAL « 0x011900a8000110109f03a500d8111e1224b7dcba01ca6eee00a7aa            (27 bytes)
2022-09-04T03:39:44.791Z DRIVER   Dropping message with invalid payload
2022-09-04T03:39:44.792Z DRIVER « [Node 016] [REQ] [BridgeApplicationCommand]
                                  │ RSSI: -89 dBm
                                  └─[Security2CCMessageEncapsulation] [INVALID]
                                      error: Duplicate command
2022-09-04T03:39:44.793Z SERIAL » [ACK]                                                                   (0x06)
2022-09-04T03:39:44.853Z SERIAL « 0x011900a8000110109f03a500d8111e1224b7dcba01ca6eee00a8a5            (27 bytes)
2022-09-04T03:39:44.855Z DRIVER   Dropping message with invalid payload
2022-09-04T03:39:44.856Z DRIVER « [Node 016] [REQ] [BridgeApplicationCommand]
                                  │ RSSI: -88 dBm
                                  └─[Security2CCMessageEncapsulation] [INVALID]
                                      error: Duplicate command

and are 99.99% of the time a result of ACK frames that don't reach the device. And this is a problem when the device is busy repeating its recent commands instead of replying to the current one.

AlCalzone avatar Sep 05 '22 15:09 AlCalzone

Happy to... here's an interview just completed using 9.6.2

zwavejs_2022-09-05.log

epower53 avatar Sep 05 '22 16:09 epower53

The connectivity issues are also there on 9.x, so timeouts because of them are more like a game of chance. If they happen at the wrong time you'll see isses after re-interviewing on 9.x too.

However the issue you're experiencing is a different one. The Zen16 does simply not respond when asked which commands each endpoint supports securely. 9.x (incorrectly) didn't do anything with this info anyways and simply used the information from the root device. This means this worked out most of the time by accident.

10.x assumes that the endpoint's commands are not secure instead - which is probably not the best choice if the device is secure. I can work around that, but ultimately it's the device's fault.

AlCalzone avatar Sep 05 '22 16:09 AlCalzone

Zooz has been good offering support from an end customer's perspective - hopefully they'll be willing to dig in and fix some of the issues the 10.X code branch is exposing. Thanks for interpreting the logs for me... Z-wave JS has been great since I made the switch.

Edit: @labob246 is having the same experience with the Zen16. Corroboration is great... means I'm NOT crazy.

epower53 avatar Sep 05 '22 16:09 epower53

I am also having a Zooz Zen16 issue. interview.log Actions that dont work.txt

I do not know how to do a node dump, but the node in question is 15 and all nodes are directly connected to the stick. In the Actions that don't work.txt file, you will see that the device can be turned on and off but there is no individual control of relays 1, 2 or 3. In the file I turn on the device (relay 0), try turning on relay 3, try turning off relay 3 and then turn off the device. The device (relay 0) controls the on/off of 1 or 2 or 3 when they should be able to be turned on/off individually, which doesn't work. It stopped working with this latest version:
zwavejs2mqtt: 7.0.0 zwave-js: 10.0.3

labob246 avatar Sep 05 '22 19:09 labob246

@AlCalzone coming back to you with debug logs. Same test as before: starting with factory reset Zooz Open Close XS Sensor (ZSE41) with firmware 1.10 and trying to add it to HA (I was using it successfully with HA for some time now). After adding the device I close and open the door once.

Test 1 Z-Wave JS 0.1.64 Driver Version: 9.4.0 Server Version: 1.20.0 Result: ✔️ Device added to HA zwave_js_adding_device_suceeded.log zwave_js-success_node_diagnostics.json.txt

Test 2 Z-Wave JS 0.1.69 Driver Version: 10.0.3 Server Version: 1.22.1 Result: ❌ Failed to add device to HA zwave_js_adding_device_failed.log zwave_js-failed_node_diagnostics.json.txt

I have registered with Zooz and received latest firmware 1.30. Should I try to upgrade and do the test again?

gralin avatar Sep 05 '22 21:09 gralin

@gralin I found something buggy in your log that aborts the interview. However this is triggered by the device behaving strangely, which could be the result of a low battery:

2022-09-05T20:44:59.029Z DRIVER « [Node 021] [REQ] [BridgeApplicationCommand]
                                  │ RSSI: -91 dBm
                                  └─[Security2CCMessageEncapsulation]
                                    │ sequence number: 35
                                    └─[BatteryCCReport]
                                        level:  0
                                        is low: true

AlCalzone avatar Sep 06 '22 06:09 AlCalzone