node-zwave-js icon indicating copy to clipboard operation
node-zwave-js copied to clipboard

Aeotec Z-Stick 700 with FW 7.17.2.0 - "Duplicate command" errors after backup restoration

Open mundschenk-at opened this issue 2 years ago • 72 comments

Is your problem within Home Assistant (Core or Z-Wave JS Integration)?

NO, my problem is NOT within Home Assistant or the ZWave JS integration

Is your problem within ZWaveJS2MQTT?

NO, my problem is NOT within ZWaveJS2MQTT

Checklist

Describe the bug

Yesterday, after updating the stick to FW 7.17.2.0, I restored a current NVM backup from my 500 series stick. The driver started up fine and the nodes seem to work OK. Unlike previous attempts, there were no dead nodes and no automatic re-interviews.

However, when checking the log files this morning, I noticed that it does contain a lot of "Duplicate command" errors, and not (just) from the Sensative Strips Guard 700. Could this be a side effect of a "Health Check" attempt I made (under the erroneous assumption that certain issues had been fixed since the last time I tried)? I did restart the driver afterwards though, to stop the network flooding.

Device information

Manufacturer: Aeotec Model name: Z-Stick 700 Node ID in your network: 1

How are you using node-zwave-js?

  • [ ] zwavejs2mqtt Docker image (latest)
  • [ ] zwavejs2mqtt Docker image (dev)
  • [ ] zwavejs2mqtt Docker manually built (please specify branches)
  • [ ] ioBroker.zwave2 adapter (please specify version)
  • [ ] HomeAssistant zwave_js integration (please specify version)
  • [ ] pkg
  • [ ] node-red-contrib-zwave-js (please specify version, double click node to find out)
  • [ ] Manually built from GitHub (please specify branch)
  • [X] Other (please describe)

Which branches or versions?

version: node-zwave-js branch: 8.11.6 zwavejs2mqtt branch: 6.5.2 zwavejs2mqtt community addon: 0.36.0

Did you change anything?

yes (please describe)

If yes, what did you change?

Switched back to Z-Stick 700

Did this work before?

Yes (please describe)

If yes, where did it work?

With Z-Stick 5gen+

Attach Driver Logfile

zwavejslog_20220312.zip

mundschenk-at avatar Mar 12 '22 15:03 mundschenk-at

👋 Hey @mundschenk-at!

It looks like you attached a logfile, but its filename doesn't look like it a driver log that came from Z-Wave JS. Please make sure you upload the correct one.

zwave-js-bot avatar Mar 12 '22 15:03 zwave-js-bot

It looks like you attached a logfile, but its filename doesn't look like it a driver log.

It is.

mundschenk-at avatar Mar 12 '22 15:03 mundschenk-at

For me, the logs look like the nodes don't get the expected acknowledgement from the controller and retry sending. Also there are some instances where the controller can't reach a node while the same node is sending reports to the controller, which would support my suspicion further.

I don't see any mention of you doing a network heal. Even if the backup restores the known routes, the physical differences between sticks may result in incorrect routes being used for attempting communication with the nodes.

AlCalzone avatar Mar 12 '22 21:03 AlCalzone

I had not done a network heal when that log file was created, no. I've since started one, takes some time for the battery powered nodes. I'll upload tomorrow's logfile which should be mostly after the healing (only the smoke detector is not done yet, not sure when it's scheduled for its daily wakeup).

mundschenk-at avatar Mar 12 '22 21:03 mundschenk-at

Addendum: According to the network graph, range still appears to be much worse with 700 series (though it's been advertised as having an improved range). I think I mentioned this previously in another ticket, I've a Strips Guard 700 on a window in the same room as the controller, maybe 5 m distance that takes 2 hops with the 700 series and only 1 hop with the 5gen+. This might explain the issues before the network heal (but not why range is so abysmal with the newer stick).

mundschenk-at avatar Mar 12 '22 21:03 mundschenk-at

Here's today's log, @AlCalzone. The network heal finished at 01:00 with the last remaining node (45 - the others were healed before the log turnover), but unfortunately to me it looks very similar (lots of invalid payloads/duplicate commands): zwavejslog_20220313.zip

mundschenk-at avatar Mar 13 '22 12:03 mundschenk-at

Not sure if we went through this already - did you follow the instructions for controller placement mentioned here? https://zwave-js.github.io/node-zwave-js/#/troubleshooting/connectivity-issues?id=general-troubleshooting

AlCalzone avatar Mar 13 '22 12:03 AlCalzone

We didn't, but I know about it and the stick's attached to a 1,5 m extension cord (above a wooden door frame, so no metal surface nearby either).

mundschenk-at avatar Mar 13 '22 13:03 mundschenk-at

Ok. What bugs me a bit are the low RSSI values (-88...-100 dBm), which are probably not much above the background noise, and they gradually get worse. Maybe this is less of a controller placement issue than a mesh strength issue.

You can see that at least node 18 has a stronger connection than most others:

2022-03-13T09:11:21.169Z DRIVER « [REQ] [SendDataBridge]
                                    callback id:            228
                                    transmit status:        OK, took 100 ms
                                    repeater node IDs:      18, 10
                                    routing attempts:       1
                                    protocol & route speed: Z-Wave, 100 kbit/s
                                    ACK RSSI:               -89 dBm
                                                            ^^ measured by the controller
                                    ACK RSSI on repeaters:  -63 dBm, N/A
                                                            ^^       ^^^ measured by node 10
                                                            ^^ measured by node 18
                                    ACK channel no.:        0
                                    TX channel no.:         0

In zwavejs2mqtt, on the network map, you have the ability to test the connection quality to individual nodes. I think this is your best bet now - identify nodes with a bad connectivity (might be the repeaters, not necessarily the "problematic" nodes) and try to optimize their connection.

You may also try to increase the TX power of the controller. At least partially your issue is caused by the nodes not being able to "hear" the controller. Note that this doesn't help if the controller can't "hear" some nodes though.

AlCalzone avatar Mar 13 '22 13:03 AlCalzone

How would increase the controller's TX power? The health check on the network map is not working for me (it rapidly swamps the network, similar to https://github.com/zwave-js/node-zwave-js/issues/4130).

This is a relatively small appartment and the non-battery nodes are fairely evenly spaced. Node 18 is in the same direction as Node 10, but the latter is nearer to controller (basically they are located on opposite sites of the same wall: 1 -- wall -- "empty" room -- 10 -- wall -- 18 -- other room).

mundschenk-at avatar Mar 13 '22 13:03 mundschenk-at

For comparison, this is the log from a few days ago using the Gen5+ controller (also Aeotec). Everything is the same except the controller (i.e. same nodes and placement): zwavejslog_20220309.zip

mundschenk-at avatar Mar 13 '22 13:03 mundschenk-at

How would increase the controller's TX power?

This is a bit tricky, since no UI exposes it AFAIK. You can do it via code though: grafik

I suggest the following:

  1. Go to the log tab, open log in a new window
  2. Go to the driver function dialog, execute this:
    await driver.controller.getPowerlevel()
    
  3. Look at the log and remember the two numbers. Should be something like -3.3 and 0 I think - mine are messed up (outside the -12.8...+12.7 dBm range).
  4. execute this:
    await driver.controller.setPowerlevel(
      3, // the normal powerlevel plus 3 (or so, need to test)
      0 // the output power at 0 dBm, unchanged!
    )
    

I would start with a slight adjustment of the normal powerlevel, e.g. +3 dBm. Note that increasing the powerlevel of a radio device may be against the law, so don't go overboard.

AlCalzone avatar Mar 14 '22 08:03 AlCalzone

The health check on the network map is not working for me (it rapidly swamps the network

Any chance I could see a log of that?

AlCalzone avatar Mar 14 '22 08:03 AlCalzone

I would start with a slight adjustment of the normal powerlevel, e.g. +3 dBm. Note that increasing the powerlevel of a radio device may be against the law, so don't go overboard.

What's normal here? GetPowerlevel returned these values:

                                    command:               GetPowerlevel
                                    normal powerlevel:     3.2 dBm
                                    output power at 0 dBm: 14.8 dBm

mundschenk-at avatar Mar 14 '22 17:03 mundschenk-at

The health check on the network map is not working for me (it rapidly swamps the network

Any chance I could see a log of that?

I've triggered the health check a few minutes ago: zwavejslog_20220314.zip

mundschenk-at avatar Mar 14 '22 18:03 mundschenk-at

I've triggered the health check a few minutes ago

Ok that's https://github.com/zwave-js/node-zwave-js/issues/4075

What's normal here? GetPowerlevel returned these values

I'm trying to find that out. Unfortunately my test stick has very strange values there. Will let you know once I've found the defaults

AlCalzone avatar Mar 14 '22 19:03 AlCalzone

What's normal here? GetPowerlevel returned these values

I'm trying to find that out. Unfortunately my test stick has very strange values there. Will let you know once I've found the defaults

Is the power level of a Gen5 stick comparable (or interesting)? Then I could switch back to my other stick.

Also, regarding network strength, do you see similar issues in the Gen5 log or is it the 700 series stick? (To me, things seem better with the Gen5 stick, but I'm not that well-versed in reading the logs.)

mundschenk-at avatar Mar 14 '22 20:03 mundschenk-at

From what I've seen in the NVM data, the settings for 500 and 700 series sticks aren't comparable.

In your Gen5 log, I don't see RSSI for incoming messages, but the ACKs have RSSI values in the 50s/60s, roughly 30-40 dBm higher than on the 700 series. This could be caused by incorrect settings on the 700 series stick.

Start by setting the TX power to:

normal powerlevel: 0
measured: +3.3

which are the default settings. For good measure also set the RF region to EU, you never know...

await driver.controller.setRFRegion(0)

AlCalzone avatar Mar 15 '22 07:03 AlCalzone

@mundschenk-at It might be that you're also affected by this bug: https://github.com/zwave-js/node-zwave-js/issues/4367 In that case you won't be able to change the settings on your own without editing the NVM backup first. I can do that for you if you send me the NVM backup via e-mail.

AlCalzone avatar Mar 15 '22 13:03 AlCalzone

@mundschenk-at It might be that you're also affected by this bug: #4367 In that case you won't be able to change the settings on your own without editing the NVM backup first. I can do that for you if you send me the NVM backup via e-mail.

Yes it definitely looks that way:

2022-03-15T15:12:04.787Z DRIVER « [RES] [SerialAPISetup]
                                    command: GetRFRegion
                                    region:  Unknown

I'll edit the backup myself, should not be a problem.

mundschenk-at avatar Mar 15 '22 17:03 mundschenk-at

Sure, see here for details: https://github.com/zwave-js/node-zwave-js/issues/3906#issuecomment-1068031298

AlCalzone avatar Mar 15 '22 17:03 AlCalzone

I tried this: Made a new 700 series backup, converted to JSON and replaced "rfConfig": null with the object from https://github.com/zwave-js/node-zwave-js/issues/3906#issuecomment-1068031298. It worked, in that GetRFRegion returns Europe and the power levels are now 0.0 dBm and 3.3 dBm.

However, the whole network started to be reinterviewed immediately. As a result, several nodes had incomplete information and/or security classes. I manually re-interviewed them (using the "reset security classes" option) and was able to fix most of them (two are missing, one is a smoke detector that I would need a ladder to wake up, the other an Aeotec Water Sensor 6, which is quite buggy when using S0 security - I may have to reinclude that, maybe dropping security this time). Unfortunately, even after this, things are not really good.

I've tried starting a network heal, but that has stalled after just 4 nodes (i.e. not even all mains powered nodes have been healed).

mundschenk-at avatar Mar 15 '22 20:03 mundschenk-at

Unfortunately, even after this, things are not really good. I've tried starting a network heal, but that has stalled after just 4 nodes .

Can you post another log so I can see for myself? You don't have to encrypt them, unless you re-include a secure device and/or change user codes while you're logging.

AlCalzone avatar Mar 15 '22 20:03 AlCalzone

Unfortunately, even after this, things are not really good. I've tried starting a network heal, but that has stalled after just 4 nodes .

Can you post another log so I can see for myself? You don't have to encrypt them, unless you re-include a secure device and/or change user codes while you're logging.

Driver restarts as well (home ID), no? Since these are pretty long logs (whole day), I feel more comfortable with encrypted uploads.

mundschenk-at avatar Mar 15 '22 20:03 mundschenk-at

The home ID is just a number that's transmitted in plaintext via radio anyways. If someone is standing near your network with a Zniffer, they will see it. Encrypted is fine for me though.

AlCalzone avatar Mar 15 '22 20:03 AlCalzone

Here it is: zwavejslog_20220315.zip

(Sorry, a bit messy due to repeated re-interview/heal attempts.)

mundschenk-at avatar Mar 15 '22 20:03 mundschenk-at

From what I'm seeing in the logs, the heal is slowly progressing. You had a full interview of node 99 going on in parallel, which delays the process quite a bit.

The RSSI values didn't change, but that's expected, since you didn't change anything physical. I guess now you can try this and see if the nodes can "hear" the controller better then.

AlCalzone avatar Mar 15 '22 20:03 AlCalzone

The RSSI values didn't change, but that's expected, since you didn't change anything physical. I guess now you can try this and see if the nodes can "hear" the controller better then.

Just to be clear, do you mean I should initially try 3, 0 as the parameters (as in the example), or rather 3, 3.3 (3 added to the previous "normal powerlevel" of 0.0, 0 added to the previous "output power at 0 dBm`)?

mundschenk-at avatar Mar 15 '22 20:03 mundschenk-at

I meant 3, 3.3.

AlCalzone avatar Mar 15 '22 21:03 AlCalzone

Here's the new log. I think it is somewhat better, but node 10 still appears to be problematic (though I don't know why, it's a FGWP102, though you might consider it a legacy device - I didn't know how to read the specifications properly when ordering my first nodes).

Also, when doing a health check for node 11, node 80 started freaking out completely. This one is a bit remote (at the bottom of a cupboard at the other end of the flat), but there are plenty of repeaters in between if necessary.

zwavejslog_20220315a.zip

mundschenk-at avatar Mar 15 '22 21:03 mundschenk-at