org.openhab.binding.zwave

Heal never completes resulting in device not polled until openHAB is restarted

Open mhilbush opened this issue 6 years ago • 17 comments

Logging this as an issue so we can keep track of it.

For some battery devices, the heal process never completes. Symptoms are described in the forum post below. Because the heal process never completes, the initialization thread stays around until the system is restarted. The net result is that the device will not be polled until openHAB is restarted. One major downside of this is that the device battery level will not be updated.

Workaround is to disable the nightly heal.

Forum discussion is here. https://community.openhab.org/t/node-gets-into-weird-state-after-heal/74339/9

mhilbush avatar Jul 05 '19 14:07 mhilbush

I'm again trying to nail things down for the M2 release.

Coming across this one as one of the final issues for M2: Is this really a new issue since M1? None of the conversation I've read suggests that it is. @openhab-5iver is this issue still critical for M2? I haven't seen any discussion or debugging either in this issue or in the cited community thread. CCed @kaikreuzer @cdjackson

bjoernbrings avatar Jul 28 '19 20:07 bjoernbrings

I'm not sure when this started, but it has only affected me recently. @mhilbush would probably know. Issues setting up and maintaining an IDE have been slowing everything down, but Chris reportedly has a ~~working~~ functional IDE, so we'll hopefully see some things getting resolved for zwave. Based on my recent experience with #1178, disabling heal can have some serious side effects, so even if this is not a recent regression, it should be pretty high on the list.

5iver avatar Jul 28 '19 20:07 5iver

I suspect this has been there ever since the nightly heal was enabled in what at the time was the dev version of the binding. So, while not a regression, it sure would be nice if it could be fixed. I've disabled the nightly heal to work around it, but, as @openhab-5iver points out, there are consequences of doing that.

mhilbush avatar Jul 28 '19 21:07 mhilbush

@cdjackson Could you comment whether you see this as a critical regression that blocks a Milestone 2 build or do you think such a build can be done without this issue being addressed?

kaikreuzer avatar Aug 02 '19 09:08 kaikreuzer

Sorry for the slow response - I’ve been travelling the past couple of days.

I’ve not managed to test this myself as I don’t have a working ZWave IDE at the moment (I got ZigBee working and didn’t want to mess that up to fix ZWave yet).

> I suspect this has been there ever since the nightly heal was enabled

I don’t see how this issue of heal not working can have been there since nightly heal was introduced, which was longer ago than just the 2.4 development version (heal was introduced in 1.5, I think). If it is really stopping transactions such as polling after heal, then this is major and needs to be fixed before M2, as it will kill everyone’s system. However, I also see comments on the forum from a few days ago where people say there is no such problem, so I’m not sure.

I get back to the UK at the end of the week. I think ZigBee is in better shape now and I can afford not to work on it if it breaks, so I will try to get the IDE working for ZWave and look at this then.

cdjackson avatar Aug 04 '19 11:08 cdjackson

A couple points of clarification...

> I don’t see how this issue of heal not working can have been there since nightly heal was introduced which was longer ago than just the 2.4 development version (heal was introduced in 1.5 I think).

In my comment above, I was referring to when heal was enabled (reenabled) in the dev version of the binding. I've been tracking this problem since the beginning of this year. It was originally logged in the ticket linked below. That ticket's title, "AssignReturnRoute fails repeatedly", is different, but I believe it describes the same problem. https://www.cd-jackson.com/index.php/tickets/viewticket/ticketid-858

> If it is really stopping transactions such as polling after Heal

The problem is that the heal never completes. And, because the heal never completes, the initialization thread never goes away, therefore the binding believes the device is still initializing. Because the binding thinks the device is initializing, the device will not be polled.
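To make that chain of cause and effect concrete, here is a minimal sketch in Java. It is not the binding's code: the class `HealWatchdogSketch`, the `Node` type, and the methods `startHeal`, `healCompleted`, `isPollable` and `abortHealIfStale` are all hypothetical names made up for illustration. It only models the behaviour described above (a node stuck in a heal is never considered pollable) and shows one possible mitigation, a timeout that abandons a stale heal, which is an assumption on my part and not a confirmed fix.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model only; none of these names come from the zwave binding.
final class HealWatchdogSketch {

    enum InitState { HEALING, DONE }

    static final class Node {
        private InitState state = InitState.DONE;
        private Instant healStarted; // null when no heal is in progress

        // Entering the heal puts the node back into an "initializing" state.
        void startHeal(Instant now) {
            state = InitState.HEALING;
            healStarted = now;
        }

        // Normal completion path; in the reported bug this is never reached.
        void healCompleted() {
            state = InitState.DONE;
            healStarted = null;
        }

        // Polling is skipped for as long as the node is not DONE.
        boolean isPollable() {
            return state == InitState.DONE;
        }

        // Assumed mitigation: give up on a heal that has exceeded the timeout.
        // Returns true if a stale heal was abandoned by this call.
        boolean abortHealIfStale(Instant now, Duration timeout) {
            boolean stale = state == InitState.HEALING
                    && healStarted != null
                    && Duration.between(healStarted, now).compareTo(timeout) > 0;
            if (stale) {
                healCompleted();
            }
            return stale;
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        Instant t0 = Instant.ofEpochSecond(0);

        node.startHeal(t0); // heal starts and, as in the bug, never completes
        System.out.println("pollable during heal: " + node.isPollable());

        // Ten minutes later a watchdog with a five minute limit fires.
        node.abortHealIfStale(t0.plus(Duration.ofMinutes(10)), Duration.ofMinutes(5));
        System.out.println("pollable after watchdog: " + node.isPollable());
    }
}
```

Without something like `abortHealIfStale`, the only way out of `HEALING` in this model is `healCompleted`, which matches the symptom that a restart is needed before the device is polled again.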

A couple other facts.

  • This issue occurs only on some battery devices. I've never seen it occur on mains devices.
  • I believe we concluded that this is the same issue that @digitaldan reported where he saw many zwave threads that would never go away. I don't remember where that was logged. If I can find it, I'll add a reference link here.

mhilbush avatar Aug 04 '19 11:08 mhilbush

Here's the link to the discussion with @digitaldan

https://github.com/openhab/org.openhab.binding.zwave/pull/1174

mhilbush avatar Aug 04 '19 11:08 mhilbush

Ok, thanks - this is a different issue than I was thinking. I thought that I had seen people commenting that the heal kills ALL transactions after the heal starts. Maybe that was just the issue from 5iver which ended up being a local problem with his system.

I'm not completely sure what you refer to regarding the enabling/re-enabling of the heal. I don't recall it being disabled, but I guess that doesn't matter...

> because the heal never completes, the initialization thread never goes away, therefore the binding believes the device is still initializing.

The thread is not really relevant here - just for clarification ;)

I will try to get the IDE working again when I get home. I don't have a ZWave stick with me, so I can't really do anything at the moment.

cdjackson avatar Aug 04 '19 12:08 cdjackson

> heal kills ALL transactions after the heal starts

You're right. That was reported by @openhab-5iver who then determined it was a local problem.

> I'm not completely sure what you refer to regarding the enabling/re-enabling of the heal.

As I recall, the nightly heal was disabled for a while. It was reenabled in the dev version. Admittedly, my memory could be a little faulty on this.

> I will try to get the IDE working again when I get home.

Thanks. Safe travels.

mhilbush avatar Aug 04 '19 12:08 mhilbush

> As I recall, the nightly heal was disabled for a while. It was reenabled in the dev version. Admittedly, my memory could be a little faulty on this.

Yes, and my memory may also have a retention issue! I may well have disabled it to see if it was really needed, and then added it back once people found not having it caused more problems. I have a vague memory of this in the archives somewhere ;)

cdjackson avatar Aug 04 '19 12:08 cdjackson

@kaikreuzer With a better understanding of the issue, I would now say that this doesn't block 2.5M2.

cdjackson avatar Aug 04 '19 13:08 cdjackson

Ok, thanks for the feedback!

kaikreuzer avatar Aug 04 '19 19:08 kaikreuzer

> You're right. That was reported by @openhab-5iver who then determined it was a local problem.

The routing in my network somehow got very messed up, but I'd disabled the daily heal, so it was not getting fixed. Manually healing my mains powered devices corrected the routing issues, but this did not resolve the issue of the daily heal causing the network to slow to a crawl, requiring an OH restart. When my routing was messed up, commands would just fail. After manually healing, the commands are just very very slow.

I just tested this again and nearly all of my nodes are offline and everything is extremely slow in responding to commands, so this is still occurring. I'd been assuming that this issue (#1195) was related, but it seems that what I am experiencing and what @digitaldan reported here is a separate issue that seemed to start after #1174. I've reopened #1178 to address this further.

5iver avatar Aug 05 '19 06:08 5iver

Hi, I just wanted to pop on here and bring up a lengthy thread we had on what appears to be this issue here. We determined that during the nightly heal, a faulty node would result in several nodes being marked as offline. Removing the faulty node, and optionally disabling the daily heal, prevented this from happening. Unfortunately I did not have DEBUG logging set before I removed the faulty node to show what was happening, but since removing the node appears to have resolved the issue, I am confident that was the cause.

rrgeorge avatar Oct 25 '19 16:10 rrgeorge

Further findings can be seen here in the openHAB forum.

Celaeno1 avatar Oct 27 '19 07:10 Celaeno1

@cdjackson

I would like to inquire politely whether any further work will be done on this issue. I cannot say exactly whether I have the same problem or just a similar one. For more information you can also look here.

Thank you very much.

Celaeno1 avatar Nov 25 '19 13:11 Celaeno1

@cdjackson is it safe to enable the nightly heal now? Do you plan to fix this issue?

rdslw avatar Oct 31 '20 13:10 rdslw