Wink hub reboots while running wink-mqtt

Open jfath opened this issue 8 years ago • 13 comments

I have a rooted Wink hub running the latest firmware (3.4.4). When I run ./mqtt start I see the 'Starting mqtt...' message and my MQTT broker log shows the connection and subscription requests. I can publish a message from another MQTT client and the Wink hub will turn a Z-Wave device on or off as requested. Everything seems to be working as expected. The problem is that the Wink hub does a full reboot a few minutes after starting wink-mqtt. If I don't start wink-mqtt, the same hub runs indefinitely as expected. The reboot doesn't seem linked to sending or receiving messages: it happens even if I simply run ./mqtt start and then leave the Wink basically idle. I started the app from an SSH session and checked the log from the serial terminal during the run, and saw no errors or anything suspicious. Any thoughts on how I might find some clues about what is causing the reboot?

jfath avatar Jan 03 '17 03:01 jfath

I think it's a CPU/resource utilization problem - there are some lines in... monitrc? about auto-rebooting if the CPU is above a certain amount for some amount of time (might be wrong about the time portion - it may just reboot).

I was never able to actually get this working on.. I want to say 3.3xsomething

You might be able to get around it by commenting out the lines in (I think) monitrc, though given the lack of heat sinks and cooling, it might be there to keep the CPU from doing bad things if they didn't think to include any thermal protection (the poor man's temp control lol). Have to test that theory :-P

As an aside, I'm curious if any of this is going to get updated for the newer firmware(s) or is this project going to get dropped in favor of the new Wink hub?

causalloop avatar Jan 03 '17 17:01 causalloop

This project has died out. I haven't kept up with the firmware updates.

danielolson13 avatar Jan 03 '17 18:01 danielolson13

😞 If you don't mind me asking, did you move on to a different platform etc. or did you just lose motivation (totally understandable, all things considered)?

causalloop avatar Jan 03 '17 18:01 causalloop

Thanks danielolson13 for the work you've done. I don't think it's far from working with the latest firmware. It's completely functional on 3.4.4 until the reboot. I think causalloop is correct about the reboot coming from monitrc. At first glance, I see reboots dependent on the amount of memory used by node processes. It seems likely that some resource is being hit harder than the original devs expected. I'm going to try upping limits and/or commenting out reboots in monitrc to see if I can keep it running.

jfath avatar Jan 03 '17 18:01 jfath

Looks like the reboot is triggered by a high memory usage situation in 3.4.4. When I change the monitrc reboot actions to alerts, I see monit log errors about memory usage that would have definitely triggered a reboot. The interesting thing is, after I had left the wink running for about a day without running wink-mqtt, the errors triggered immediately and wink-mqtt would abort. After a reboot, wink-mqtt will start and run for several hours before the high memory usage alerts are triggered. It seems there is a memory leak in the 3.4.4 firmware or at least some process that gradually consumes more memory. I'll do some more digging to see if I can find the process that's eating memory.

jfath avatar Jan 04 '17 16:01 jfath

Maybe a cron job of some sort that restarts 'x' service intermittently would help prevent that - kill the leak before it gets noticed etc.

causalloop avatar Jan 04 '17 16:01 causalloop

My Wink got so messed up after the last update that I gave up on it and upgraded to the Wink 2 for my primary automation.

There are memory problems with my application, but I don't know if it's the version of Node installed on the Wink or my poor programming; probably the programming. Hitting the database for the current state of devices is memory intensive and can cause issues. I never really got around the memory usage issues. I'd welcome any pull requests. Need to dig back into my old Wink and get it going again.

Wish that Wink would just open up their local control API.

danielolson13 avatar Jan 04 '17 16:01 danielolson13

This may be a bit out in left field... I wonder if it might be possible to set up either a Pi or a VM or whatever and "clone" the Wink hub. I say "clone" as a sort of placeholder for an oversimplified first step - basically my thinking is, if it's a memory issue and we obviously can't give it more, why not move the hungry things to something that can handle it? A file share mounted via SSH would, I believe, be possible, and I imagine some of the processes don't actually need the hardware in the hub. Maybe apron's commands can be piped somehow (SSH over an already unreliable wifi may not be the best solution, but we do have UART... file shares via UART? :-P )

A bit out of the box, I know, but in principle it seems like we just need to trick it into doing our bidding, and/or make it think it's running on the hub. I mean, if apron's looking for an "attached" device - for giggles, let's say 'ttyAM1' - couldn't that be mounted to a shared folder via a symlink or somesuch? (I haven't even looked up any of this, so I'm literally just sort of stream-of-thought-ing here.)

Point being, whatever's bogging things down - assuming the leak can't be plugged so easily - it seems like things should be offload-able, or even turned off. For instance, I really only keep my Wink around to control Lutron stuff, so potentially I could prevent the Zigbee and Z-Wave services from starting. More, I imagine, if I wanted to get nitpicky (do I really need a process for the RGB LED?)

Anyway, food for thought..

causalloop avatar Jan 04 '17 17:01 causalloop

I wasn't completely clear in stating memory usage steadily increases when wink-mqtt isn't running at all. I think the wink firmware has a problem and the monit reboot is a brute force fix. That threshold is reached more quickly when wink-mqtt is running due to expected increased memory usage. If I can figure out the offending process and restart that process instead of doing a full reboot, we may have a simple solution.
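
One way to confirm the steady climb independently of monit would be to sample /proc/meminfo at intervals. The following is only a minimal sketch in Python, not something taken from wink-mqtt or the Wink firmware: the names `used_percent` and `log_memory` are made up for illustration, it assumes a Python interpreter is available wherever it runs, and monit may well compute its own percentage differently.

```python
import time


def used_percent(meminfo_text):
    """Return the percentage of RAM in use, given the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])  # sizes are reported in kB
    total = values["MemTotal"]
    # MemAvailable is missing on older kernels, so fall back to a rough
    # estimate (an assumption of this sketch): free + buffers + page cache.
    fallback = (values.get("MemFree", 0)
                + values.get("Buffers", 0)
                + values.get("Cached", 0))
    available = values.get("MemAvailable", fallback)
    return 100.0 * (total - available) / total


def log_memory(interval_s=60, samples=10, path="/proc/meminfo"):
    """Print a timestamped memory-use percentage every interval_s seconds."""
    for _ in range(samples):
        with open(path) as handle:
            percent = used_percent(handle.read())
        print(time.strftime("%Y-%m-%d %H:%M:%S"), f"{percent:.1f}%")
        time.sleep(interval_s)
```

Running something like `log_memory(interval_s=300, samples=288)` with wink-mqtt stopped would give a day of readings to compare against the same run with wink-mqtt started.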

jfath avatar Jan 04 '17 19:01 jfath

Interesting, how are you monitoring that? Are you just periodically checking top/ps or do you have some method (I'm no Linux guru) to log it? I ask because I don't think mine's been doing that... but I also may have monkeyed with something and forgot (I'm terrible about keeping track of that stuff... too impatient lol)

causalloop avatar Jan 04 '17 20:01 causalloop

I first saw the steady memory increase when I changed the monitrc reboots to alerts. Eventually, monit started logging memory usage alerts: 75%, 75.1%, 75.2% .... I also checked top with wink-mqtt running and neither wink-mqtt.js nor server.js (the main node based process) were increasing in memory usage, so I think it's something else. I turned on logging in my terminal and ran top -b to capture about 10 minutes of top measurements. When I get a few minutes, I need to go through the log and see if I can pick out the offending process.
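
Going through a `top -b` capture by hand is tedious, so the comparison could be scripted. The sketch below (Python, purely illustrative) assumes each snapshot has a header row starting with `PID` that includes `VSZ` and `COMMAND` columns, as BusyBox top typically prints, and that the fields split cleanly on whitespace. The function names and the process names in any sample data are hypothetical.

```python
import re
from collections import defaultdict


def parse_size(text):
    """Convert a size field such as '1234', '12m' or '1.5g' to KiB."""
    units = {"": 1, "k": 1, "m": 1024, "g": 1024 * 1024}
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kmg]?)", text.lower())
    if not match:
        return None
    value, unit = match.groups()
    return float(value) * units[unit]


def memory_growth(log_text, size_column="VSZ"):
    """Map (pid, command) to (first_kib, last_kib) across all snapshots."""
    samples = defaultdict(list)
    size_idx = cmd_idx = None
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            # Header row: find the columns by name instead of hard-coding them.
            if size_column in fields and "COMMAND" in fields:
                size_idx = fields.index(size_column)
                cmd_idx = fields.index("COMMAND")
            else:
                size_idx = cmd_idx = None
            continue
        # Skip summary lines (Mem:, CPU:, Load average:) and short rows.
        if size_idx is None or not fields[0].isdigit() or len(fields) <= cmd_idx:
            continue
        size = parse_size(fields[size_idx])
        if size is None:
            continue
        command = " ".join(fields[cmd_idx:])
        samples[(int(fields[0]), command)].append(size)
    return {key: (sizes[0], sizes[-1]) for key, sizes in samples.items()}


def top_growers(log_text, limit=5):
    """Return (pid, command, growth_kib) sorted by growth, largest first."""
    growth = memory_growth(log_text)
    ranked = sorted(growth.items(),
                    key=lambda item: item[1][1] - item[1][0],
                    reverse=True)
    return [(pid, cmd, last - first)
            for (pid, cmd), (first, last) in ranked[:limit]]
```

Calling `top_growers(open("top.log").read())` would list the processes whose size grew most between the first and last snapshot; `top.log` is a placeholder file name. A process that restarts under a new PID shows up as a separate entry, which is worth keeping in mind when reading the result.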

jfath avatar Jan 05 '17 01:01 jfath

@jfath Were you able to find anything out here? I'm tired of my hub rebooting at night and turning all of my lights on (so tired, in fact, that I am about to just buy a Zigbee/Z-Wave stick).

Syco54645 avatar Jun 02 '17 14:06 Syco54645

First off, thanks for this great project, Dan. I know I am super late to this. Anyway, I faced similar instability issues (reboots etc.) on my Wink, and my hunch was that a memory leak in Node.js was the culprit. I rewrote part of the application in Go and I am having good results stability-wise. Hope this can help others too: https://github.com/sandman0/wink-mqtt-go

atomicsamurai avatar Apr 16 '21 22:04 atomicsamurai