ha-wyzesense icon indicating copy to clipboard operation
ha-wyzesense copied to clipboard

Sensors Stop Responding - Requires Reboot

Open cheechie opened this issue 4 years ago • 101 comments

I was running hassio on a pie and the wyze sensors would just stop responding after a day or sometimes 3 days. A reboot would fix the problem. Recently I have switched to running home assistant in a VM (Proxmox), but i still get the same issue. A reboot of home assistant fixes it every time.

Let me know if I can provide any logs to help.

cheechie avatar Mar 06 '20 12:03 cheechie

+1. Exact same scenario. Thought migrating to a Debian VM might help but it still requires that reboot frequently.

eregev avatar Mar 06 '20 17:03 eregev

Same for me. I've tried using previous versions of the component and keep needing to restart for the component to work again. It is AWESOME when it works but unfortunate when it doesn't. Here's what my last log on 0.0.7 says about the error:

Log Details (ERROR)
Logger: custom_components.wyzesense.wyzesense_custom 
First occured: 11:15:30 AM (1 occurences) 
Last logged: 11:15:30 AM

[Errno 110] Operation timed out

I'm on a HassOS 3.11 on a VirtualBox VM.

johnwarne avatar Mar 06 '20 18:03 johnwarne

Can you put your logs into Debug?

logger: default: info logs: custom_components.wyzesense: debug ~Will

On Fri, Mar 6, 2020 at 10:18 AM John Warne [email protected] wrote:

Same for me. I've tried using previous versions of the component and keep needing to restart for the component to work again. It is AWESOME when it works but unfortunate when it doesn't. Here's what my last log on 0.0.7 says about the error:

Log Details (ERROR) Logger: custom_components.wyzesense.wyzesense_custom First occured: 11:15:30 AM (1 occurences) Last logged: 11:15:30 AM

[Errno 110] Operation timed out

I'm on a HassOS 3.11 on a VirtualBox VM.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kevinvincent/ha-wyzesense/issues/114?email_source=notifications&email_token=AC362MOG2AY4ZBLKIHG7R3TRGE5A3A5CNFSM4LC7JNG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEOCKJZQ#issuecomment-595895526, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC362MIM53ECNFHX6Q4TYYTRGE5A3ANCNFSM4LC7JNGQ .

photinus avatar Mar 06 '20 18:03 photinus

I am going to try this in my config until it gets sorted out. And have it reboot everyday.

restart HA to keep it fresh

  • alias: Restart HA trigger: platform: time at: "10:00:00"
    condition: condition: time weekday:
    • sun
    • mon
    • tue
    • wed
    • thu
    • fri
    • sat action:
    • service: homeassistant.restart

https://community.home-assistant.io/t/the-ability-to-schedule-a-reboot/30144

cheechie avatar Mar 06 '20 18:03 cheechie

@cheechie I've had mine rebooting for other reasons nightly. Seems like wyze gets through most of the day but by evening it's always in need of a reboot. Too bad hourly restarts aren't feasible for the time being...

eregev avatar Mar 06 '20 18:03 eregev

Can you put your logs into Debug? logger: default: info logs: custom_components.wyzesense: debug ~Will

You got it. I've got a log for a timeframe during which the component was working then failed. Where would you like me to send it?

johnwarne avatar Mar 06 '20 20:03 johnwarne

FYI, my system is also having this problem. I'm running HA on Ubuntu 18.04 in Docker container. It used to work fine but then I had a server power outage and ever since I've had issues. The only difference from the other comments is sometimes I have to stop the container, pull the controller for 10 sec, replace and then restart the container. I've turned on logging so if it happens again I'll upload the logs.

swingking avatar Mar 06 '20 20:03 swingking

@johnwarne can you post it to Pastebin and share a link here?

photinus avatar Mar 06 '20 20:03 photinus

btw, I'm pretty sure what is happening is the Home Assistant supervisor is restarting HA for some reason and we're not closing the connection the USB dongle. Then when it tried to re-initialize the connection it times out waiting on the initialization stage.

photinus avatar Mar 06 '20 21:03 photinus

Sure thing. I've removed everything except the component's logs: https://pastebin.com/Q8ETakPQ

johnwarne avatar Mar 06 '20 21:03 johnwarne

@johnwarne Can you try updating? You're on 0.0.4 from the debug logs, 0.0.7 fixes a lot of random issues.

photinus avatar Mar 06 '20 21:03 photinus

Apologies, I've been bouncing around between 0.0.5, 0.0.6, and 0.0.7. In HACS it says I'm currently on 0.0.6: https://www.dropbox.com/s/b3khmm9185t6mvk/Screenshot%202020-03-06%2014.16.32.png?dl=0. I'll update from 0.0.6 to 0.0.7 and will update this thread if/when I get another failure.

johnwarne avatar Mar 06 '20 21:03 johnwarne

Thanks @johnwarne , also please note the color of the LED on the dongle when it fails?

photinus avatar Mar 06 '20 21:03 photinus

@photinus The component has errored again. The light on the dongle is blue. Here's a link to the debug log:

https://pastebin.com/e4BA9CyP

johnwarne avatar Mar 06 '20 22:03 johnwarne

@johnwarne See this information on this error in virtualbox here: https://github.com/kevinvincent/ha-wyzesense/issues/20#issuecomment-595829838 Basically it involves switching your virtual machine to use a different (more stable) usb host controller This is unfortunately an issue where the OS drops the connection with the USB device.

kevinvincent avatar Mar 07 '20 02:03 kevinvincent

EDIT (about 2 hours later): Just applied a HACS update and the problem seems to be resolved.

= = = = = = = = = =

Apologies if the above already discusses the problem I'm having.

This morning I noticed that all of my Wyze sensors had stopped responding. I had added two sensors yesterday, but they were functioning fine when I went to bed. To troubleshoot, I restored to an earlier Snapshot which seemed to fix the problem... until I rebooted at which time all sensors went offline once again.

I'm Running Home Assistant 0.106.5 on a Mac mini, Ubuntu 18.04.4 LTS, Docker. Supervisor version is 209.

I installed the Wyze Sense Component through HACS.

Previously-working configuation.yaml entry:

binary_sensor:
  - platform: wyzesense
    device: "/dev/hidraw0"

Under Supervisor | System | Hardware | serial I see an entry for

/dev/serial/by-id/usb-0658_0200-if00

~$ dmesg | grep hidraw  
[    2.923925] hidraw: raw HID events driver (C) Jiri Kosina
[    2.987907] hid-generic 0003:1A86:E024.0001: hiddev0,hidraw0: USB HID v1.00 Device [HID 1a86:e024] on usb-0000:00:06.0-4/input0
[    3.940468] appleir 0003:05AC:8242.0002: input,hiddev1,hidraw1: USB HID v1.11 Device [Apple Computer, Inc. IR Receiver] on usb-0000:00:06.0-5/input0
[    5.224273] hid-generic 0003:05AC:820A.0003: input,hidraw2: USB HID v1.11 Keyboard [HID 05ac:820a] on usb-0000:00:06.0-6.1/input0
[    5.459073] hid-generic 0003:05AC:820B.0004: input,hidraw3: USB HID v1.11 Mouse [HID 05ac:820b] on usb-0000:00:06.0-6.2/input0

Wyze USB dongle is solid blue.

Word of warning: I'm not very familiar with Docker. It's running/working, but it confuses the heck out of me. I'm happy to try to issue Docker commands, but I'll need some firm hand-holding.

Logs GREPd for any mention of wyze are in the accompanying .zip file.

Any suggestions on how to get things back up and running?

wyze_log.txt.zip

grantalewis avatar Mar 08 '20 18:03 grantalewis

I'm also frequently experiencing this. Happens once or twice per week

RonSpawnson avatar Mar 10 '20 01:03 RonSpawnson

I'm also experiencing this about 1-2 times a week. I have 9 contact sensors and 3 motion sensors paired with it. Restarting HA doesn't fix it, removing/reinserting the dongle doesn't fix it. Only rebooting the whole host fixes it.

Running Hass.io installed on Ubuntu LTS.

ghost avatar Mar 10 '20 06:03 ghost

@johnwarne See this information on this error in virtualbox here: #20 (comment) Basically it involves switching your virtual machine to use a different (more stable) usb host controller This is unfortunately an issue where the OS drops the connection with the USB device.

Between switching to the USB 3.0 host controller and going back to 0.0.7 of the component, this seems to have solved my stability problems for the past several days (knocks on wood). I'd gone down to USB 2.0 since that helped with stability in the past, but perhaps with newer versions of the component that's not applicable. Thanks, @kevinvincent !

johnwarne avatar Mar 10 '20 15:03 johnwarne

So what is the solution to this? or a workaround perhaps? I reboot my host (not hass) and it fixes the issue. I also figured out reinstalling the component and rebooting hass (not host) fixes it as well.

Would really like to know how I can prevent my usb dongle from disconnecting every time supervisor updates, or worse at random like it seem to be doing about once every 2-3 days without manually rebooting the host. I run a vpn tunnel from my host as well (I know, I know, I need at least one more pi) but it's lightweight in the essence the ssh just needs to be kept open to keep the tunnel open. Nifty little failsafe.

Anyway, I am experiencing all the issues described in this thread. The dongle disconnecting, not reconnecting, and the sensors remaining in their off position. This is only resolved with the methods I mentioned. Is using usb 3.0 instead of 2.0 the trick? Curious if it is, as on my rpi4b it happens with both usb 3.0 and usb2.0 BUT I am using powered usb3.0 hub as instructed by several guides for rpi4b and usb's. Something about their power circuitry only having a set amount of voltage across the usb lanes. Google it if you're curious.

RoldyBuffalo avatar Mar 10 '20 17:03 RoldyBuffalo

So what is the solution to this? or a workaround perhaps? I reboot my host (not hass) and it fixes the issue. I also figured out reinstalling the component and rebooting hass (not host) fixes it as well.

Would really like to know how I can prevent my usb dongle from disconnecting every time supervisor updates, or worse at random like it seem to be doing about once every 2-3 days without manually rebooting the host. I run a vpn tunnel from my host as well (I know, I know, I need at least one more pi) but it's lightweight in the essence the ssh just needs to be kept open to keep the tunnel open. Nifty little failsafe.

Anyway, I am experiencing all the issues described in this thread. The dongle disconnecting, not reconnecting, and the sensors remaining in their off position. This is only resolved with the methods I mentioned. Is using usb 3.0 instead of 2.0 the trick? Curious if it is, as on my rpi4b it happens with both usb 3.0 and usb2.0 BUT I am using powered usb3.0 hub as instructed by several guides for rpi4b and usb's. Something about their power circuitry only having a set amount of voltage across the usb lanes. Google it if you're curious.

I have not seen a solution yet. I am running it in a VM in proxmox. I have to power of my HA VM and then uplug my USB dongle and then plug it back in and reboot. I might go back to hassio on the pie, because then I just had to reboot and the sensors would come back online.

I sure hope this issue can be resolved, otherwise I will have to look for other sensors to use instead of wyze sensors.

cheechie avatar Mar 10 '20 17:03 cheechie

I updated the bridge firmware from .30 to .33 fixed the problem for me.

tteck avatar Mar 10 '20 19:03 tteck

@tteck, upgraded by plugging back into a wyze cam? I switched to the 3.0 VirtualBox driver and haven't had issues yet (although it often takes between 1-5 days for the issue to show back up).

eregev avatar Mar 10 '20 19:03 eregev

@tteck, upgraded by plugging back into a wyze cam? I switched to the 3.0 VirtualBox driver and haven't had issues yet (although it often takes between 1-5 days for the issue to show back up).

Yes, I had to purchase a Wyze cam (also picked up a spare starter kit) to see what firmware was on my bridge. Firmware was .30 updated to .33 on the new bridge (kept the old bridge at .30 just in case) re-scanned my sensors (20) It's been 3 weeks with no problems.

tteck avatar Mar 10 '20 19:03 tteck

Interesting. I've never even plugged the Sense Bridge into a Wyze Cam. Will give it a shot and see if the firmware update fixes it!

ghost avatar Mar 10 '20 22:03 ghost

I updated the bridge firmware from .30 to .33 fixed the problem for me.

I just updated my bridge firmware from 0.30 to 0.33. My HA would not recognize any of my sensors with the new firmware. I had to add all 24 of my wyze sensors again buy scanning and pushing the button with the pin. Hopefully this works!

The firmware update is in camera settings ->Device Info ->Bridge Firmware Version

cheechie avatar Mar 11 '20 00:03 cheechie

@kevinvincent, my Wyze component stopped responding this afternoon and I have the debug logs. I've uploaded the relevant section to pastebin: https://pastebin.com/jiDDv4Cs

The wyze logs just seem to stop at a "Trying to parse:" message. FYI, the LED on the dongle is still blue. My system is a Ubuntu 18.04 system with HA running in a Docker container.

[Edit] I probably should mention that the system was running for almost exactly three days before it stopped handling events.

Thanks!

swingking avatar Mar 11 '20 01:03 swingking

ugh, if I must update my firmware, I suppose that's the fix. No bother other then re-pairing my sensors. I will report once the firmware is updated on progress.

Edit

Curiously, when I plugged the bridge into the camera, it either auto updated almost instantly, or it was already on 0.33 firmware for the bridge. I'm willing to bet the former. I will attempt to run the system for a few days without reboot, maybe the update (or not) fixed it?

Edit_2

Alright, it seems like it DID in fact update the bridge without me manually doing it once I plugged it into the camera and rebooted the camera, it applied the update within seconds. Secondly, the sensors all were unresponsive when I plugged it back into hass (which is why I think it updated,) I ran the wyzesense_scan call for every sensor (it did take about 20 minutes to go through them all) 16 sensors in total. The sensors are now responding normally, only time will tell if this solved the disconnect issues. Thanks for the tips!

RoldyBuffalo avatar Mar 11 '20 10:03 RoldyBuffalo

I do not have a camera as I purchased the sensors to use with HA. So far I have not had the disconnect Issues as others above. Is there a way to update firmware (if needed) without the camera?

bschatzow avatar Mar 11 '20 12:03 bschatzow

My bridge firmware is stuck on updating and never finishes. I'm thinking maybe it's because I have rstp firmware on camera. Anyone been able to successfully update firmware via camera with official rstp firmware?

I also second the question on whether there is any way to update bridge firmware without camera.

RonSpawnson avatar Mar 11 '20 13:03 RonSpawnson

My bridge firmware is stuck on updating and never finishes. I'm thinking maybe it's because I have rstp firmware on camera. Anyone been able to successfully update firmware via camera with official rstp firmware?

Try to open the wyze app, open the front page with all your cameras, and toggle off for 10 seconds, then on the camera from the power button on that page. once it boots all back up, you should see your bridge with the new firmware. I am running rtsp and mine updated successfully.

I also second the question on whether there is any way to update bridge firmware without camera.

I do not know of another way to do this, but either of the cameras work. They are not terrible cameras, I've used mine in a pinch when we lost our cat. I placed the camera outside in pan-mode with the litter box, opened it up in hass and watched for 3 hours till she came back. Without the camera idk what I would have done, it was freezing out. I know, specific case, but for 20 dollars and the ability to add rtsp firmware (even though it needs updated) and include it into hass, but really my cameras are place holders until we get a more stable, and robust system in place. I originally had the bridge hooked up to the camera, and it piped into alexa with some magic with smart things. This was gross, so I wanted a system to integrate it into alexa, without alot of b.s. DrZzzz's video got me started. 6 months later I'm damn near completely local control now. Couldn't have done more then half my automations without these freaking awesome little sensors. motion lighting, door chimes, security, mail call, they are the backbone of my system. I have faith that this update, hopefully fixed the issues I was having.

I will say as a final word RonSpawnson. My bridge didn't update immediately, I HAD to reboot the camera with the bridge installed for it to finish the update. you can check the current version in the app in the firmware page, check once it reboots it might update on it's own like mine did.

don't forget this as well

...Secondly, the sensors all were unresponsive when I plugged it back into hass (which is why I think it updated,) I ran the wyzesense_scan call for every sensor (it did take about 20 minutes to go through them all) 16 sensors in total. The sensors are now responding normally, only time will tell if this solved the disconnect issues. ...

RoldyBuffalo avatar Mar 11 '20 15:03 RoldyBuffalo

Thanks @VarenDerpsAround - I was able to finally upgrade my bridge firmware. To give some additional information in case anyone else runs into the issue I did - I tried your suggestions and tried multiple different sequences of upgrading, resetting, power cycling in app and physically, and none worked.

To resolve - I restored my camera to the latest non-rstp firmware (v4.9.5.98) and then the bridge firmware upgrade finally worked for me right away! For reference, the rstp firmware version I was using which was not allowing upgrades was v4.28.4.41.

I'll monitor my situation just like the others who recently upgraded the bridge firmware. If any of us see this issue crop up again, let's share that information in this issue. If a sufficient amount of time goes by without any of us experiencing the issue (1-2 weeks), I believe this confirms that this issue can be resolved via firmware upgrade, and perhaps the README can be updated with this information which can close this issue. Let's stay tuned for a couple weeks to confirm.

RonSpawnson avatar Mar 13 '20 18:03 RonSpawnson

I updated the firmware on my bridge 3 days ago, and it has been working flawlessly since then. I've had no issues, I hope it continues.

cheechie avatar Mar 14 '20 12:03 cheechie

I updated the firmware on my bridge 3 days ago, and it has been working flawlessly since then. I've had no issues, I hope it continues.

Well it all stopped working this morning. Upgrading the firmware for the bridge did not fix the issue.

cheechie avatar Mar 15 '20 16:03 cheechie

Strange, you and I are both running Proxmox and mine is still going strong. Nearly 4 weeks. I guess that I just got lucky, and that never seems to happen.

In proxmox for my USB pass through I used USB Vendor/Device ID and selected 1a86:e024 instead of USB port I also checked use USB3

Hate that it didn't fix your issue.

tteck avatar Mar 15 '20 16:03 tteck

I'm seeing this issue now. I picked up a refurbished Wyze cam. I'm attempting the upgrade. Current Bridge version is 0.0.0.30 upgrading to 0.0.0.33. I'll post later if this works as well.

PeterPlatt avatar Mar 15 '20 21:03 PeterPlatt

Sad to report that I also experienced the sensors not responding issue again this morning even though I updated my bridge to version 0.33 a few days back. Running on raspberry pi 4. Seems there is still another issue at play here sadly.

RonSpawnson avatar Mar 16 '20 14:03 RonSpawnson

Strange, you and I are both running Proxmox and mine is still going strong. Nearly 4 weeks. I guess that I just got lucky, and that never seems to happen.

In proxmox for my USB pass through I used USB Vendor/Device ID and selected 1a86:e024 instead of USB port I also checked use USB3

Hate that it didn't fix your issue.

That's exactly my settings.
usb

cheechie avatar Mar 16 '20 20:03 cheechie

I'm seeing this issue now. I picked up a refurbished Wyze cam. I'm attempting the upgrade. Current Bridge version is 0.0.0.30 upgrading to 0.0.0.33. I'll post later if this works as well.

Well, the upgrade helped and now I got a new Wyze cam toy. My setup is a Rasberry pi 4 and I have about 20+ sensors registered. I had to go around to each one and re-pair. No other changes needed. I also setup the reboot every night just for safe measure.

PeterPlatt avatar Mar 20 '20 00:03 PeterPlatt

Is there an understanding of why sensors lose connection to the bridge, or why the bridge loses connection to the OS? Maybe this is my lack of understanding of how this add on works, but I'd think that once it is working, if you don't update any of the firmwares, it would keep working?

Like others I have a camera that has been flashed to the rtsp firmware so it will be a bit of a chore to get it back on standard firmware, move the bridge onto it, update the firmware, then flash back to the rtsp.

mconlow avatar Mar 20 '20 19:03 mconlow

I hope everyone will give us a recap of the status of their Wyze Sense integration since updating their firmware. For me, it's been running for over a month with no problems. The only restarts are because of core updates.

tteck avatar Mar 27 '20 14:03 tteck

So far so good for me. ~19 days without a hiccup.

sebastian-silverwood avatar Mar 29 '20 22:03 sebastian-silverwood

I want to update with my experience. After firmware update it worked correctly for me for 4 days, after which I thought it broke. Since then it has been chugging along perfectly for another 13 days. I'm beginning to lose confidence that my issue after first 4 days was actually a repro of the sensor issue or something else entirely.

At this point I can say that the firmware update has either fixed it entirely for me, or at the very least has significantly lowered the frequency (used to occur at least once or twice a week for me). I'll update again if I experience a repro of the sensor loss issue. If I do not make any further comments, it has continued working flawlessly for me.

RonSpawnson avatar Mar 29 '20 22:03 RonSpawnson

I hope everyone will give us a recap of the status of their Wyze Sense integration since updating their firmware. For me, it's been running for over a month with no problems. The only restarts are because of core updates.

Ever since I updated my firmware, I've been running for 13 days without issue.

cheechie avatar Mar 29 '20 22:03 cheechie

I'm running 0.33 and was running fine for about 3 hours then stoped updating.

Phara0h avatar Apr 01 '20 02:04 Phara0h

Here is a the messages up to the point it stopped working.

custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5332, Payload=<None> 2020-04-01 03:54:15 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5332) 2020-04-01 03:54:15 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555332ff0283' 2020-04-01 03:54:15 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=5333, Payload=b'0000017134bbce16' 2020-04-01 03:54:15 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa55530b330000017134bbce1603d5' 2020-04-01 03:54:15 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa5333ff0284' 2020-04-01 03:54:15 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa5333ff0284' 2020-04-01 03:54:15 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=53FF, Payload=ACK(5333) 2020-04-01 03:54:35 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.1830159120] Client exceeded max pending messages [2]: 512 2020-04-01 04:38:48 WARNING (Thread-5) [pychromecast.socket_client] [Living Room:8009] Heartbeat timeout, resetting connection 2020-04-01 04:39:13 ERROR (Thread-5) [pychromecast.socket_client] [Living Room:8009] Failed to connect to service Chromecast-Ultra-5e1a971623ba139ab40c72085277cbec._googlecast._tcp.local., retrying in 5.0s

Phara0h avatar Apr 02 '20 02:04 Phara0h

@kevinvincent debugged it further, called wyzesense.scan when it stopped working and got this error.

Logger: homeassistant.components.websocket_api.http.connection.1814187792 Source: custom_components/wyzesense/wyzesense_custom.py:106 Integration: websocket_api (documentation, issues) First occurred: 10:46:02 PM (2 occurrences) Last logged: 10:46:06 PM

[Errno 32] Broken pipe Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 134, in handle_call_service connection.context(msg), File "/usr/src/homeassistant/homeassistant/core.py", line 1234, in async_call await asyncio.shield(self._execute_service(handler, service_call)) File "/usr/src/homeassistant/homeassistant/core.py", line 1261, in _execute_service await self._hass.async_add_executor_job(handler.func, service_call) File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run result = self.fn(*self.args, **self.kwargs) File "/config/custom_components/wyzesense/binary_sensor.py", line 151, in on_scan result = ws.Scan() File "/config/custom_components/wyzesense/wyzesense_custom.py", line 530, in Scan self._DoSimpleCommand(Packet.EnableScan()) File "/config/custom_components/wyzesense/wyzesense_custom.py", line 393, in _DoSimpleCommand self._DoCommand(pkt, cmd_handler, timeout) File "/config/custom_components/wyzesense/wyzesense_custom.py", line 379, in _DoCommand self._SendPacket(pkt) File "/config/custom_components/wyzesense/wyzesense_custom.py", line 331, in _SendPacket pkt.Send(self.__fd) File "/config/custom_components/wyzesense/wyzesense_custom.py", line 106, in Send ss = os.write(fd, pkt) BrokenPipeError: [Errno 32] Broken pipe

Phara0h avatar Apr 02 '20 02:04 Phara0h

Update - I unfortunately indeed can confirm this happened for me again, even with 0.33 firmware. I have not restart it, so if there's anything I can do to help diagnose the issue while I'm in this state, please let me know.

RonSpawnson avatar Apr 03 '20 05:04 RonSpawnson

Mine survives anywhere from 30 minutes to a few hours before I need to restart but sometimes reinstall the service. I can always tell when it goes out because the LED goes from blue to yellow.

RyanWoirol avatar Apr 03 '20 20:04 RyanWoirol

Mine survives anywhere from 30 minutes to a few hours before I need to restart but sometimes reinstall the service. I can always tell when it goes out because the LED goes from blue to yellow.

I think you may have a different problem. The issue we're discussing, the led stays blue. At least mine would.

tteck avatar Apr 03 '20 20:04 tteck

Mine survives anywhere from 30 minutes to a few hours before I need to restart but sometimes reinstall the service. I can always tell when it goes out because the LED goes from blue to yellow.

I think you may have a different problem. The issue we're discussing, the led stays blue. At least mine would.

hmm. Has this been happening to anyone then? or should I start a new thread?

RyanWoirol avatar Apr 03 '20 20:04 RyanWoirol

Mine would go from blue to yellow on the Pi but stays blue on a Debian PC install.

On Fri, Apr 3, 2020, 3:40 PM RyanWoirol [email protected] wrote:

Mine survives anywhere from 30 minutes to a few hours before I need to restart but sometimes reinstall the service. I can always tell when it goes out because the LED goes from blue to yellow.

I think you may have a different problem. The issue we're discussing, the led stays blue. At least mine would.

hmm. Has this been happening to anyone then? or should I start a new thread?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/kevinvincent/ha-wyzesense/issues/114#issuecomment-608647874, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFKFXSKTA2MTNUXOB5RNF3RKZCTTANCNFSM4LC7JNGQ .

eregev avatar Apr 03 '20 20:04 eregev

Mine would go from blue to yellow on the Pi but stays blue on a Debian PC install. On Fri, Apr 3, 2020, 3:40 PM RyanWoirol @.***> wrote: Mine survives anywhere from 30 minutes to a few hours before I need to restart but sometimes reinstall the service. I can always tell when it goes out because the LED goes from blue to yellow. I think you may have a different problem. The issue we're discussing, the led stays blue. At least mine would. hmm. Has this been happening to anyone then? or should I start a new thread? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#114 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFKFXSKTA2MTNUXOB5RNF3RKZCTTANCNFSM4LC7JNGQ .

Yeah I am using a Pi 3B. Haven't had any problems with it until just this week. A reset of Hass.io will make the light turn blue and work for about 30 minutes, then it goes yellow and stops working until I reset again.

RyanWoirol avatar Apr 03 '20 21:04 RyanWoirol

And now I can't get a blue light and the sensors aren't working at all.

RyanWoirol avatar Apr 04 '20 03:04 RyanWoirol

I'm having the same problem as op on my RPi4 running HassOS 3.12 where Wyze motion sensors via wyzesense 0.0.7 stop responding after unpredictable lengths of time. Updated to latest bridge firmware (0.0.0.33). Ran for about 3 days before failing this time, other failures have occurred after 9 hours, 18 hours, and 2 days. Posted my log for past 24 hrs here: https://pastebin.com/7mKRUiZ7 I'm willing to help troubleshoot further if that will help. Love the custom component but can't rely on it at the moment. Anyone got any leads yet?

robgazy avatar Apr 04 '20 06:04 robgazy

Same as above. Very unreliable running the same setup as above.

robkiller26 avatar Apr 10 '20 16:04 robkiller26

You should be able to update your bridge firmware with the latest RTSP firmware for your camera, I never reverted my firmware to stock and it updated just fine, again, with the latest version of RTSP. I updated my bridge, that fixed some issues (ones where automations don't trigger, sensors stop responding or simply won't connect) then once the firmware was updated on the bridge, I found a faulty sensor that was the causes of the issues I was seeing beyond the previously mentioned. Update bridge, remove bad sensors, turn on debug logging. Just jumping on the "It's broken" train without trying any of the aforementioned solutions is kind of not my problem. I told you how I fixed it (now on the 4th day of no issues) and I can verify is fixed by seeing how HA is running after removing the faulty sensor.

Cheers.

RoldyBuffalo avatar Apr 10 '20 17:04 RoldyBuffalo

So i did not jump on the "its broken" train. I posted months ago on many forums trying to help problem solve the issue and it never went anywhere. I just reposted again to help confirm his issue he is not alone. I have updated my firmware on my bridge but still have the same issue. i have not seen a bad sensor but i am not sure how to tell if i have a bad sensor to be honest.

Thanks for reaching out i am willing to try anything to get it reliable again. How do i tell if i have a bad sensor?

Thanks

Rob

On Fri, Apr 10, 2020, 11:11 AM VarenDerpsAround [email protected] wrote:

You should be able to update your bridge firmware with the latest RTSP firmware for your camera, I never reverted my firmware to stock and it updated just fine, again, with the latest version of RTSP. I updated my bridge, that fixed some issues (ones where automations don't trigger, sensors stop responding or simply won't connect) then once the firmware was updated on the bridge, I found a faulty sensor that was the causes of the issues I was seeing beyond the previously mentioned. Update bridge, remove bad sensors, turn on debug logging. Just jumping on the "It's broken" train without trying any of the aforementioned solutions is kind of not my problem. I told you how I fixed it (now on the 4th day of no issues) and I can verify is fixed by seeing how HA is running after removing the faulty sensor.

Cheers.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/kevinvincent/ha-wyzesense/issues/114#issuecomment-612126345, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEPBFBYNRSSUNONTKEMGBWDRL5HNFANCNFSM4LC7JNGQ .

robkiller26 avatar Apr 15 '20 03:04 robkiller26

So what is the solution to this? or a workaround perhaps? I reboot my host (not hass) and it fixes the issue. I also figured out reinstalling the component and rebooting hass (not host) fixes it as well. Would really like to know how I can prevent my usb dongle from disconnecting every time supervisor updates, or worse at random like it seem to be doing about once every 2-3 days without manually rebooting the host. I run a vpn tunnel from my host as well (I know, I know, I need at least one more pi) but it's lightweight in the essence the ssh just needs to be kept open to keep the tunnel open. Nifty little failsafe. Anyway, I am experiencing all the issues described in this thread. The dongle disconnecting, not reconnecting, and the sensors remaining in their off position. This is only resolved with the methods I mentioned. Is using usb 3.0 instead of 2.0 the trick? Curious if it is, as on my rpi4b it happens with both usb 3.0 and usb2.0 BUT I am using powered usb3.0 hub as instructed by several guides for rpi4b and usb's. Something about their power circuitry only having a set amount of voltage across the usb lanes. Google it if you're curious.

I have not seen a solution yet. I am running it in a VM in proxmox. I have to power of my HA VM and then uplug my USB dongle and then plug it back in and reboot. I might go back to hassio on the pie, because then I just had to reboot and the sensors would come back online.

I sure hope this issue can be resolved, otherwise I will have to look for other sensors to use instead of wyze sensors.

Home assistant now loses connection to the sensors. And requires a reboot every 2 to 3 days now. Back to the same way it was. I didn't change or update anything.

cheechie avatar Apr 17 '20 01:04 cheechie

Firmware is updated same issue same errors below.

Mismatched checksum, remote=06E9, local=06F1

Invalid packet: b'55aa531d19000001718a2269b8a2373739423139383702146100010008083e06e9'

RROR (MainThread) [hacs] Validation for custom-components/sensor.wifi-scanner failed with GitHub returned 404 for https://api.github.com/repos/custom-components/sensor.wifi-scanner/contents/custom_components/blueprint/manifest.json.

robkiller26 avatar Apr 17 '20 22:04 robkiller26

Same issues here. Removing and re-adding 7/10 of mine seemed to fix it (3 were actually dead) for about 12 hours, then they got stuck again. Reboot of host did not fix. Have some batteries on the way. Going to change out all 10 and remove and re-add and see where that gets me

tripp396 avatar Apr 20 '20 14:04 tripp396

I think the Worker thread is getting killed.

When the sensors stop responding, ws._Dongle__thread.is_alive() was returning False. I don't know the reason the thread is dead. Don't see anything in the logs.

I am playing around with this at the moment https://github.com/daveenguyen/ha-wyzesense/tree/bugfix/dead-thread

I created a restart service and a binary sensor for the thread's alive state.

With those two, I created an automation to wyzesense.restart whenever the thread dies.

Hopefully this tips someone towards figuring this out

daveenguyen avatar Apr 23 '20 01:04 daveenguyen

Thanks daveenguyen, I think you are onto something here in regards to worker thread being killed. We just have to figure out what is killing it or where it is being killed. Maybe this is a stab in the dark, but I just experienced this again and looked at my debug level logs and saw the following:

2020-04-25 18:07:55 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171b3986dd10ea23737394231373700001ae40805'

Normally, this is followed be a message which says "Received: ...". But in this instance, it was the last log message I ever saw from wyzesense. Looking at code, the "watchdog" loop has an 'except' clause of OSError, after logging this there is then a 'break' statement, which would mean the While loop would be exited.

I'm going to try to remove the "break" in the OSError except, as I really think that this would kill the watchdog thread. I'll add additional logging in the except clause as well so I can try to detect if this actually saves me from it going haywire ever. I'm also going to try to add a generic "Exception" except clause and add logging there as well, to see if perhaps a silent, non OSError exception is being thrown and not logged, and killing worker thread due to not having an except clause.

Will update with findings next time wyzesense goes down, or, if I detect that this change saved me from wyzesense going down.

RonSpawnson avatar Apr 26 '20 00:04 RonSpawnson

@daveenguyen I have a code diff to share which attempts the fix I described above. Since this issue often takes days-to-weeks to repro for me, it will take some time for me to determine if it was successful or not. Since it may take a while for verification, I'm sharing the diff with others in case others want to attempt it or continue searching for other potential problems. Commit description contains additional information, as well as success and failure criteria. Keep in mind this is completely unverified, I have no idea if this fixes the issue or not. But since it will take me some time to know, I'm sharing my attempt now.

~~Here's the diff: https://github.com/RonSpawnson/ha-wyzesense/commit/d6e09744659814c8544ba86ae964b73fa675ab07~~ Update: This did not appear to resolve the issue. See my next comment for findings.

RonSpawnson avatar Apr 26 '20 02:04 RonSpawnson

Thanks @daveenguyen and @RonSpawnson for your insight and comments. I have been fighting this issue for several months now. I have 19 motion and 10 door sensors. I have tried many of the things already suggested such as updated firmware, replaced batteries, checked/removed bad sensors, scheduled reboots of HA and monitoring battery levels and RSSI levels. Some of these changes seemed to help for a period of time but the problem always remained. @RonSpawnson, I have added the changes which you are trying to my wyzesense_custom.py and rebooted. (I assume that is all that I need to do in order to implement). I will let you know any information which I can gather.

I believe the following reported issues are all related: #84 Wyze Sense sensors stopped working #117 Sensor stop responding, partially, or completely #121 Sensors completely stop responding after a random period of time #119 Sensors stop working after random periods of time #109 Scan sensor w/o MAC #98 Dead battery on contact sensor now having issues reconnecting

gterpstra75 avatar Apr 26 '20 21:04 gterpstra75

@RonSpawnson Thanks, I'm in. Got your modified wyzesense_custom.py installed and restarted HA. I've even got one suspicious sensor that is working at the moment. Let's see what happens.

robgazy avatar Apr 26 '20 21:04 robgazy

Update: My findings are the fix d6e0974 does not resolve the issue. I get many many lines letting me know it detected an OS error (the log file grows very rapidly). This means the worker thread is no longer being killed, but the OSError is indeed persistent. I see a few potential places we can go from here.

(1) Could we detect issue and recover automatically? I like the direction @daveenguyen has taken this - is there a way we can detect the OSError even higher in the stack, and automatically re initialize? I think we would need/want this somewhere in "binary_sensor.py". (2) Can we prevent this issue from recurring? Requires deep dive as to why an OSError is occurring and how we can prevent it from occurring.

(1) seems like an easier fix, but it's a mitigation rather than solving the root cause (2). However, we'd still be in a much better spot even with an automatic mitigation.

The hunt continues!

RonSpawnson avatar Apr 26 '20 21:04 RonSpawnson

Not long after I previously posted wyzesense stopped working (last few lines from log)

2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171b87a1ef4a23737394230343531021a6000010019c44a083755aa531d1900000171b87a1ef43737394230343531021a6000010019c4' 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531d1900000171b87a1ef4a23737394230343531021a6000010019c44a0837' 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5319, Payload=b'00000171b87a1ef4a23737394230343531021a6000010019c44a' 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5319) 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555319ff026a' 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.binary_sensor] {'available': True, 'mac': '779B0451', 'state': 0, 'device_class': 'motion', 'timestamp': '2020-04-26T17:52:23.284000', 'rssi': -74, 'battery_level': 96} 2020-04-26 17:52:44 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171b87a1ef43737394230343531021a6000010019c44a0837' 2020-04-26 17:52:44 ERROR (Thread-10) [custom_components.wyzesense.wyzesense_custom] Caught a non OSError exception.

gterpstra75 avatar Apr 26 '20 22:04 gterpstra75

@RonSpawnson I think you are right about non-OSError Errrors causing the thread to die.

Been running similar changes for a few days now. I noticed the thread haven't been killed so far, but am seeing this AssertionError https://github.com/HclX/WyzeSensePy/issues/9

Traceback (most recent call last):
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 364, in _Worker
    pkt = Packet.Parse(s)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 130, in Parse
    assert len(s) >= b2 + 4
AssertionError

That same AssertionError multiple times the past few days (edited for shorter length)

2020-04-23 13:51:42 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-23 14:18:30 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-23 17:44:45 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-24 16:52:07 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-24 17:27:39 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-24 20:47:59 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-25 18:51:44 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-25 20:59:42 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-26 00:46:24 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
2020-04-26 11:03:45 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread
Traceback (most recent call last):
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 364, in _Worker
    pkt = Packet.Parse(s)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 130, in Parse
    assert len(s) >= b2 + 4
AssertionError

Here are the changes I have

    def _Worker(self):
        while True: #Watchdog
            try:
                s = b""
                while True:
                    if self.__exit_event.isSet():
                        break
                    
                    s += self._ReadRawHID()
                    #if s:
                    #    log.info("Incoming buffer: %s", bytes_to_hex(s))
                    start = s.find(b"\x55\xAA")
                    if start == -1:
                        time.sleep(0.1)
                        continue

                    s = s[start:]
                    log.debug("Trying to parse: %s", bytes_to_hex(s))
                    pkt = Packet.Parse(s)
                    if not pkt:
                        s = s[2:]
                        continue

                    log.debug("Received: %s", bytes_to_hex(s[:pkt.Length]))
                    s = s[pkt.Length:]
                    self._HandlePacket(pkt)
+               if self.__exit_event.isSet():
+                   log.info("Exiting because exit event was set")
+                   break
-           except OSError as e:
-               log.error(e)
-               break
+           except:
+               log.exception("Something went wrong in Worker Thread")

My logs does not have any "Exiting because exit event was set", so it's not needed (it was added for my restart service I mentioned earlier)

daveenguyen avatar Apr 26 '20 23:04 daveenguyen

More info I'm seeing is that after my OSError(s), I noticed the hidraw device for my usb bridge is different! When it first started up my hidraw was hidraw0: Attempting to open connection to hub at /dev/hidraw0.

Following my OSError and using your 'restart' branch, I noticed that the hidraw is now hidraw3 by running ls -la /sys/class/hidraw and looking for the entry containing "IA86" and "E024". This explains why running the wyzesense restart service fails with message: No such device: '/dev/hidraw0'. In order to correct this, we'll need to find what the new dongle is, which can be done with findDongle().

My current thought is we don't want to restart the worker thread at all. Instead, I want to stop it, find the new dongle, and reinitialize everything, including a new worker thread. Here is my attempt thus far (still not verified working yet).

def on_restart(call):
        _LOGGER.debug("Restarting ws!")
        config[CONF_DEVICE] = findDongle()
        ws.Stop()
        setup_platform(hass, config, add_entites, None)

Also worth noting - I found a really easy and quick way to repro the issue. Just un-plug and re-plug the wyzesense bridge usb stick. Then we are in the OSError exception state with new hidraw. This allows rapid iteration/testing of fixes.

RonSpawnson avatar Apr 27 '20 01:04 RonSpawnson

I'm so close, but the issue I'm experiencing is that I've now got it attempting to use the correct hidraw during init, but I'm getting a file not found error for the new hidraw. As per issue #66, this appears to be an issue with HA supervised that it cannot see the new hidraw devices in the container.

2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] Restarting ws!
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] /dev/hidraw5
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] WYZESENSE v0.0.7
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] Attempting to open connection to hub at /dev/hidraw5
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.binary_sensor] Attempting beginConn with /dev/hidraw5
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.wyzesense_custom] Inside init
2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.wyzesense_custom] total 0
drwxr-xr-x    2 root     root             0 Apr 26 21:58 .
drwxr-xr-x   63 root     root             0 Apr 26 21:58 ..
lrwxrwxrwx    1 root     root             0 Apr 26 20:30 hidraw1 -> ../../devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.4/1-1.4:1.0/0003:04D9:2013.0002/hidraw/hidraw1
lrwxrwxrwx    1 root     root             0 Apr 26 20:30 hidraw2 -> ../../devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.4/1-1.4:1.1/0003:04D9:2013.0003/hidraw/hidraw2
lrwxrwxrwx    1 root     root             0 Apr 26 21:58 hidraw5 -> ../../devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.1/1-1.1:1.0/0003:1A86:E024.0011/hidraw/hidraw5 [NOTE: THIS IS MY NEW HIDRAW FOLLOWING REPLUG]

2020-04-26 21:58:30 DEBUG (SyncWorker_14) [custom_components.wyzesense.wyzesense_custom] Device is: /dev/hidraw5
2020-04-26 21:58:30 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.2882217968] [Errno 2] No such file or directory: '/dev/hidraw5'
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 134, in handle_call_service
    connection.context(msg),
  File "/usr/src/homeassistant/homeassistant/core.py", line 1232, in async_call
    await asyncio.shield(self._execute_service(handler, service_call))
  File "/usr/src/homeassistant/homeassistant/core.py", line 1259, in _execute_service
    await self._hass.async_add_executor_job(handler.func, service_call)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/wyzesense/binary_sensor.py", line 197, in on_restart
    setup_platform(hass, config, add_entites, None)
  File "/config/custom_components/wyzesense/binary_sensor.py", line 119, in setup_platform
    ws = beginConn()
  File "<decorator-gen-4>", line 2, in beginConn
  File "/usr/local/lib/python3.7/site-packages/retry/api.py", line 74, in retry_decorator
    logger)
  File "/usr/local/lib/python3.7/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/config/custom_components/wyzesense/binary_sensor.py", line 117, in beginConn
    return Open(config[CONF_DEVICE], on_event)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 580, in Open
    return Dongle(device, event_handler)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 290, in __init__
    self.__fd = os.open(device, os.O_RDWR | os.O_NONBLOCK)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/hidraw5'

RonSpawnson avatar Apr 27 '20 03:04 RonSpawnson

So I implemented @daveenguyen lastest code. I like @robgazy have two suspected motion sensors that are enabled for this testing. The logs are indicating a couple of ERROR messages in Worker Thread:

2020-04-26 19:49:14 ERROR (Thread-10) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread 2020-04-27 06:13:28 ERROR (Thread-10) [custom_components.wyzesense.wyzesense_custom] Something went wrong in Worker Thread

But the WyzeSence continues to work.

gterpstra75 avatar Apr 27 '20 13:04 gterpstra75

Thanks gterpstra75. I want to point out that due to the code you are running, this may be a red herring. Allow me to explain.

Previous behavior: Kill worker thread on OSError. Silently ignore any other exceptions which are not OSError, worker thread continues.

Current behavior with code you are running: Do not kill worker thread on OSError. Print "Something went wrong in Worker Thread" message regardless of whether this was OSError or non OSError.

Because of this, it's possible that the fact that you are seeing "Something went wrong" message, but wyzesense continuing to work is a bit of a red herring. My proposal is to collect more detailed data. We should not be confusing problematic OSErrors with non-problematic non OSError exceptions. Additionally, instead of just printing "Something went wrong", we should be printing the actual exceptions themselves for more detail.

My proposal is to modify dave's code as follows to collect more detail:

In imports near top, add this line:

    import binascii
    import errno
    import subprocess
+   import traceback

And use the following two catch statements in _Worker function. Note that I removed the "break" from the OSError except clause, just like in Dave's code.

def _Worker(self):
        while True: #Watchdog
            try:
                s = b""
                while True:
                    if self.__exit_event.isSet():
                        break
                    
                    s += self._ReadRawHID()
                    #if s:
                    #    log.info("Incoming buffer: %s", bytes_to_hex(s))
                    start = s.find(b"\x55\xAA")
                    if start == -1:
                        time.sleep(0.1)
                        continue

                    s = s[start:]
                    log.debug("Trying to parse: %s", bytes_to_hex(s))
                    pkt = Packet.Parse(s)
                    if not pkt:
                        s = s[2:]
                        continue

                    log.debug("Received: %s", bytes_to_hex(s[:pkt.Length]))
                    s = s[pkt.Length:]
                    self._HandlePacket(pkt)
                     log.debug("Received: %s", bytes_to_hex(s[:pkt.Length]))
                     s = s[pkt.Length:]
                     self._HandlePacket(pkt)
+               if self.__exit_event.isSet():
+                   log.info("Exiting because exit event was set")
+                   break
            except OSError as e:
+                log.error("Caught an OSError in Worker thread")
                 log.error(e)
-                break
+           except:
+                 log.error("Caught a non OSError exception in Worker thread, but wyzesense may still continue working. If it stops working following this message, the below error is suspect. If it continues working, it was a red herring that can be ignored.")
+                 traceback.print_exc()

Finally, to summarize my investigation after running the above - for me it was OSErrors that I am seeing which were indeed killing the worker thread and causing wyzesense to stop. I also discovered it's super easy to reproduce them too - just un-plug and replug the usb stick. I've made additional code modifications to Dave's attempt at restart (I can share later), and am super close to it working, however the issue is that the new hidraw device is not exposed to the homeassistant docker container. Issue #66 contains more details. I'm currently working on seeing if we can modify supervisor to bind all hidraw/wyzesense devices, but it's not clear to me how to modify supervisor locally.

RonSpawnson avatar Apr 27 '20 13:04 RonSpawnson

@RonSpawnson, no problem. I have implemented your suggestions and willing to help in any way that I can. I am just glad that you and @daveenguyen are trying to fix this issue. I would love to help but I have no python programming skills.

gterpstra75 avatar Apr 27 '20 13:04 gterpstra75

Thanks! We appreciate the help! I'm confident with enough time and effort we will figure out a fix, or at least a mitigation. In the meantime, keep in mind that in the code changes you just did, if you see the "Caught a non OSError exception in Worker thread" message, you will still need to determine whether or not it was a red herring. If this message happened but wyzesense is still working, sure it was an exception, but it's ok to ignore as this was red herring which wasn't causing wyzesense outage. If, though, you see the non OSError exception and then wyzesense stops working, it is then important.

Put more simply: Run this until you see wyzesense stop working. Then you can determine what was the last exception you saw - OSError, or non OSError, and what were additional details about the exception. In my case it's an OSError, and it's reproducible by unplugging and replugging the bridge from my pi. Let us know what you find in your case.

RonSpawnson avatar Apr 27 '20 13:04 RonSpawnson

logging.exception does log some trace info. https://docs.python.org/3/library/logging.html#logging.exception

@gterpstra75 do you see more lines after the error message without date and time?

The only errors I've encountered in the past 4 days are 10 instances of these AssertionError

Traceback (most recent call last):
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 364, in _Worker
    pkt = Packet.Parse(s)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 130, in Parse
    assert len(s) >= b2 + 4
AssertionError

daveenguyen avatar Apr 27 '20 16:04 daveenguyen

@RonSpawnson I think it's not a red herring because the bare except is catching all Errors that would've been raised before. Uncaught errors are now "handled" in this loop.

To make the loop behave like the original code, we'd need to add the break back in the OSError handling block, and raise inside the bare except block.

I think the bare except (without raising) is keeping our thread alive.

            except OSError as e:
                log.error("Caught an OSError in Worker thread")
                log.error(e)
+               break
           except:
                log.error("Caught a non OSError exception in Worker thread, but wyzesense may still continue working. If it stops working following this message, the below error is suspect. If it continues working, it was a red herring that can be ignored.")
                traceback.print_exc()
+               raise

daveenguyen avatar Apr 27 '20 16:04 daveenguyen

@daveenguyen, I haven't seen any error message yet with the latest code that I have implemented. I will report back if/when I do.

gterpstra75 avatar Apr 27 '20 16:04 gterpstra75

I got my first non OSError message but Wyzesense has continued to run. Here is the messages around the error message:

2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171c0c1af660ea2373738303233434402000bfa0821' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171c0c1af660ea2373738303233434402000bfa0821' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171c0c1af660ea2373738303233434402000bfa' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335) 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286' 2020-04-28 08:27:48 INFO (Thread-10) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-28T08:27:31.046000, data=b'a2373738303233434402000bfa' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171c0c1af94a2373738303233434402115d0001000b6008f8' 2020-04-28 08:27:48 ERROR (Thread-10) [custom_components.wyzesense.wyzesense_custom] Caught a non OSError exception in Worker thread 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531d1900000171c0c1af94a2373738303233434402115d0001000bfa6008f8' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531d1900000171c0c1af94a2373738303233434402115d0001000bfa6008f8' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5319, Payload=b'00000171c0c1af94a2373738303233434402115d0001000bfa60' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5319) 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555319ff026a' 2020-04-28 08:27:48 DEBUG (Thread-10) [custom_components.wyzesense.binary_sensor] {'available': True, 'mac': '778023CD', 'state': 0, 'device_class': 'motion', 'timestamp': '2020-04-28T08:27:31.092000', 'rssi': -96, 'battery_level': 93}

I don't know how to get a traceback.

gterpstra75 avatar Apr 28 '20 13:04 gterpstra75

https://github.com/daveenguyen/ha-wyzesense/blob/bugfix/dead-thread/custom_components/wyzesense/wyzesense_custom.py#L381

Updated my branch to raise in the bare except. Thread dies. My sensor and restart automation kicks in. After the non-OSError/restart service, it is (Thread-15) and not (Thread-2)

The gap between the AssertionError and 2020-04-30 14:40:12 DEBUG (SyncWorker_5) [custom_components.wyzesense.binary_sensor] Updating Watchdog to False I am assuming is the next polling update. In the original ha-wyzesense, this is where we see our sensors stop responding (because the thread is dead).

2020-04-30 14:40:00 INFO (Thread-2) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:39:53.725000, data=b'ad373738394131423102011cc9'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd081f000ead37373839413142310266666606b955aa53193500000171cd081f000ead373738394131310266666606b9'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171cd081f000ead37373839413142310266666606b9'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd081f000ead373738394131423102666666'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:00 INFO (Thread-2) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:39:53.728000, data=b'ad373738394131423102666666'
2020-04-30 14:40:00 DEBUG (Thread-2) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd081f000ead373738394131310266666606b9'
2020-04-30 14:40:00 ERROR (Thread-2) [custom_components.wyzesense.wyzesense_custom] non-OSError in worker thread. Reraising
Traceback (most recent call last):
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 364, in _Worker
    pkt = Packet.Parse(s)
  File "/config/custom_components/wyzesense/wyzesense_custom.py", line 130, in Parse
    assert len(s) >= b2 + 4
AssertionError
2020-04-30 14:40:12 DEBUG (SyncWorker_5) [custom_components.wyzesense.binary_sensor] Updating Watchdog to False
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Restarting Dongle
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Start Inquiry...
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=4327, Payload=<None>
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa55430327016c'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa43042801016f'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa43042801016f'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=4328, Payload=b'01'
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Inquiry returns 1
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=5314, Payload=b'ff'
2020-04-30 14:40:12 DEBUG (SyncWorker_9) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa55530414ff0269'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171ca66fc87a2000b740882'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171ca66fc87a2000b740882'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171ca66fc87a2000b74'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd084b4a0ead373738374242424602010b7506a3'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171cd084b4a0ead373738374242424602010b7506a3'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd084b4a0ead373738374242424602010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:12 INFO (Thread-15) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:40:05.066000, data=b'ad373738374242424602010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb18e767a2010b75080255aa531939ad37373837424242460200000171cb18e767a2010b75080255aa5314'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb18e767a2010b750802'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb18e767a2010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb18e767a2010b75080255aa5314'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb18e767a2010b750802'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb18e767a2010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa5314'
2020-04-30 14:40:12 ERROR (Thread-15) [custom_components.wyzesense.wyzesense_custom] Invalid packet: b'55aa5314'
2020-04-30 14:40:12 ERROR (Thread-15) [custom_components.wyzesense.wyzesense_custom] Invalid packet length: 4
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb18e767a2010b75080255aa530e3500000171cd084bae0314ff04eb'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb18e767a2010b750802'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb18e767a2010b75'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa530e3500000171cd084bae0314ff04eb55aa53193500000171cd084c070ead373738374242424602000b76066155aa530315016a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa530e3500000171cd084bae0314ff04eb'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd084bae0314ff'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:12 INFO (Thread-15) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:40:05.166000, data=b'14ff'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd084c070ead373738374242424602000b76066155aa530315016a55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171cd084c070ead373738374242424602000b760661'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd084c070ead373738374242424602000b76'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:12 INFO (Thread-15) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:40:05.255000, data=b'ad373738374242424602000b76'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa530315016a55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa530315016a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5315, Payload=<None>
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5315)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555315ff0266'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb1983a7a2000b76'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa530332018755aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa5303320187'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5332, Payload=<None>
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5332)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555332ff0283'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=5333, Payload=b'00000171cd08685b'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa55530b3300000171cd08685b039a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa531939ad37373837424242460200000171cb1983a7a2000b7607df'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5339, Payload=b'ad37373837424242460200000171cb1983a7a2000b76'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5339)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555339ff028a'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Trying to parse: b'55aa53193500000171cd084cd00ead373738374242424602010b77072c'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Received: b'55aa53193500000171cd084cd00ead373738374242424602010b77072c'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] <=== Received: Packet: Cmd=5335, Payload=b'00000171cd084cd00ead373738374242424602010b77'
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] ===> Sending: Packet: Cmd=53FF, Payload=ACK(5335)
2020-04-30 14:40:12 DEBUG (Thread-15) [custom_components.wyzesense.wyzesense_custom] Sending: b'aa555335ff0286'
2020-04-30 14:40:12 INFO (Thread-15) [custom_components.wyzesense.wyzesense_custom] LOG: time=2020-04-30T14:40:05.456000, data=b'ad373738374242424602010b77'
2020-04-30 14:40:43 DEBUG (SyncWorker_3) [custom_components.wyzesense.binary_sensor] Updating Watchdog to True

daveenguyen avatar Apr 30 '20 23:04 daveenguyen

I think people can use #129 as a simple workaround for now and share non-OSError they encounter

daveenguyen avatar Apr 30 '20 23:04 daveenguyen

@gterpstra75 If you're using a change with log.exception, you should be able to see it in <home-assistant-url>/developer-tools/logs when you click on the wyzesense error to view more details.

image

daveenguyen avatar Apr 30 '20 23:04 daveenguyen

Thanks @daveenguyen. So far, I have only had one non-OSE error message when I was running code (from https://github.com/kevinvincent/ha-wyzesense/issues/114#issuecomment-620089443) which resulted in failure of wyzesence. I didn't get a traceback with the error message, just the message of the error. I am now running code similar to what you suggested above except that I have added the traceback. I do see the traceback in the log messages now. I also get several non-OSE error messages a day which doesn't result in failure of the wyzesence hub. I assume that previously (looking at the code, hey still trying to learn here) these would have resulted in failure due to the break statement. Thanks again, my wyzesense I believe has been much more stable this last week.

gterpstra75 avatar May 01 '20 23:05 gterpstra75

I dont known if what I say will help or not but have to mention that I used to put my laptop in a closet, 2 meters above the ground and maybe I have to restart every two weeks for example. But lately I had to change the place of the closet and I put the laptop on the floor aganist the wall. I may restart now 2 times a day or every 3 days ... but it is being annoying now. So I am not sure maybe this place misses some packets and is not easy to communicate with the nodes and this weak connection leads to some type of error. I hope this help.

AhmedAdelHosni avatar May 04 '20 08:05 AhmedAdelHosni

@daveenguyen I've been struggling with this same issue with my project Wyzesense2MQTT. I believe it is an unknown packet type, likely triggered when a sensor's battery is low, as I seem to observe in my setup. The fix I'm testing is in the Parse function:

        if cmd == cls.ASYNC_ACK:
            assert len(s) >= 7
            s = s[:7]
            payload = MAKE_CMD(cmd_type, b2)
        elif len(s) >= b2 + 4:
            s = s[: b2 + 4]
            payload = s[5:-2]
        else:
            log.error("Invalid packet: %s", bytes_to_hex(s))
            return None

I changed the else with an assert, into an elif instead, and added an else catchall to log the failures.

Not sure if this will truly resolve the worker crashes, but wanted to share since I found you are having the same issues, and my logged issue with WyzeSensePy was linked above in this thread.

If anyone wants to help determine what the err'ing packet actually is (low battery warning?), I'd love to see us actually be able to make use of it somehow with HA.

raetha avatar Jun 10 '20 00:06 raetha

I agree that it has to do with something being done with low batteries. I have about 30 sensors combined between motion and magnetic. It was at first i could go a week with no issues all bat show above 80 percent. Now i have stopped using HA because my wyze stops working min/sec after i reboot. All sensors still show good battery but nothing else has changed. It started to need a reboot closer and closer together until now it wont work more then a min or so. I have stopped putting time or using HA since all my automatons involve wyze sensors until this issue is solved. I am up for options on how to find low or bad sensors but updated firmware and everything has not lead to any bad sensors. I see the same errors as everyone is describing above.

robkiller26 avatar Jun 14 '20 20:06 robkiller26

@robkiller26 You could certainly try testing modifying the Parse function in your copy of HA-WyseSense as I specified above and see if that keeps things more stable. It would be interesting to note if some (but not all) of your sensors stop sending data, at which point that might be a trigger to know there is a low battery.

I love these sensors for cost, but their battery states are not very accurate. I've had ones that report 70%+, and then suddenly start having the bad MAC address issue. A battery replacement has almost always brought them back, but detecting when they actually need new batteries seems to only be reliable from the sensor itself when it starts flashing the led red periodically even without a sensor trigger. For the contact sensors, that is easy to see visually, but the motion sensors is harder as when you can see them, they can usually see you. :)

raetha avatar Jun 15 '20 14:06 raetha

I will look into that. I was wondering if the battery or issues showing up with the sensors would be easier to see with the wayze app. I was thinking about setting all my stuff up on the wayze camera and loading everything in the app to see if it would tell me what sensor is having issues.

Anyone try that?

On Mon, Jun 15, 2020, 8:35 AM Elias Hunt [email protected] wrote:

@robkiller26 https://github.com/robkiller26 You could certainly try testing modifying the Parse function in your copy of HA-WyseSense as I specified above and see if that keeps things more stable. It would be interesting to note if some (but not all) of your sensors stop sending data, at which point that might be a trigger to know there is a low battery.

I love these sensors for cost, but their battery states are not very accurate. I've had ones that report 70%+, and then suddenly start having the bad MAC address issue. A battery replacement has almost always brought them back, but detecting when they actually need new batteries seems to only be reliable from the sensor itself when it starts flashing the led red periodically even without a sensor trigger. For the contact sensors, that is easy to see visually, but the motion sensors is harder as when you can see them, they can usually see you. :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kevinvincent/ha-wyzesense/issues/114#issuecomment-644173318, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEPBFB62DEI5M6HU5QIVTD3RWYWRHANCNFSM4LC7JNGQ .

robkiller26 avatar Jun 15 '20 16:06 robkiller26

I'm having similar issues on gosense. Looks like the parser is unable to make sense of one of the three message types. (I've only seen three unique messages, not counting the ones during the Pairing process) First message type asserts that "state:1", e.g. motion is detected.

2020-06-19T19:33:30.561536251Z DEBU[0578] readRawHid: 62 bytes: [ 55aa53193500000172ce123b360ea2373741313742414202010053064655aa531d1900000172ce123b39a2373741313742414202176400010100531606b5 ]

Second message asserts that "state:0", e.g. no more motion. And it always happens 40 seconds after the motion is detected:

2020-06-19T19:34:10.558263651Z DEBU[0618] readRawHid: 62 bytes: [ 55aa53193500000172ce12d0230ea237374131374241420200005406c855aa531d1900000172ce12d026a237374131374241420217640001000054170738 ]

Third message almost always (pretty much always), comes in exactly 2 minutes after the first motion detected message.

2020-06-19T19:35:30.557162901Z DEBU[0698] readRawHid: 39 bytes: [ 55aa53231900000172ce13f9f7ab37374131374241420200000100000100000100000117000776 ]

Note that the last message is 39 bytes vs 62 bytes for the first two. It is misinterpreted by parser, but gosense library does not seem to care and chugs along nicely. All I get is a weird MQTT parsed message - battery says 0%, which is not true. Anyways, this issue is benign in gosense (not sure in wyzesense.py) but forces me to put some error checking in MQTT flows.

elyorkhakimov avatar Jun 19 '20 19:06 elyorkhakimov

this is what the misinterpreted 39-byte long message looks like in Node-Red, where I've subscribed to mqtt topic from gosense

image

elyorkhakimov avatar Jun 19 '20 19:06 elyorkhakimov

if cmd == cls.ASYNC_ACK: assert len(s) >= 7 s = s[:7] payload = MAKE_CMD(cmd_type, b2) elif len(s) >= b2 + 4: s = s[: b2 + 4] payload = s[5:-2] else: log.error("Invalid packet: %s", bytes_to_hex(s)) return None

@raetha has yours been more stable with this fix? I'm thinking of pulling it into the package.

kevinvincent avatar Jul 23 '20 00:07 kevinvincent

Hey Kevin - this fix definitely has, I believe this to be a good change. I'm still experiencing occasional outages that I think are related to issues with the udev and HA docker - those issues are 100% reproducible by unplugging and replugging your usb dongle in and out, and the HA container no longer has visibility into the hidraw device until HA restarts. And I think something with power management in my pi is causing this to occasionally blip out and require restart. For context: https://github.com/kevinvincent/ha-wyzesense/issues/66

That being said, anecdotally, since adding the diff you mention, I have noticed a perceivable decrease in frequency of restarts being required. So I personally feel this fix is a good one.

RonSpawnson avatar Jul 23 '20 02:07 RonSpawnson

My sensors have been running nonstop for over a month without any issue. Running VM in proxmox. It never requires a reboot anymore.

cheechie avatar Jul 23 '20 02:07 cheechie

@kevinvincent to confirm, it has helped significantly with the issues being seeing. We're still chasing down one lockup issue, but it seems to be related to the bridge and USB connection flaking out. I've just modified the retry settings on the connection function to ignore IOError to see if it covers that. But with just the assert change we aren't seeing the errant packet killing things anymore. So I think you are pretty safe. Still love to see someone with some hardware chops help us figure out what that packet really means though. If as I think, it's a low battery warning, that would be great to make use of, instead of getting stuck with the invalid MAC issue that shows up when a sensor runs too low.

raetha avatar Jul 25 '20 01:07 raetha

@raetha Great to hear. Yeah, unfortunately, I don't have too much experience with the reverse engineering side to figure out what exactly that packet means. It would be cool if it was a low battery warning we could catch. I've just had to check for the 3 fast blinks on my sensors to know when to replace them.

I've gone ahead and released v0.0.9 with that fix. Thank you :)

kevinvincent avatar Jul 25 '20 20:07 kevinvincent

You are very welcome, happy to share. Feel free to pour through the rest of my project's code sometime and see if there might be anything else we've caught that could be useful to you. I don't remember all the things off the top of my head, just that I spent about a month right off the bat working through a variety of things that behaved weirdly.

Now that the bad packets are at least getting something logged on my side, I'm going to watch for those log messages periodically and see if they seem to occur the next time I have a low battery. I don't know how to convert the packet to useful data, but it would be nice to confirm my suspicion that they are related. Then if we can at least parse the MAC from it somehow, we'd have a low battery warning. :)

raetha avatar Jul 26 '20 14:07 raetha

Running 0.9 on rpi and finding I need to reboot every now and then. Help :)

daleye avatar Oct 10 '20 19:10 daleye

@kevinvincent one of my contributors recently figured out how to modify the WyzeSensePy library and capture the ~4hr updates the sensors send with their current state. I haven't been able to test yet, but merged the code into my devel branch. Assuming it works, there isn't really a "low" batter alert, but rather more regular check-ins than just waiting until a sensor is tripped.

A couple threads also mentioned that the battery level may be off by a factor of 5. Not sure we can count on that, but might look into applying that logic just to show a more realistic battery level.

raetha avatar Nov 28 '20 05:11 raetha

@raetha if you submit a pull request I can merge it in :-)

photinus avatar Nov 28 '20 16:11 photinus

This repo is marked as "No Longer Maintained". Is there any fork that still actively maintained?

atjshop avatar Nov 28 '20 18:11 atjshop