zigpy-zigate icon indicating copy to clipboard operation
zigpy-zigate copied to clipboard

Zigpy Zigate Wifi - Close connection to ZHA

Open baimard opened this issue 4 years ago • 71 comments

Dear,

Home Assistant : 2021.5.4 ZHA : Last version in HASS ZiGate-WiFi v2.00 with esp-link v3.2.50-gfe4f565-dirty

I use Zigate wifi with ZHA in Home Assistant, after random time (30 min to 4hours), Home Assistant show that :

Logger: zigpy_zigate.api
Source: /usr/local/lib/python3.8/site-packages/zigpy_zigate/api.py:138
First occurred: 19:46:24 (20 occurrences)
Last logged: 19:47:26

No response to command 0x0530
----
Logger: asyncio
Source: /usr/local/lib/python3.8/asyncio/selector_events.py:903
First occurred: 19:46:54 (16 occurrences)
Last logged: 19:47:25

socket.send() raised exception.

After this message, all my sensor light etc. not work with the device.

Home assistant don't send any request to my zigate wifi.

Tcpdump : 192.168.4.252 si my HASS 192.168.4.227 Zigate Wifi

19:22:09.071920 ARP, Request who-has 192.168.4.227 tell 192.168.4.252, length 46
19:22:09.073167 ARP, Reply 192.168.4.227 is-at 3c:61:05:d1:1f:d6, length 46
19:25:59.740061 IP 192.168.4.227.9999 > 192.168.4.252.54258: Flags [P.], seq 2250:2330, ack 860, win 1824, length 80
19:25:59.740679 IP 192.168.4.252.54258 > 192.168.4.227.9999: Flags [.], ack 2330, win 64068, length 0
19:25:59.741956 IP 192.168.4.227.9999 > 192.168.4.252.54258: Flags [P.], seq 2330:2340, ack 860, win 1824, length 10
19:25:59.743470 IP 192.168.4.252.54258 > 192.168.4.227.9999: Flags [.], ack 2340, win 64068, length 0
19:26:04.848226 ARP, Request who-has 192.168.4.227 tell 192.168.4.252, length 46
19:26:04.850873 ARP, Reply 192.168.4.227 is-at 3c:61:05:d1:1f:d6, length 46
19:30:59.877528 IP 192.168.4.227.9999 > 192.168.4.252.54258: Flags [F.], seq 2340, ack 860, win 1824, length 0
19:30:59.879422 IP 192.168.4.252.54258 > 192.168.4.227.9999: Flags [F.], seq 860, ack 2341, win 64068, length 0
19:30:59.880991 IP 192.168.4.227.9999 > 192.168.4.252.54258: Flags [.], ack 861, win 1823, length 0
19:31:05.139535 ARP, Request who-has 192.168.4.227 tell 192.168.4.252, length 46
19:31:05.141093 ARP, Reply 192.168.4.227 is-at 3c:61:05:d1:1f:d6, length 46
19:37:21.514434 ARP, Request who-has 192.168.4.227 tell 192.168.4.1, length 46

So two questions :

  • Why zigate Wifi close the connection ?
  • Is it possible to make ZHA restart the connection ?

Thank you

baimard avatar May 16 '21 17:05 baimard

Hi,

your bug seems to be similar to issue #55 Did you upgrade Zigate's firmware ? A new release (3.1e) has just been delivered

totof129 avatar May 17 '21 11:05 totof129

Do you know how I could watch my firmware version of zigate ?

baimard avatar May 17 '21 12:05 baimard

you should be able to see it in the visualization tab in ZHA configuation

totof129 avatar May 17 '21 12:05 totof129

3.1d for me, new question ... where is the last firmware ...

baimard avatar May 17 '21 12:05 baimard

https://github.com/fairecasoimeme/ZiGate/releases

totof129 avatar May 17 '21 12:05 totof129

Merci beaucoup !

Je vais essayer voir si ça corrige.

baimard avatar May 17 '21 12:05 baimard

another bug :

home_assistant.1.mag30mglug26@x-kaki    | 2021-05-18 22:55:43 ERROR (MainThread) [zigpy_zigate.api] Received unhandled response 0x8012
home_assistant.1.mag30mglug26@x-kaki    | 2021-05-18 22:55:43 ERROR (MainThread) [zigpy_zigate.api] Received unhandled response 0x8012
home_assistant.1.mag30mglug26@x-kaki    | 2021-05-18 22:55:44 ERROR (MainThread) [zigpy_zigate.api] Received unhandled response 0x8012
home_assistant.1.mag30mglug26@x-kaki    | 2021-05-18 22:55:44 ERROR (MainThread) [zigpy_zigate.api] Received unhandled response 0x8012
home_assistant.1.mag30mglug26@x-kaki    | 2021-05-18 22:55:44 ERROR (MainThread) [zigpy_zigate.api] Received unhandled response 0x8012
home_assistant.1.mag30mglug26@x-kaki    | 2021-05-18 22:55:44 ERROR (MainThread) [zigpy_zigate.api] Received unhandled response 0x8012

Always same deconnection ...

home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:57 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event call_service[L]: domain=light, service=turn_off, service_data=entity_id=light.beta_level_on_off>
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:57 DEBUG (MainThread) [zigpy_zigate.zigbee.application] request (0x5231, 260, 6, 1, 1, 18, b'\x01\x12\x00', True, False)
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:57 DEBUG (MainThread) [zigpy_zigate.uart] Send: 0x0530 b'025231010100060104000003011200'
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:57 DEBUG (MainThread) [zigpy_zigate.uart] Frame to send: b'\x050\x00\x0fH\x02R1\x01\x01\x00\x06\x01\x04\x00\x00\x03\x01\x12\x00'
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:57 DEBUG (MainThread) [zigpy_zigate.uart] Frame escaped: b'\x02\x150\x02\x10\x02\x1fH\x02\x12R1\x02\x11\x02\x11\x02\x10\x02\x16\x02\x11\x02\x14\x02\x10\x02\x10\x02\x13\x02\x11\x12\x02\x10'
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:58 WARNING (MainThread) [zigpy_zigate.api] No response to command 0x0530
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:58 WARNING (MainThread) [zigpy_zigate.api] Retry command 0x0530
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:58 DEBUG (MainThread) [zigpy_zigate.uart] Send: 0x0530 b'025231010100060104000003011200'
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:58 DEBUG (MainThread) [zigpy_zigate.uart] Frame to send: b'\x050\x00\x0fH\x02R1\x01\x01\x00\x06\x01\x04\x00\x00\x03\x01\x12\x00'
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:25:58 DEBUG (MainThread) [zigpy_zigate.uart] Frame escaped: b'\x02\x150\x02\x10\x02\x1fH\x02\x12R1\x02\x11\x02\x11\x02\x10\x02\x16\x02\x11\x02\x14\x02\x10\x02\x10\x02\x13\x02\x11\x12\x02\x10'
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:26:00 WARNING (MainThread) [zigpy_zigate.api] No response to command 0x0530
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:26:00 DEBUG (MainThread) [zigpy.device] [0x5231] Delivery error for seq # 0x12, on endpoint id 1 cluster 0x0006: ZiGate doesn't answer to command
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:26:00 DEBUG (MainThread) [homeassistant.components.zha.core.channels.base] [0x5231:1:0x0006]: command failed: 'off' args: '()' kwargs '{}' exception: '[0x5231:1:0x0006]: Message send failure'
home_assistant.1.k28tajqju8bv@x-kaki    | 2021-05-18 23:26:00 DEBUG (MainThread) [homeassistant.components.zha.entity] light.beta_level_on_off: turned off: [0x5231:1:0x0006]: Message send failure

There is a really big problem, I can't identify what happend. I can do every debug you want, to try to solve this issue !

Could you please take this problem in consideration ?

https://github.com/home-assistant/core/issues/43962

baimard avatar May 18 '21 20:05 baimard

Same here, I can also help with debug if you need information

letroll avatar Jun 14 '21 09:06 letroll

I use firmware 3.1e Yesterday all is working. This morning: devices Offlines

tygerlord avatar Jun 14 '21 09:06 tygerlord

Hi, I have the same problem. There is a query that is disconnected and no longer makes the plugin work. A little try catch would at least prevent the plugin from crashing

Alindil avatar Jun 14 '21 10:06 Alindil

2021-06-13 21:04:43 WARNING (MainThread) [zigpy_zigate.api] No response to command 0x0530                                                               ┤
│2021-06-13 21:04:43 WARNING (MainThread) [zigpy_zigate.api] Retry command 0x0530                                                                        ┤
│2021-06-13 21:04:43 DEBUG (MainThread) [zigpy_zigate.uart] Send: 0x0530 b'02a6e10101000001040000050057000400'                                           ┤
│2021-06-13 21:04:43 DEBUG (MainThread) [zigpy_zigate.uart] Frame to send: b'\x050\x00\x112\x02\xa6\xe1\x01\x01\x00\x00\x01\x04\x00\x00\x05\x00W\x00\x04\┤
│2021-06-13 21:04:43 DEBUG (MainThread) [zigpy_zigate.uart] Frame escaped: b'\x02\x150\x02\x10\x112\x02\x12\xa6\xe1\x02\x11\x02\x11\x02\x10\x02\x10\x02\x┤
│2021-06-13 21:04:44 WARNING (MainThread) [zigpy_zigate.api] No response to command 0x0530                                                               ┤
│2021-06-13 21:04:44 DEBUG (MainThread) [zigpy.device] [0xa6e1] Delivery error for seq # 0x57, on endpoint id 1 cluster 0x0000: ZiGate doesn't answer to command│

And after nothing

tygerlord avatar Jun 14 '21 11:06 tygerlord

I just performed a test against the code in the dev branch which is supposed to implement the reconnect feature.

2021-06-14 14:51:37 DEBUG (MainThread) [zigpy_zigate.uart] Closed serial connection

but no reconnection attempt was made by the lib. I had to force integration reload to have the connection back:

2021-06-14 14:55:06 DEBUG (MainThread) [zigpy_zigate.uart] Connection made

ZeGuigui avatar Jun 14 '21 13:06 ZeGuigui

I added some debug code into the lib, just to be sure.

    def connection_lost(self, exc) -> None:
        """Port was closed expecteddly or unexpectedly."""
        if self._connected_future and not self._connected_future.done():
            if exc is None:
                LOGGER.info("Connection lost - no exception");
                self._connected_future.set_result(True)
            else:
                LOGGER.info("Connection lost with exception %s", exc);
                self._connected_future.set_exception(exc)
        if exc is None:
            LOGGER.debug("Closed serial connection")
            return

        LOGGER.error("Lost serial connection: %s", exc)
        self._api.connection_lost(exc)

The "connection lost" messages I added are never called when we hit a disconnect. We might need to examine zigpy implementation in details to understand if this code (which is common to xbee, combee...) do work with other devices or if there is just a duplicated bug. Connection lost with USB devices is probably a rare issue

ZeGuigui avatar Jun 14 '21 14:06 ZeGuigui

Like it seems to come after 'No response to command 0x0530', that is to say, a 'hardware' crash, this is probably common to xbee, combee and other but they don't crash so we don't see the same trouble.

I think we are dealing with 2 bugs. To check.

tygerlord avatar Jun 16 '21 07:06 tygerlord

Like it seems to come after 'No response to command 0x0530', that is to say, a 'hardware' crash, this is probably common to xbee, combee and other but they don't crash so we don't see the same trouble.

I think we are dealing with 2 bugs. To check.

For me it's a global connection problem. After a delay, connection crash and we have lot of errors.

Depending on the command that we send we have a different return. But the basic problem is the connection

baimard avatar Jun 16 '21 07:06 baimard

My 2 cents, don't know if this is related, but in Wifi, the plugin needs to maintain the IP connection, otherwise after a Timeout ( 4 or 5 minutes) the line is droped

pipiche38 avatar Jun 16 '21 07:06 pipiche38

is it possible to add simply a "ping" or other every 30s to maintain the connection ?

baimard avatar Jun 16 '21 07:06 baimard

Like it seems to come after 'No response to command 0x0530', that is to say, a 'hardware' crash, this is probably common to xbee, combee and other but they don't crash so we don't see the same trouble. I think we are dealing with 2 bugs. To check.

For me it's a global connection problem. After a delay, connection crash and we have lot of errors.

Depending on the command that we send we have a different return. But the basic problem is the connection

Do other Zigbee radio modules have the same troubles ?

tygerlord avatar Jun 16 '21 08:06 tygerlord

Like it seems to come after 'No response to command 0x0530', that is to say, a 'hardware' crash, this is probably common to xbee, combee and other but they don't crash so we don't see the same trouble. I think we are dealing with 2 bugs. To check.

For me it's a global connection problem. After a delay, connection crash and we have lot of errors. Depending on the command that we send we have a different return. But the basic problem is the connection

Do other Zigbee radio modules have the same troubles ?

that a very good question, with the title of this issue I don't know if other product owner read this thread

baimard avatar Jun 16 '21 08:06 baimard

Most other zigbee radio modules are USB modules

ZeGuigui avatar Jun 16 '21 08:06 ZeGuigui

I can confirm I'm affected with zigate on usb. For now I have rollback to HASS version 2021.3.0 (docker install)

letroll avatar Jun 16 '21 08:06 letroll

deCONZ radios have much problems with USB disconnections and comports is not mapped OK in docker (in native deCONZ and ZHA).

Tasmota, and ESPHome that using ESP chips looks working OK with EFR32 and TI chips also tuya ZBGW is working OK with EFR32 radios over Ethernet.

EFR32 / bellows is going in fault state if the counter is going of sync then loosing packages in the WiFi / Eth network but is nicely logged in the error log.

MattWestb avatar Jun 16 '21 08:06 MattWestb

I can confirm I'm affected with zigate on usb. For now I have rollback to HASS version 2021.3.0 (docker install)

Interesting, can you confirm that you don't use any wifi between zigate/usb and your home assistant ?

tygerlord avatar Jun 16 '21 09:06 tygerlord

I confirm. I'm using Hass with docker on a NUC running archlinux. I don't know if it can influence but my USB Zigate is on a USB hub. I can try without tonight if you want?

letroll avatar Jun 16 '21 09:06 letroll

I confirm. I'm using Hass with docker on a NUC running archlinux. I don't know if it can influence but my USB Zigate is on a USB hub. I can try without tonight if you want?

Thank you, I don't think that it's necessary. The main point is that wifi seem not be the cause. As far as I'm remember when I put my PiZigate directly on my raspberry pi3, I already have 'Devices Offline' disconnects on my zigbee2mqtt, so this seems to exonerate wifi

tygerlord avatar Jun 16 '21 09:06 tygerlord

is it possible to add simply a "ping" or other every 30s to maintain the connection ?

I need to check logs, but I think that plugin already ask periodically for state of sensors...

tygerlord avatar Jun 16 '21 09:06 tygerlord

2021-06-13 21:04:43 WARNING (MainThread) [zigpy_zigate.api] No response to command 0x0530                                                               ┤
│2021-06-13 21:04:43 WARNING (MainThread) [zigpy_zigate.api] Retry command 0x0530                                                                        ┤
│2021-06-13 21:04:43 DEBUG (MainThread) [zigpy_zigate.uart] Send: 0x0530 b'02a6e10101000001040000050057000400'                                           ┤
│2021-06-13 21:04:43 DEBUG (MainThread) [zigpy_zigate.uart] Frame to send: b'\x050\x00\x112\x02\xa6\xe1\x01\x01\x00\x00\x01\x04\x00\x00\x05\x00W\x00\x04\┤
│2021-06-13 21:04:43 DEBUG (MainThread) [zigpy_zigate.uart] Frame escaped: b'\x02\x150\x02\x10\x112\x02\x12\xa6\xe1\x02\x11\x02\x11\x02\x10\x02\x10\x02\x┤
│2021-06-13 21:04:44 WARNING (MainThread) [zigpy_zigate.api] No response to command 0x0530                                                               ┤
│2021-06-13 21:04:44 DEBUG (MainThread) [zigpy.device] [0xa6e1] Delivery error for seq # 0x57, on endpoint id 1 cluster 0x0000: ZiGate doesn't answer to command│

And after nothing

To my mind, after Retry command, if module don't answer again then we should initialize connection like we do on start. What do you think about?

tygerlord avatar Jun 16 '21 12:06 tygerlord

Citation de pipiche le 16 juin 2021, 15 h 27 min
Domoticz ping régulièrement la ZiGate (si il n'y a pas eu de communication In et Out depuis un laps de temps)
Si on ne reçoit pas de réponse au ping, on va insister
Si malgré les retry pas de réponse, alors le plugin est relancé avec un restart de la ZiGate

This seems a good process.

tygerlord avatar Jun 16 '21 13:06 tygerlord

So the problem is in HASS core ?

baimard avatar Jun 16 '21 14:06 baimard

Citation de pipiche le 16 juin 2021, 15 h 27 min
Domoticz ping régulièrement la ZiGate (si il n'y a pas eu de communication In et Out depuis un laps de temps)
Si on ne reçoit pas de réponse au ping, on va insister
Si malgré les retry pas de réponse, alors le plugin est relancé avec un restart de la ZiGate

This seems a good process. To be complet, Wifi module (as per their documentation and specifications) has a Timeout and this is up to the client to keep the line live!

Yes, but as I can see in logs, we have a 0x0530 exchange every minute. (more over this seem to happen with usb module too that are not concern about wifi timeout)

tygerlord avatar Jun 16 '21 14:06 tygerlord