zyre icon indicating copy to clipboard operation
zyre copied to clipboard

endpoints fail to reconnect after wifi reconnection

Open snyderra opened this issue 3 years ago • 3 comments
trafficstars

Using ZYRE on android phones with the latest build from the releases. Its working great except after the wifi connection drops and reconnects. It seems that the remote connection does not know that the connection was lost??? also worth noting, the name of the node is lost. It was set to the last four of the device serial. Any help or workarounds appreciated. Would zmq heartbeats help? Not quite sure how to enable and test. Any help or workarounds appreciated.

PHONE 1 Log (this one lost wifi for ~5 seconds)

zbeacon: failed to transmit, attempting reconnection. reason=invalid socket 03-02 17:54:18.551 18686 19027 D zsys : zbeacon: failed to transmit, attempting reconnection. reason=invalid socket 03-02 17:54:19.563 18686 19027 D zsys : zbeacon: failed to transmit, attempting reconnection. reason=invalid socket 03-02 17:54:20.565 18686 19027 D zsys : zbeacon: failed to transmit, attempting reconnection. reason=invalid socket 03-02 17:54:20.565 18686 19027 I zsys : zbeacon: interface=wlan0 address=10.9.8.122 broadcast=10.9.8.255 03-02 17:54:20.566 18686 19027 I zsys : zbeacon: configured, hostname=10.9.8.122 03-02 17:54:21.125 18686 19026 I zsys : (N42B) connect to peer: endpoint=tcp://10.9.8.127:49152 03-02 17:54:21.125 18686 19026 I zsys : (N42B) send HELLO to peer=- sequence=1 03-02 17:54:27.032 18686 19026 I zsys : (N42B) peer does not send messages (evasive) name= endpoint=tcp://10.9.8.127:49152 03-02 17:54:27.032 18686 19026 I zsys : (N42B) send PING to peer=- sequence=2 03-02 17:54:27.033 18686 19011 I zsys : zyre_event: 03-02 17:54:27.033 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:27.033 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:27.033 18686 19011 I zsys : zyre_event: 03-02 17:54:27.033 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:27.033 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:28.033 18686 19026 I zsys : (N42B) peer does not send messages (evasive) name= endpoint=tcp://10.9.8.127:49152 03-02 17:54:28.033 18686 19026 I zsys : (N42B) send PING to peer=- sequence=3 03-02 17:54:28.034 18686 19026 I zsys : (N42B) peer '' has not answered ping after 1000 milliseconds (silent) 03-02 17:54:28.034 18686 19011 I zsys : zyre_event: 03-02 17:54:28.034 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:28.034 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:28.034 18686 19011 I zsys : zyre_event: 03-02 17:54:28.034 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:28.034 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:28.034 18686 19011 I zsys : zyre_event: 03-02 17:54:28.034 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:28.034 18686 19011 I zsys : - type=SILENT 03-02 17:54:28.034 18686 19011 I zsys : zyre_event: 03-02 17:54:28.034 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:28.035 18686 19011 I zsys : - type=SILENT 03-02 17:54:29.034 18686 19026 I zsys : (N42B) peer does not send messages (evasive) name= endpoint=tcp://10.9.8.127:49152 03-02 17:54:29.035 18686 19026 I zsys : (N42B) send PING to peer=- sequence=4 03-02 17:54:29.035 18686 19026 I zsys : (N42B) peer '' has not answered ping after 1000 milliseconds (silent) 03-02 17:54:29.035 18686 19011 I zsys : zyre_event: 03-02 17:54:29.035 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:29.035 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:29.035 18686 19011 I zsys : zyre_event: 03-02 17:54:29.035 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:29.035 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:29.035 18686 19011 I zsys : zyre_event: 03-02 17:54:29.035 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:29.035 18686 19011 I zsys : - type=SILENT 03-02 17:54:29.035 18686 19011 I zsys : zyre_event: 03-02 17:54:29.036 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:29.036 18686 19011 I zsys : - type=SILENT 03-02 17:54:30.035 18686 19026 I zsys : (N42B) peer does not send messages (evasive) name= endpoint=tcp://10.9.8.127:49152 03-02 17:54:30.035 18686 19026 I zsys : (N42B) send PING to peer=- sequence=5 03-02 17:54:30.035 18686 19026 I zsys : (N42B) peer '' has not answered ping after 1000 milliseconds (silent) 03-02 17:54:30.035 18686 19011 I zsys : zyre_event: 03-02 17:54:30.035 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:30.035 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:30.035 18686 19011 I zsys : zyre_event: 03-02 17:54:30.035 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:30.035 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:30.035 18686 19011 I zsys : zyre_event: 03-02 17:54:30.035 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:30.036 18686 19011 I zsys : - type=SILENT 03-02 17:54:30.036 18686 19011 I zsys : zyre_event: 03-02 17:54:30.036 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:30.036 18686 19011 I zsys : - type=SILENT 03-02 17:54:31.035 18686 19026 I zsys : (N42B) peer does not send messages (evasive) name= endpoint=tcp://10.9.8.127:49152 03-02 17:54:31.036 18686 19026 I zsys : (N42B) send PING to peer=- sequence=6 03-02 17:54:31.036 18686 19026 I zsys : (N42B) peer '' has not answered ping after 1000 milliseconds (silent) 03-02 17:54:31.036 18686 19011 I zsys : zyre_event: 03-02 17:54:31.036 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:31.036 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:31.036 18686 19011 I zsys : zyre_event: 03-02 17:54:31.037 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:31.037 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:31.037 18686 19011 I zsys : zyre_event: 03-02 17:54:31.037 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:31.037 18686 19011 I zsys : - type=SILENT 03-02 17:54:31.037 18686 19011 I zsys : zyre_event: 03-02 17:54:31.037 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:31.037 18686 19011 I zsys : - type=SILENT 03-02 17:54:32.036 18686 19026 I zsys : (N42B) peer does not send messages (evasive) name= endpoint=tcp://10.9.8.127:49152 03-02 17:54:32.037 18686 19026 I zsys : (N42B) send PING to peer=- sequence=7 03-02 17:54:32.037 18686 19026 I zsys : (N42B) peer '' has not answered ping after 1000 milliseconds (silent) 03-02 17:54:32.037 18686 19011 I zsys : zyre_event: 03-02 17:54:32.037 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:32.037 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:32.037 18686 19011 I zsys : zyre_event: 03-02 17:54:32.037 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:32.037 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:32.038 18686 19011 I zsys : zyre_event: 03-02 17:54:32.038 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:32.038 18686 19011 I zsys : - type=SILENT 03-02 17:54:32.038 18686 19011 I zsys : zyre_event: 03-02 17:54:32.038 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:32.038 18686 19011 I zsys : - type=SILENT 03-02 17:54:33.037 18686 19026 I zsys : (N42B) peer does not send messages (evasive) name= endpoint=tcp://10.9.8.127:49152 03-02 17:54:33.037 18686 19026 I zsys : (N42B) send PING to peer=- sequence=8 03-02 17:54:33.038 18686 19026 I zsys : (N42B) peer '' has not answered ping after 1000 milliseconds (silent) 03-02 17:54:33.040 18686 19011 I zsys : zyre_event: 03-02 17:54:33.040 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:33.040 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:33.040 18686 19011 I zsys : zyre_event: 03-02 17:54:33.040 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:33.040 18686 19011 I zsys : - type=EVASIVE 03-02 17:54:33.041 18686 19011 I zsys : zyre_event: 03-02 17:54:33.041 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:33.041 18686 19011 I zsys : - type=SILENT 03-02 17:54:33.041 18686 19011 I zsys : zyre_event: 03-02 17:54:33.041 18686 19011 I zsys : - from name= uuid=577D94087B82870C6B04C71AB95361D2 03-02 17:54:33.041 18686 19011 I zsys : - type=SILENT

PHONE 2 Log

03-02 17:54:26.405 32503 4106 I zsys : (YK7R) peer does not send messages (evasive) name=N42B endpoint=tcp://10.9.8.122:49152 03-02 17:54:26.405 32503 4106 I zsys : (YK7R) send PING to peer=N42B sequence=2 03-02 17:54:26.405 32503 4105 I zsys : zyre_event: 03-02 17:54:26.406 32503 4105 I zsys : - from name=N42B uuid=97B49BCFFD337455082C93B2A4F71099 03-02 17:54:26.406 32503 4105 I zsys : - type=EVASIVE 03-02 17:54:26.406 32503 4105 I zsys : zyre_event: 03-02 17:54:26.406 32503 4105 I zsys : - from name=N42B uuid=97B49BCFFD337455082C93B2A4F71099 03-02 17:54:26.406 32503 4105 I zsys : - type=EVASIVE 03-02 17:54:26.984 32503 4106 I zsys : (YK7R) recv PING from peer=N42B sequence=2 03-02 17:54:26.984 32503 4106 I zsys : (YK7R) send PING_OK to peer=N42B sequence=3 03-02 17:54:27.985 32503 4106 I zsys : (YK7R) recv PING from peer=N42B sequence=3 03-02 17:54:27.986 32503 4106 I zsys : (YK7R) send PING_OK to peer=N42B sequence=4 03-02 17:54:29.756 32503 4106 I zsys : (YK7R) recv PING from peer=N42B sequence=4 03-02 17:54:29.756 32503 4106 I zsys : (YK7R) send PING_OK to peer=N42B sequence=5 03-02 17:54:30.105 32503 4106 I zsys : (YK7R) recv PING from peer=N42B sequence=5 03-02 17:54:30.105 32503 4106 I zsys : (YK7R) send PING_OK to peer=N42B sequence=6 03-02 17:54:30.987 32503 4106 I zsys : (YK7R) recv PING from peer=N42B sequence=6 03-02 17:54:30.987 32503 4106 I zsys : (YK7R) send PING_OK to peer=N42B sequence=7 03-02 17:54:31.993 32503 4106 I zsys : (YK7R) recv PING from peer=N42B sequence=7 03-02 17:54:31.993 32503 4106 I zsys : (YK7R) send PING_OK to peer=N42B sequence=8 03-02 17:54:33.014 32503 4106 I zsys : (YK7R) recv PING from peer=N42B sequence=8

snyderra avatar Mar 02 '22 23:03 snyderra

It seems indeed. Your host has gotten a new ipaddress it seems? Which confuses the remote host

sphaero avatar Apr 12 '22 18:04 sphaero

It reconnects with the same ip. Zyre never reestablishes the connection even after waiting for the evasive timeout. Only thing that works is to detect when the name is empty and restart the zyre context. Before using zyre, I was using just ZMQ classes, I was able to enable ZMQ heartbeat and it worked.

Is there anything I could develop that would help find the bug? Really would like zyre to reconnect when wireless is lost then reconnected. On Apr 12, 2022, 14:42 -0400, Arnaud Loonstra @.***>, wrote:

It seems indeed. Your host has gotten a new ipaddress it seems? Which confuses the remote host — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

snyderra avatar Apr 12 '22 22:04 snyderra

This requires some tracing of the logic. The code is not hard to read, you could try the require_peer method see if it is caused there: https://github.com/zeromq/zyre/blob/8f4a02fb6ef87c9af5586a4b1489eaf595e6ae09/src/zyre_node.c#L824

It would also help to create a code snippet that reproduces the problem

sphaero avatar Apr 13 '22 19:04 sphaero