Retry MQTT Connection

Open rmaes4 opened this issue 1 year ago • 5 comments

Problem

I am using MQTT to send motion events to Scrypted (which is also acting as my MQTT broker). When I reboot my Raspberry Pi, Docker launches both a docker-wyze-bridge container and a scrypted container at the same time. This creates a race condition where docker-wyze-bridge attempts to connect to the scrypted MQTT broker before scrypted has finished initializing, so the MQTT connection fails. The problem is that docker-wyze-bridge does not re-attempt the connection; it just gives up. A retry mechanism is needed to handle a scenario like this and to cover cases where there is a short loss of connection.

Potential Solution

https://github.com/mrlt8/docker-wyze-bridge/blob/0b7de5997ad90de5bb8bf47be89c9110e342ac54/app/wyzebridge/mqtt.py#L77-L92

I quickly looked at the source code and, from what I can tell, this is where the MQTT connection is created. I also took a look at the documentation for the paho-mqtt library and found the function below:

RECONNECT_DELAY_SET

reconnect_delay_set(min_delay=1, max_delay=120)

The client will automatically retry connection. Between each attempt it will wait a number of seconds between min_delay and max_delay.

When the connection is lost, initially the reconnection attempt is delayed of min_delay seconds. It’s doubled between subsequent attempt up to max_delay.

The delay is reset to min_delay when the connection complete (e.g. the CONNACK is received, not just the TCP connection is established).
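To make the quoted behaviour concrete, here is a minimal sketch of the delay schedule the documentation describes (start at min_delay, double after each attempt, cap at max_delay). `backoff_delays` is a hypothetical helper written only for illustration; it is not part of paho-mqtt and does not reproduce the library's internals.

```python
def backoff_delays(min_delay=1, max_delay=120, attempts=9):
    """Return the wait (in seconds) before each successive reconnect attempt.

    Illustrative only: mirrors the documented rule (double each time,
    never exceed max_delay), not paho-mqtt's actual implementation.
    """
    delays = []
    delay = min_delay
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, max_delay)  # double, capped at max_delay
    return delays


print(backoff_delays())  # [1, 2, 4, 8, 16, 32, 64, 120, 120]
```

With the defaults, the wait reaches the 120 second cap on the eighth attempt and stays there until a connection completes.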

I believe that this issue could be easily solved by modifying the mqtt_sub_topic function to the following:

    @mqtt_enabled
    def mqtt_sub_topic(m_topics: list, callback) -> Optional[paho.mqtt.client.Client]:
        """Connect to mqtt and return the client."""
        client = paho.mqtt.client.Client()

        client.username_pw_set(MQTT_USER, MQTT_PASS or None)
        client.user_data_set(callback)
        # On (re)connect: announce "online" and subscribe to every topic.
        client.on_connect = lambda mq_client, *_: (
            mq_client.publish(f"{MQTT_TOPIC}/state", "online"),
            [mq_client.subscribe(f"{MQTT_TOPIC}/{m_topic}") for m_topic in m_topics],
        )
        client.will_set(f"{MQTT_TOPIC}/state", payload="offline", qos=1, retain=True)

        # Proposed change: MQTT reconnect option (wait 1s, doubling up to 120s).
        client.reconnect_delay_set(min_delay=1, max_delay=120)

        client.connect(MQTT_HOST, int(MQTT_PORT or 1883), 30)
        client.loop_start()

        return client

I would test this myself and create a PR, but I don't have this project setup for local development. I am hoping this simple change can resolve this issue.

rmaes4 avatar Nov 21 '23 21:11 rmaes4

I believe that would only work if an established connection is lost; paho throws an exception if the broker is not up yet - probably an [Errno 111] Connection refused...?

I've added a retry option to the wrapper that defaults to 3 attempts before disabling MQTT, but it should be configurable via MQTT_RETRIES if you need more attempts.
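For readers following along, a bounded retry around the initial connect could look roughly like the sketch below. This is not the wrapper that was actually added to the bridge; `connect_with_retries` and its parameters are hypothetical, the `connect` argument stands in for whatever performs the real connection, and the printed messages are only modelled loosely on the bridge's log format.

```python
import time


def connect_with_retries(connect, retries=3, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or `retries` retries have failed.

    Returns True on success and False when giving up. Hypothetical helper,
    not the bridge's real wrapper.
    """
    for attempt in range(retries + 1):
        try:
            connect()
            return True
        except OSError as ex:  # covers connection refused, unreachable network, timeouts
            if attempt == retries:
                print(f"[MQTT] {retries}/{retries} retries failed. Disabling MQTT.")
                return False
            print(f"[MQTT] {ex}. Retrying {attempt + 1}/{retries}...")
            sleep(delay)
    return False
```

Catching `OSError` is what lets this handle a broker that is not up yet, since `ConnectionRefusedError` is one of its subclasses; `reconnect_delay_set` alone would not help in that case.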

mrlt8 avatar Dec 03 '23 04:12 mrlt8

@mrlt8 is there any specific setting for not limiting the mqtt connection retries, or should I just put a stupidly large value in the MQTT_RETRIES variable?

Because of nightly router restarts I run into the retries expiring:

[WyzeBridge] ⏰ Timed out connecting to ovalesublime-west-cam.
[WyzeBridge] [MQTT] [Errno 101] Network is unreachable
[ovalesublime-south-cam] [-13] IOTC_ER_TIMEOUT
[ovalesublime-west-cam] [-13] IOTC_ER_TIMEOUT
[ovalesublime-south-cam] [MQTT] [Errno 101] Network is unreachable
[ovalesublime-west-cam] [MQTT] timed out. Retrying 1/3...
[WyzeBridge] [MQTT] timed out. Retrying 2/3...
[ovalesublime-south-cam] [MQTT] timed out. Retrying 2/3...
[ovalesublime-west-cam] [MQTT] [Errno 101] Network is unreachable
[WyzeBridge] [MQTT] timed out. Retrying 3/3...
[ovalesublime-west-cam] [MQTT] timed out. Retrying 3/3...
[ovalesublime-south-cam] [MQTT] timed out. Retrying 3/3...
[WyzeBridge] [MQTT] 3/3 retries failed. Disabling MQTT.
[WyzeBridge] ⏰ Timed out connecting to ovalesublime-south-cam.
[WyzeBridge] 🎉 Connecting to WyzeCam V3 - ovalesublime-west-cam on 192.168.1.88

For something that is an add-on / service, I believe it is more useful to never give up.

teixeluis avatar Dec 11 '23 11:12 teixeluis

Good point. Will see if we can keep retrying for certain exceptions.
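One possible shape for "keep retrying for certain exceptions" is sketched below, assuming that connection refused, unreachable network or host, and timeouts count as transient while everything else should still surface. `TRANSIENT_ERRNOS`, `is_transient` and `connect_until_up` are hypothetical names, and which errors belong in the transient set is a judgement call rather than anything the bridge defines.

```python
import errno
import socket
import time

# Assumption: these error codes mean "broker or network not ready yet".
TRANSIENT_ERRNOS = {errno.ECONNREFUSED, errno.ENETUNREACH, errno.EHOSTUNREACH}


def is_transient(ex):
    """Return True for errors that are worth retrying indefinitely."""
    if isinstance(ex, (TimeoutError, socket.timeout)):
        return True
    return isinstance(ex, OSError) and ex.errno in TRANSIENT_ERRNOS


def connect_until_up(connect, min_delay=1, max_delay=120, sleep=time.sleep):
    """Retry connect() forever on transient errors, with capped backoff.

    Returns the number of attempts it took. Non-transient errors are re-raised.
    """
    delay = min_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            connect()
            return attempts
        except OSError as ex:
            if not is_transient(ex):
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to max_delay
```

Using the `errno` constants instead of the literal numbers from the logs keeps the check portable, since the numeric values differ between platforms.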

mrlt8 avatar Dec 11 '23 14:12 mrlt8

I even tried an extremely large number of retries, with no luck. Every time I restart Home Assistant, I have to restart the docker bridge (I'm running them on separate platforms) to get MQTT to work again. It happens with cameras that are turned off; if I turn them on again via the Wyze app, they start working.

cfelicio avatar Feb 27 '24 22:02 cfelicio

+100

I think there should really be a separate thread or process that keeps up the connection. After all, what happens if the MQTT connection is reset mid-course during normal container execution? Even if we set retries high, what if it still fails to connect? And why can't it just keep retrying in parallel with the master process? For a service as essential as MQTT**, I would expect such redundancies in its implementation in Docker Wyze Bridge.
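As a rough illustration of the "separate thread" idea, the sketch below keeps calling a connect function from a daemon thread until it succeeds, and tries again whenever the connection is flagged as lost. `ConnectionKeeper` and all of its methods are hypothetical; nothing here is taken from the bridge's code, and `connect` stands in for the real MQTT connection call.

```python
import threading


class ConnectionKeeper:
    """Hypothetical helper: (re)establishes a connection in the background."""

    def __init__(self, connect, retry_delay=5.0):
        self._connect = connect
        self._retry_delay = retry_delay
        self._stop = threading.Event()
        self.connected = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def mark_disconnected(self):
        """Call this from a disconnect callback to trigger a reconnect."""
        self.connected.clear()

    def _run(self):
        while not self._stop.is_set():
            if not self.connected.is_set():
                try:
                    self._connect()
                    self.connected.set()
                except OSError:
                    pass  # broker not reachable yet; try again after the delay
            # Sleep between checks, but wake immediately if stop() is called.
            self._stop.wait(self._retry_delay)
```

The main process never blocks on MQTT in this arrangement; it can check `keeper.connected.is_set()` before publishing. paho-mqtt also provides `connect_async()` for use with `loop_start()`, which moves the connection attempt into the client's own network thread; whether that alone would cover the cases described in this thread has not been tested here.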

Ironically, yes, I'm a software developer, even open source at times, but unfortunately my time is so limited, and I can respect that you (@mrlt8) don't have time to do such things, especially in an initial implementation. ;) So no judgement, just an expression of how important this is to some of us.

** RE: MQTT Essentiality: "will and last testament" as defined by the MQTT spec, which even sounds melodramatic itself, implies a certain level of categorical importance of survival (for MQTT devices), none of which becomes less important with Docker Wyze Bridge, only more important due to safety and security being at risk (in theory or in practice).

nbetcher avatar Aug 03 '24 14:08 nbetcher