mqtt-io

Still not reconnecting when MQTT broker not available

Open joopmartens opened this issue 1 year ago • 7 comments

Describe the bug When MQTT-IO loses its connection to the broker, it does not try to reconnect until an input pin state changes.

Also, when a reconnect is triggered by a pin state change, the first MQTT message (that pin state change) is not delivered to the broker during the reconnection; only the second state change is published.

Expected behavior Test the connection and re-establish it based on the keepalive configuration parameter. That way it also reconnects when no pin state changes occur, so output pins can still be changed by MQTT messages from other clients.
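A minimal sketch of the kind of reconnect loop described above, using only the standard library. `BrokerUnavailable`, `run_with_reconnect`, and `make_flaky_session` are hypothetical names for illustration and are not part of mqtt-io or asyncio-mqtt; the point is only that the retry is driven by a timer (like `reconnect_delay` in the config) rather than by a pin state change.

```python
import asyncio


class BrokerUnavailable(Exception):
    """Stand-in for whatever error the real MQTT client raises."""


async def run_with_reconnect(session, reconnect_delay, max_attempts=None):
    """Run `session()` until it ends cleanly, waiting `reconnect_delay`
    seconds after every failure. Returns the number of attempts used.

    `session` is a hypothetical coroutine function that connects, serves
    until the connection drops, and raises BrokerUnavailable when it does.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            await session()
            return attempt
        except BrokerUnavailable:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Timer-driven retry: no pin change is needed to get here.
            await asyncio.sleep(reconnect_delay)


def make_flaky_session(failures):
    """Build a fake session that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    async def session():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise BrokerUnavailable()

    return session


attempts_needed = asyncio.run(
    run_with_reconnect(make_flaky_session(2), reconnect_delay=0.01)
)
```

In a real daemon the loop would presumably run without `max_attempts`; the limit is only there so that a demo or test cannot spin forever.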

Also publish the first message of an input pin state change once the MQTT publish process has detected the disconnect and a reconnection has taken place.
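One way to avoid losing that first message is to leave it in the queue until a publish actually succeeds. The sketch below assumes a hypothetical `FlakyClient` and `PublishFailed` error (standing in for the `MqttCodeError` seen in the traceback); the topic names and `ON`/`OFF` payloads are illustrative only, and none of this is taken from the mqtt-io code.

```python
import asyncio
from collections import deque


class PublishFailed(Exception):
    """Stand-in for the client's publish error."""


class FlakyClient:
    """Hypothetical client: publish fails until reconnect() has been called."""

    def __init__(self):
        self.connected = False
        self.delivered = []

    async def reconnect(self):
        self.connected = True

    async def publish(self, topic, payload):
        if not self.connected:
            raise PublishFailed(topic)
        self.delivered.append((topic, payload))


async def drain(client, pending):
    """Publish queued messages in order; a message leaves the queue
    only after its publish succeeded."""
    while pending:
        topic, payload = pending[0]  # peek, do not pop yet
        try:
            await client.publish(topic, payload)
        except PublishFailed:
            await client.reconnect()
            continue  # retry the same message after reconnecting
        pending.popleft()


pending = deque([
    ("rpi_uc/input/gate_doorbell", "ON"),   # first state change
    ("rpi_uc/input/gate_doorbell", "OFF"),  # second state change
])
client = FlakyClient()
asyncio.run(drain(client, pending))
```

Because the message is only removed after a successful publish, the first state change is delivered after the reconnect instead of being dropped. A real implementation would also need a delay or retry limit for the case where the reconnect itself keeps failing.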

Error messages and traceback

2022-08-30 21:14:25 mqtt_io.server [ERROR] Exception in critical task:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/mqtt_io/mqtt/asyncio_mqtt.py", line 32, in inner
    await func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/mqtt_io/mqtt/asyncio_mqtt.py", line 94, in publish
    await self._client.publish(
  File "/usr/local/lib/python3.9/dist-packages/asyncio_mqtt/client.py", line 128, in publish
    raise MqttCodeError(info.rc, 'Could not publish message')
asyncio_mqtt.error.MqttCodeError: [code:4] Could not publish message

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/mqtt_io/server.py", line 1176, in _main_loop
    await asyncio.gather(*self.critical_tasks)
  File "/usr/local/lib/python3.9/dist-packages/mqtt_io/server.py", line 1032, in _mqtt_task_loop
    await entry.coro
  File "/usr/local/lib/python3.9/dist-packages/mqtt_io/server.py", line 671, in _mqtt_publish
    await self.mqtt.publish(msg)
  File "/usr/local/lib/python3.9/dist-packages/mqtt_io/mqtt/asyncio_mqtt.py", line 34, in inner
    raise MQTTException from exc
mqtt_io.mqtt.MQTTException

Config

mqtt:
  host: 192.168.1.X
  topic_prefix: rpi_uc
  user: XXXX
  password: XXXX
  client_id: rpi_uc
  keepalive: 10
  reconnect_delay: 2

# GPIO
gpio_modules:
  # Use the Raspberry Pi built-in GPIO
  - name: rpi
    module: raspberrypi

digital_inputs:
  # Pin 12 is input doorbell
  - name: gate_doorbell
    module: rpi
    pin: 12 
    pullup: true
    poll_interval: 0.2 

  # Pin 24 is input watermeter pulse 
  - name: water_meter 
    module: rpi
    pin: 24 
    pullup: true
    poll_interval: 0.2

digital_outputs:
  # Pin 17 is output to operate gate 
  - name: gate_operate
    module: rpi
    pin: 17
    initial: low
    publish_initial: true

  # Pin 18 is output gate light 
  - name: gate_light
    module: rpi
    pin: 18
    retain: true

Hardware

  • Platform: Raspberry Pi
  • Connected hardware: N/A

System:

  • mqtt-io version: 2.2.7
  • OS: Raspbian Bullseye
  • Python version: Python 3.9.2
  • User you're running as: root
  • Using a virtualenv?: no

Additional context The reconnect issue is identical to the one described in the last comments of closed issues #207 and #187. Although those issues have been closed, the problem does not seem to be fixed.

joopmartens, Aug 30 '22 19:08

I am seeing similar behavior. I am likewise only using output pins to control a set of relays. I was previously using the old pi_mqtt_gpio and it worked for years. Rebuilding required a reinstall, and I only see this behavior on the new versions. I reverted to 0.5.6 and it is working properly.

WritesWithBadCode, Mar 07 '23 16:03

I was also struggling with this (and Python is not very nice to deploy with NixOS), so I rewrote a small subset of this project in Rust for my use case (rpi3 gpio pins): https://github.com/chr4/rpi-mqtt-gpio. So far it runs pretty stably and has no reconnect issues. I tried to make the configs compatible with this project.

chr4, Mar 09 '23 12:03

I have exactly the same problem, with exactly the same error output. After a while, the only thing that works is restarting the systemd service for mqtt_io. The biggest problem is that you only discover something is wrong when there is an event you missed (in my case someone standing in front of the door with a doorbell that doesn't work anymore). I tried @chr4's project, but it only supports outputs and really lacks documentation. I'm hoping that someone can find a solution for this. I will do what I can to help, but I'm really a novice with Python. Keep up the good work guys!

vincentkoevoets, Apr 16 '23 08:04

Any updates about this issue?

a-reda, May 19 '23 18:05

Not yet, but I'm going to set it up in my environment and try to reproduce and investigate it.

BenjiU, May 19 '23 18:05

I tried to investigate what happens at the level of the MQTT messages sent (I am using only outputs). When the broker restarts, the status is published correctly, but the status of each output is not sent again. This leaves the entities in HA in an "Unavailable" state.

When mqtt-io is restarted, though, it publishes the outputs' status, which makes HA pick them up again.
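A rough sketch of what republishing output states after a reconnect could look like. `republish_messages` is a hypothetical helper, and the `<topic_prefix>/output/<name>` topic layout and `ON`/`OFF` payloads are assumptions for illustration; the output names come from the config in the original report.

```python
def republish_messages(topic_prefix, output_states):
    """Build (topic, payload) pairs for every cached output state.

    `output_states` maps an output name to True (on) or False (off).
    The topic layout and payload strings are assumed, not confirmed.
    """
    return [
        (f"{topic_prefix}/output/{name}", "ON" if is_on else "OFF")
        for name, is_on in sorted(output_states.items())
    ]


# Last known states, as the daemon might cache them (illustrative values).
states = {"gate_operate": False, "gate_light": True}
messages = republish_messages("rpi_uc", states)
```

The resulting list would be published from whatever hook runs after a successful reconnect, so that HA sees the current state again without a restart of mqtt-io.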

#243 seems related too

a-reda, May 20 '23 23:05

> Not yet, but I'm going to set it up in my environment and try to reproduce and investigate it.

What is supposed to trigger the reconnection in your code for this case?

Keepalive is not enabled due to an old version of asyncio-mqtt and, for an output-only setup, the daemon is not going to send any new message to the broker. If it has lost the connection, when do you expect the exception to be thrown?

To reproduce, just create a config with outputs and then stop the MQTT broker. The lost connection is not detected, and the last will message marks the process as dead until you restart it.
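To illustrate why keepalive matters for an output-only setup: with keepalive active, the client sends periodic pings and can treat a long silence from the broker as a dead connection, even when it has nothing to publish. The check below is a simplified sketch; `connection_looks_dead` and the `grace` factor are hypothetical (1.5 mirrors the broker-side timeout rule in the MQTT specification and is only assumed here for the client side), and it is not how asyncio-mqtt implements it.

```python
def connection_looks_dead(now, last_inbound, keepalive, grace=1.5):
    """Return True when nothing has arrived from the broker for more than
    `keepalive * grace` seconds. All times are in seconds."""
    return (now - last_inbound) > keepalive * grace


# With keepalive: 10 from the config, 10 s of silence is still tolerated...
still_ok = connection_looks_dead(now=100.0, last_inbound=90.0, keepalive=10)

# ...but 16 s of silence exceeds the 15 s limit and should trigger a reconnect.
timed_out = connection_looks_dead(now=106.0, last_inbound=90.0, keepalive=10)
```

A periodic task running a check like this would give the daemon a point at which to raise the exception, independent of any publish attempt.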

jmoutte, May 29 '23 20:05