tellstick-plugin-mqtt-hass

Does not re-connect when MQTT broker restarts

Open crashmatt opened this issue 3 years ago • 22 comments

Broker connection is broken after a restart.

I tested reconnecting by opening the plugin configuration, touching a field and then saving.
Touching a field may not be required; I have not tested that. A reboot may also trigger a reconnect; that is untested as well.

crashmatt avatar Dec 04 '21 18:12 crashmatt

I suspect a lua script might act as a watchdog for this service.

The plugin's Client class has methods that look like they should be exposed to Lua. I cannot find a method that plainly exposes the connection state, so detecting whether it is connected may be difficult.

There is no mechanism I can find in client.py that watches over the connection state.

I would make adjustments to the plugin myself but I have had no success building a good build environment for these plugins.

crashmatt avatar Dec 04 '21 18:12 crashmatt

Uhm, Lua? You mean we should add a separate Lua plugin to watch this? There should be no need for a watchdog on the MQTT connection; according to the documentation, the paho client should reconnect by itself. But I have just noticed this problem myself: if the connection is lost, it does not reconnect. I'll have a look at what I can do when I get some free time.

quazzie avatar Dec 04 '21 18:12 quazzie

Yes. Run a small lua script on a timer to check for connection and restart if required.

This lua script interacts with your plugin to send the mqtt debug message.

The only flaw in this plan is that Client does not have the correct methods.

    -- hass mqtt plugin has to be installed
    local mqtt = nil

    function onInit()
       mqtt = require 'HASSMQTT.Client'
       if mqtt == nil then
          print "No mqtt client"
       else
          print "mqtt client ok"
       end
    end

    function onDeviceStateChanged(device, state, stateValue)
       dev_name = device:name()
       if dev_name == "ClockTick" then
          print "sent debug message"
          mqtt:_debug("bahhh")
       end
    end

crashmatt avatar Dec 04 '21 21:12 crashmatt

but fixing the behavior in paho would be better

crashmatt avatar Dec 04 '21 21:12 crashmatt

A Lua script workaround while paho is broken:

  1. A fake Nexa switch device is added as "HassTellstickMQTTWatchdog"
  2. A Hass automation sets this device through MQTT once every 10 seconds
  3. The Lua script checks, in windows of at least 30 seconds, whether the watchdog event has arrived
  4. If the watchdog signal is not received, the Lua script sets the "hostname" of the MQTT client, which triggers a disconnect followed by a connect. I did not find a better way to force a reconnect.

-- hass mqtt plugin has to be installed
local mqtt = require 'HASSMQTT.Client'
local deviceManager = require "telldus.DeviceManager"	
local running_timer = false
local watchdog_count = 0
local watchdog_timeout_seconds = 30 -- Delay in seconds

function init()
	if mqtt == nil then
		print "No mqtt client"
	else
		print "mqtt client ok"
	end
end

function onInit()
	init()
end

function onDeviceStateChanged(device, state, stateValue)
	if mqtt == nil then
		return
	end
	
	local dev_name = device:name()
	-- The name must match the Tellstick device name exactly
	if dev_name == "HassTellsickMQTTWatchdog" then
		if device:state() == 1 then
			watchdog_count = watchdog_count + 1
			print "HassTellstickMQTTWatchdog signal received"
		end
	end
	
	if not running_timer then
		running_timer = true
		watchdog_count = 0
		sleep(watchdog_timeout_seconds*1000)
		
		if watchdog_count == 0 then
			print("HassTellstickMQTTWatchdog timeout")
			mqtt:configWasUpdated('hostname', '<HASS_ADDRESS>')
		else
			-- print() does not apply format specifiers, so format first
			print(string.format("HassTellstickMQTTWatchdog count %d", watchdog_count))
		end
		running_timer = false
	end
end
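
The watchdog logic above can also be sketched in Python, the language the plugin itself is written in. This is only an illustration of the counting-window idea, not code from the plugin: the class name `HeartbeatWatchdog`, its methods and the injectable `clock` are all hypothetical, and it polls instead of sleeping inside the event handler.

```python
import time


class HeartbeatWatchdog(object):
    """Counts heartbeats and reports when a whole window passes without one.

    Hypothetical helper for illustration; not part of the plugin.
    """

    def __init__(self, timeout_seconds=30, clock=time.time):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def heartbeat(self):
        # Call this whenever the watchdog device is switched by Home Assistant.
        self._count += 1

    def check(self):
        """Return True if the current window expired with no heartbeat."""
        now = self._clock()
        if now - self._window_start < self.timeout_seconds:
            return False  # window still open, nothing to decide yet
        timed_out = self._count == 0
        # Start a new window regardless of the outcome.
        self._window_start = now
        self._count = 0
        return timed_out
```

A caller would invoke `heartbeat()` from the device-state handler and `check()` from a periodic tick, forcing a reconnect only when `check()` returns True.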

crashmatt avatar Dec 06 '21 20:12 crashmatt

Looks interesting. Could you share also the HA automation part?

henripalmroth avatar Dec 10 '21 21:12 henripalmroth

I'm experiencing the same issue. In my case we have a lot of power outages in the winter, and when my znet starts up before my HA instance the MQTT connection fails and won't reconnect until I power cycle my znet. @crashmatt could you share your HA automation? If I create an automation that sets the fictional device to "on" every 10 seconds, the only message I get from the Lua script is "HassTellstickMQTTWatchdog timeout".

pierrebengtsson avatar Jan 17 '22 12:01 pierrebengtsson

I have modified the lua a bit since I last posted. It also relies on a "ClockTick" device set once a minute. We can probably find a better solution to that.

The HA script is simple. It just sets a switch every 10 seconds. You need to create this "virtual" watchdog switch for Tellstick by creating a Nexa switch and then not pairing it with a real switch.

    alias: TellstickPingRepeat
    description: ''
    trigger:
      - platform: time_pattern
        seconds: /10
    condition: []
    action:
      - type: turn_on
        device_id: 702668760f977dea1be89b83a55adfb0
        entity_id: switch.hasstellsickmqttwatchdog
        domain: switch
    mode: single

crashmatt avatar Jan 17 '22 16:01 crashmatt

Important update. The watchdog from hass is sent to the Tellstick. What I didn't know before is that the Tellstick actually transmits the virtual switch code.

I am using a Nexa switch as the virtual device. Since the watchdog transmits every 10 seconds, a physical Nexa switch that is being retrained is likely to learn the watchdog code as well. This has been driving me crazy for a few months, with light switches always switching themselves back on.

The solution is to always have the watchdog send an off state. That way a switch that is in training mode unlearns the watchdog instead. Another solution might be to pick a device on a different protocol. I have not tested this yet.

More patchwork required...

crashmatt avatar Feb 04 '22 14:02 crashmatt

I've had this issue for quite some time, don't know a good solution though.

https://github.com/quazzie/tellstick-plugin-mqtt-hass/issues/9

It would be great if @crashmatt could share the lua and ha config (with formatting) for a watchdog.

fredrike avatar Apr 16 '23 20:04 fredrike

Fredrik, There are a few different parts to this.

  1. A virtual switch item on tellstick to receive the heartbeats
  2. A heartbeat automation from hass to tellstick so the tellstick knows it is healthy
  3. A watchdog monitor on tellstick to reboot if the heartbeat is not ok

Do this in order. Check that the heartbeats are being received at the Tellstick before adding the watchdog monitor. Otherwise your Tellstick will be continuously rebooting and you are in for a frustrating day.

Step 1 [image: image.png]

Let me know if you need more guidance and I will attempt to document it better

crashmatt avatar Apr 17 '23 07:04 crashmatt

You might want to have a look at my comments on the mentioned ticket. Maybe that's related?

tiehfood avatar Apr 21 '23 14:04 tiehfood

Here are my current configurations.

  1. Created a switch in Telldus Live (called MQTT-watchdog)
  2. Changed id for the new switch to switch.tellstick_mqtt_watchdog in HA
  3. Built the following automation in HA:
    alias: HassTellstickMQTTWatchdog
    description: ""
    trigger:
      - platform: time_pattern
        seconds: /10
    condition: []
    action:
      - service: switch.turn_on
        data: {}
        target:
          entity_id: switch.tellstick_mqtt_watchdog
    mode: single
    
  4. Built the following Lua script on my Telldus TellStick (accessed through the local IP):
    -- hass mqtt plugin has to be installed
    local mqtt = require 'HASSMQTT.Client'
    local deviceManager = require "telldus.DeviceManager"
    local running_timer = false
    local watchdog_count = 0
    local watchdog_timeout_seconds = 30 -- Delay in seconds
    
    function init()
       if mqtt == nil then
          print "No mqtt client"
       else
          print "mqtt client ok"
       end
    end
    
    function onInit()
       init()
    end
    
    function onDeviceStateChanged(device, state, stateValue)
       if mqtt == nil then
          return
       end
    
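       local dev_name = device:name()
       -- NOTE: this must match the device name exactly; the switch above was created as "MQTT-watchdog"
       if dev_name == "HassTellsickMQTTWatchdog" then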
          if device:state() == 1 then
             watchdog_count = watchdog_count + 1
             print "HassTellstickMQTTWatchdog signal received"
          end
       end
    
       if not running_timer then
          running_timer = true
          watchdog_count = 0
          sleep(watchdog_timeout_seconds*1000)
    
          if watchdog_count == 0 then
             print("HassTellstickMQTTWatchdog timeout")
             mqtt:connect()
          else
             -- print() does not apply format specifiers, so format first
             print(string.format("HassTellstickMQTTWatchdog count %d", watchdog_count))
          end
          running_timer = false
       end
    end
    

I've not had any issues with MQTT since I started running this, but I can't say that it is just because of this script (I might be lucky too).

fredrike avatar Apr 24 '23 19:04 fredrike

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 3591, in _thread_main
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 1779, in loop_forever
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 1044, in reconnect
  File "build/bdist.linux-x86_64/egg/paho/mqtt/client.py", line 3685, in _create_socket_connection
  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
    raise err
timeout: timed out

This is the error paho throws when the MQTT server is restarted or shut down.

same problem here: eclipse/paho.mqtt.python#636
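
In the traceback the timeout escapes from `reconnect` inside `loop_forever`, which ends the network thread, so nothing ever tries again. A minimal sketch of the kind of guard that keeps retrying is shown below. It is an assumption-laden illustration, not the plugin's or paho's code: `reconnect_with_retry` is a hypothetical helper, `reconnect` stands for any callable that raises on failure, and the delay values are arbitrary.

```python
import socket
import time


def reconnect_with_retry(reconnect, min_delay=1, max_delay=120,
                         max_attempts=None, sleep=time.sleep):
    """Call reconnect() until it succeeds, backing off between failures.

    Returns the number of attempts used, or None if max_attempts ran out.
    Hypothetical helper for illustration only.
    """
    delay = min_delay
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            reconnect()
            return attempt
        except (socket.error, socket.timeout):
            # Broker unreachable: wait and try again instead of letting
            # the exception end the thread.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

With `max_attempts=None` the loop keeps trying until the broker is back, which is the behaviour one would want after a broker restart.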

tiehfood avatar Apr 25 '23 14:04 tiehfood

It will be interesting to see how stable your system is. I don't know how finely tuned this setup is.

A bad MQTT connection can be faked by stopping the watchdog transmission. I did this a few times and checked that the system came back together. It sometimes takes a while to heal and become stable again.

/Matt

crashmatt avatar Apr 25 '23 17:04 crashmatt

MQTT_Homeassistant-0.90.4_paho-1.5.1.zip @crashmatt , @fredrike you may want to try this version. It seems that the reconnect is working better with paho <1.6.0. So this is just the current version 0.90.4 repacked with the paho 1.5.1 from version 0.90.0. For me this is far more stable on reconnects and no exception is thrown so far.

p.s. the files are signed and unmodified from this repo, otherwise it would not be possible to load them in telldus. So you might trust the content of the ZIP 😉

tiehfood avatar Apr 25 '23 21:04 tiehfood

I have no idea how to use those files. I presume they modify the tellstick plugin?

crashmatt avatar Apr 26 '23 05:04 crashmatt

Just install the zip file as a plug-in (don't extract it), the same way you install the official plugin from the releases page. And yes, it just replaces the paho version (from 1.6.1 down to 1.5.1).

tiehfood avatar Apr 26 '23 07:04 tiehfood

MQTT_Homeassistant-0.90.4_paho-1.5.1.zip

This seems to be working well. I installed this and tried a couple of times restarting my mqtt server and power cycling my network switch and the mqtt connection was restored correctly.

sampod avatar Apr 26 '23 14:04 sampod

MQTT_Homeassistant-0.90.4_paho-1.5.1.zip

This seems to be working well. I installed this and tried a couple of times restarting my mqtt server and power cycling my network switch and the mqtt connection was restored correctly.

I forgot to check which version I had first, but I tried the zip file and my problem still persists. Lately the addon has disconnected just seconds after connecting, making the znet dumb as f**k.

Going to try the Lua script now, fingers crossed.

hauard avatar Dec 06 '23 19:12 hauard

Looks like the client disconnects just seconds after connecting either way. I made myself a virtual switch that a Lua script listens to and that connects the MQTT client, which makes it easier to investigate. Earlier I had to log in and bump the addon by removing and re-adding a number in the addon's config.

I have several other clients that connect to the broker without issues. I tried increasing and decreasing the keepalive ping on the broker, but no luck.

Is there any way to enable logging on the znet, to see what's going on there?

hauard avatar Dec 06 '23 19:12 hauard

My problems with disconnects only happen after a while. After years of issues where I had to restart the znet manually from time to time, I've now connected it to a power switch that I automatically power cycle every night. Now my setup is finally stable 🙈

grEvenX avatar Dec 07 '23 07:12 grEvenX