paho.mqtt.python icon indicating copy to clipboard operation
paho.mqtt.python copied to clipboard

Connection timeout not logged or handled on client side

Open michaelosthege opened this issue 4 years ago • 5 comments

We are frequently running into disconnects that are not handled on the client side. Finally we were able to capture some relevant logging information:

On the client side, the activity just stops. I did not find any indication of why. Memory use for example remained stable.

2021-09-30T17:30:41.695Z	/usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404	Sending PINGREQ
2021-09-30T17:30:41.696Z	/usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404	Received PINGRESP
2021-09-30T17:31:41.759Z	/usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404	Sending PINGREQ
2021-09-30T17:31:41.760Z	/usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404	Received PINGRESP
2021-09-30T17:32:41.818Z	/usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404	Sending PINGREQ
2021-09-30T17:32:41.819Z	/usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404	Received PINGRESP
2021-09-30T17:33:41.880Z	/usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404	Sending PINGREQ
2021-09-30T17:33:41.881Z	/usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404	Received PINGRESP

The mosquitto-broker log shows a line indicating that the client timed out 1.5 minutes (clocks may have been slightly out of sync) after that last ping. No further activity from that client shows up in the logs.

2021-09-30 17:35:12 1633023312: Client IBT1016-RODI-client-C55EYC has exceeded timeout, disconnecting.

The entire time the main process of the client was in a while loop doing time.sleep(1), as it was waiting for a certain message to be received.

The MQTT client itself is connected like this, running the loop on the second thread:

self.client = mqtt.Client(client_id=client_id)
self.client.connect(broker)
self.client.on_connect = self._on_connect
self.client.on_disconnect = self._on_disconnect
self.client.on_message = self.on_message
self.client.loop_start()

The _on_disconnect method was never called.

paho-mqtt version is 1.5.1


Any ideas why the client timed out?

Is there anything I can do client side to reliably check if I'm still connected, if I can't trust the on_disconnect event?

michaelosthege avatar Oct 03 '21 20:10 michaelosthege

It happened again and this time I saw in the logs that the client did not send its PINGREQ before being terminated by the server.

Any ideas why it did that?

michaelosthege avatar Oct 17 '21 12:10 michaelosthege

@michaelosthege I'm still thinking through how to deal with this, but the loop_start() thread can silently die. When it does, many of the self.client methods still appear to work, but nothing will be sent or received. Lack of PINGREQ is one of the symptoms that I've seen when this happens.

I recommend checking self.client._thread and self.client._thread.is_alive() when you have this problem to see if that's the problem.

BertKleewein avatar Oct 19 '21 19:10 BertKleewein

@BertKleewein thanks for the reply. Yesterday I rolled out an update where I'm calling the enable_logging() before the loop_start(). My hope was that it gives us logs from the second thread. Is there anything I can do to revive the thread? Does it hurt to call loop_start() even if the thread is running already? Even better of course would be to fix it at the source..

michaelosthege avatar Oct 19 '21 19:10 michaelosthege

@michaelosthege - calling loop_start() won't hurt, but it probably won't help either. I've tried some easy tricks to revive the thread, but none of them worked. This was a few months ago, before I was pulled off to work on something else. Now I have some time to look at this in earnest, so hopefully I'll come up with something. Fixing it at the source is the only reasonable fix, IMHO.

BertKleewein avatar Oct 19 '21 20:10 BertKleewein

There have been quite a few changes since this issue was raised (and more in the upcoming release). As such may I ask you to retest with the latest release and let us know if the issue still exists?

Note: This is part of an exercise to clean up old issues so that the project can move forwards.

MattBrittan avatar Jan 08 '24 03:01 MattBrittan