paho.mqtt.python
paho.mqtt.python copied to clipboard
Connection timeout not logged or handled on client side
We are frequently running into disconnects that are not handled on the client side. Finally we were able to capture some relevant logging information:
On the client side, the activity just stops. I did not find any indication of why. Memory use for example remained stable.
2021-09-30T17:30:41.695Z /usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404 Sending PINGREQ
2021-09-30T17:30:41.696Z /usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404 Received PINGRESP
2021-09-30T17:31:41.759Z /usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404 Sending PINGREQ
2021-09-30T17:31:41.760Z /usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404 Received PINGRESP
2021-09-30T17:32:41.818Z /usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404 Sending PINGREQ
2021-09-30T17:32:41.819Z /usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404 Received PINGRESP
2021-09-30T17:33:41.880Z /usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404 Sending PINGREQ
2021-09-30T17:33:41.881Z /usr/local/lib/python3.8/site-packages/paho/mqtt/client.py:2404 Received PINGRESP
The mosquitto-broker log shows a line indicating that the client timed out 1.5 minutes (clocks may have been slightly out of sync) after that last ping. No further activity from that client shows up in the logs.
2021-09-30 17:35:12 1633023312: Client IBT1016-RODI-client-C55EYC has exceeded timeout, disconnecting.
The entire time the main process of the client was in a while loop doing time.sleep(1), as it was waiting for a certain message to be received.
The MQTT client itself is connected like this, running the loop on the second thread:
self.client = mqtt.Client(client_id=client_id)
self.client.connect(broker)
self.client.on_connect = self._on_connect
self.client.on_disconnect = self._on_disconnect
self.client.on_message = self.on_message
self.client.loop_start()
The _on_disconnect method was never called.
paho-mqtt version is 1.5.1
Any ideas why the client timed out?
Is there anything I can do client side to reliably check if I'm still connected, if I can't trust the on_disconnect event?
It happened again and this time I saw in the logs that the client did not send its PINGREQ before being terminated by the server.
Any ideas why it did that?
@michaelosthege
I'm still thinking through how to deal with this, but the loop_start() thread can silently die. When it does, many of the self.client methods still appear to work, but nothing will be sent or received. Lack of PINGREQ is one of the symptoms that I've seen when this happens.
I recommend checking self.client._thread and self.client._thread.is_alive() when you have this problem to see if that's the problem.
@BertKleewein thanks for the reply. Yesterday I rolled out an update where I'm calling the enable_logging() before the loop_start(). My hope was that it gives us logs from the second thread. Is there anything I can do to revive the thread? Does it hurt to call loop_start() even if the thread is running already? Even better of course would be to fix it at the source..
@michaelosthege - calling loop_start() won't hurt, but it probably won't help either. I've tried some easy tricks to revive the thread, but none of them worked. This was a few months ago, before I was pulled off to work on something else. Now I have some time to look at this in earnest, so hopefully I'll come up with something. Fixing it at the source is the only reasonable fix, IMHO.
There have been quite a few changes since this issue was raised (and more in the upcoming release). As such may I ask you to retest with the latest release and let us know if the issue still exists?
Note: This is part of an exercise to clean up old issues so that the project can move forwards.