paho.mqtt.python
paho.mqtt.python copied to clipboard
Client disconnect with code 16 and cannot connect again
I have this paho-mqtt python client (I have omitted some script logic and DB accessing for simplification):
import json
import logging
import time
from paho.mqtt import client as mqtt
logging.basicConfig(level=logging.INFO)
mqtt_logger = logging.getLogger("MqttCon")
def on_connect(client, userdata, flags, rc):
mqtt_logger.info("Device connected with result code: " + str(rc))
client.subscribe("some_topic/#", qos=0)
def on_disconnect(client, userdata, rc):
mqtt_logger.info("Device disconnected with result code: " + str(rc))
def pub_values(client, base_topic, values):
msg = json.dumps(values)
target_topic = base_topic
mqtt_logger.info("Sending message to wheel...")
client.publish(target_topic, msg, qos=0)
def on_message(client, userdata, msg):
"""
Input messages format (can be one or list):
{
"tag_id": "string",
"value": "int or float",
"timestamp": "string format YYYY-MM-DDTHH:MM:SS.F"
}
"""
try:
info = json.loads(str(msg.payload.decode("utf-8")))
mqtt_logger.info(msg.topic)
pub_msg = {"tag_ids": [], "timestamp": ""}
mqtt_logger.info(f"Received message topic is {msg.topic}")
if isinstance(info, list):
# some logic
else:
# some logic, writing values to InfluxDB
pub_values(client=client, base_topic="some_topic", values=pub_msg)
except Exception as e:
mqtt_logger.error(str(e))
def on_log(client, userdata, level, buf):
mqtt_logger.info(f"SYSTEM: {buf}")
if __name__ == "__main__":
client = mqtt.Client(protocol=mqtt.MQTTv311)
client.on_connect = on_connect
client.on_disconnect = on_disconnect
client.on_message = on_message
client.on_log = on_log
client.connect_async('<my_mqtt_url>', port=1883)
client.loop_start()
try:
while True:
time.sleep(0)
except KeyboardInterrupt:
pass
finally:
mqtt_logger.warning("Stopping client")
client.loop_stop()
client.disconnect()
It works good for a few hours, but at some time it have disconnection:
INFO:MqttCon:Received message topic is telemetry/MESPIV/batch
INFO:MqttCon:Sending message to wheel...
INFO:MqttCon:SYSTEM: Sending PUBLISH (d0, q0, r0, m4228), 'b'router/newdata'', ... (11336 bytes)
INFO:MqttCon:Sending message to wheel...
INFO:MqttCon:SYSTEM: Sending PUBLISH (d0, q0, r0, m4229), 'b'router/newdata'', ... (11336 bytes)
INFO:MqttCon:Device disconnected with result code: 16
INFO:MqttCon:Device disconnected with result code: 16
INFO:MqttCon:SYSTEM: Sending CONNECT (u0, p0, wr0, wq0, wf0, c1, k60) client_id=b''
INFO:MqttCon:SYSTEM: Sending CONNECT (u0, p0, wr0, wq0, wf0, c1, k60) client_id=b''
# more Sending CONNECT
The client disconnects with code 16 (MQTT_ERR_KEEPALIVE) and tries to connect in an infinite loop, but it fails to connect (don't recive CONNACK). How can I solve this problem? For mqtt docker container i use eclipse-mosquitto image.
My first code version worked by usial connect, but it has the same problem.
if __name__ == "__main__":
# setting mqtt client
client.connect('<my_mqtt_url>', port=1883)
client.loop_forever()
client.disconnect()
An important addition: in the client logs there are disconnections with the code 16 with a successful reconnection, but it is impossible to reconnect from some kind.
Could it be linked to a momentary peak of messages? as described in #683 ?
No i tested this case. I launched mqtt client which publish messages in infinite loop without time.sleep and this didn't drop service.
Kludge solution:
- Becouse MQTT client works in Docker container, i modify on_disconnect in such way:
def on_disconnect(client, userdata, rc):
mqtt_logger.info("Device disconnected with result code: " + str(rc))
if rc == 16:
exit(1)
- Add restart: always to container. If there is dissconnect with code 16 -> application exits and container drops -> restarting container and it works for few hours.
At the time of restarting the container, there is a period of time during which data loss occurs from the mqtt broker.
Exit and restart... that's not really an option for me, I need to process all messages. @ralight, @PierreF any idea what could cause this? or any workaround?
Similar (maybe same) problem here, after a first disconnection with rc=16 the client connect (at least enter the on_connect callback) and subscribe to the topics again. After that it immediately disconnect with rc=7 and keep trying to connect.
At the broker side I can see that the connection and subscription are successful, I don't know why the disconnection occurs after that. I am using the iotcore broker from AWS.
The script is running in 50 computers and the problem happens once every 2 days in one of the computers. It seems to be happening more where the internet connection is not very stable. I have already added a queue to limit less than 50 publish requests per second for a single client - a broker limit that triggers a disconnection.
@rodolphoiwano sometimes i think about rewriting code on Golang Maybe it's python library specific problem.
@Flock1993 after some time I found a situation that exceeded the 50 publish requests per second for the AWS iotcore broker. It happened after a long disconnection that triggered the keepalive timeout.
Fixing it and using clean session always, may have solved the problem. It seems that after a server forced disconnection, it is not possible to reconnect to the same session.
However, the problem has never happened here when using a local broker (mosquitto), so I am considering it a problem related to the AWS iotcore broker limitations.
in the client logs there are disconnections with the code 16 with a successful reconnection
Can you please share more of the logs (ideally from the last successful publish through to the second code 16 error). Can you please also confirm that you can duplicate this with the code provided (changing only <my_mqtt_url>
). I have a few ideas but need more info to confirm.
MattBrittan,
- To duplicate problem i need to add some another code, which writes data from mqtt messages to InfluxDB. I will do some experiments on this week and return to you with results and logs
- Also, problem described above is accompanied by high CPU consumption, which you can see on attached message. Disconnection with code 16 occurs at the peak of CPU consumption, after that Docker container restart - we see low CPU consumption. When I rewrote the code on golang the problem is gone.
To duplicate problem i need to add some another code,
Thanks; I understand this can be difficult to provide, but unfortunately it's also difficult to trace an issue when you cannot duplicate it (and as there are not that many people working on issues in this repo the more you can do to help us replicate the issue the more likely it is that we will be able to solve it).
at the peak of CPU consumption,
It would be interesting to know what is causing the CPU use (i.e. is this high message volume, something to do with the number of messages written to the database, something else..). These issues can be difficult to track down (the log info I requested may help) and it's not uncommon for the issue to be in user code (rather than the library). I've put a lot of effort into tracking down a few of these issues in the Go client so happy to hear that's working for you!
Exit and restart... that's not really an option for me, I need to process all messages. @ralight, @PierreF any idea what could cause this? or any workaround?
what we do is that we trigger a reconnect in the on_disconnect hook, however, that still leaves you up to lose some messages, if they come in very fast. This is a limitation of the MQTT Protocol, AFAIK there is no Clean way in the MQTT world to prevent this from happening.
@Flock1993 on you initial logs, there is one thing that look really odd:
INFO:MqttCon:Device disconnected with result code: 16
INFO:MqttCon:Device disconnected with result code: 16
INFO:MqttCon:SYSTEM: Sending CONNECT (u0, p0, wr0, wq0, wf0, c1, k60) client_id=b''
INFO:MqttCon:SYSTEM: Sending CONNECT (u0, p0, wr0, wq0, wf0, c1, k60) client_id=b''
You really had 2 disconnect and 2 connect consecutively ? Could you add timestamp on those log to known whether they happen at the same time ?
Having two consecutive CONNECT might be possible during auto-reconnection. A first connection is trying and fail (at TCP level), it then retry later. Two consecutive disconnect seems like a bug. Is it possible that two network loop are running ?
If you can still reproduce this issue, I suggest you trying to do with the 2.0.0rc2 because their is some improvement on few network error handling and there is improved warning if you try to run two network loop (it's a pre-release, but available on PyPI).
@PierreF Maybe as bit of notice. I'm also struggling with the rc 16 disconnect. But interestingly similar as to @Flock1993 I see the also 2x the on_disconnect being called.
2024-07-02T13:25:34+0000 [WARNING] Unexpected source MQTT disconnection. Will auto-reconnect
2024-07-02T13:25:34+0000 [WARNING] Disconnection code 16: No matching subscribers
2024-07-02T13:25:34+0000 [WARNING] Unexpected source MQTT disconnection. Will auto-reconnect
2024-07-02T13:25:34+0000 [WARNING] Disconnection code 16: No matching subscribers
2024-07-02T13:25:35+0000 [INFO] Source broker Connection Established
2024-07-02T13:27:35+0000 [WARNING] Unexpected source MQTT disconnection. Will auto-reconnect
2024-07-02T13:27:35+0000 [WARNING] Disconnection code 16: No matching subscribers
2024-07-02T13:27:35+0000 [WARNING] Unexpected source MQTT disconnection. Will auto-reconnect
2024-07-02T13:27:35+0000 [WARNING] Disconnection code 16: No matching subscribers
2024-07-02T13:27:36+0000 [INFO] Source broker Connection Established
2024-07-02T13:30:38+0000 [WARNING] Unexpected source MQTT disconnection. Will auto-reconnect
2024-07-02T13:30:38+0000 [WARNING] Disconnection code 16: No matching subscribers
2024-07-02T13:30:38+0000 [WARNING] Unexpected source MQTT disconnection. Will auto-reconnect
2024-07-02T13:30:38+0000 [WARNING] Disconnection code 16: No matching subscribers
note: this was on 1.6.1, but similar result in 2.1.0. Note2: https://github.com/marcelrv/victron-dbus-mqttmapper is what is running. It has 2 instances but connecting to 2 different mqtt servers.