thingsboard-gateway

[BUG] tb_device_mqtt stuck waiting for paho to process messages

Open tGallimberti-Kerberos opened this issue 10 months ago • 4 comments

Describe the bug

tb_device_mqtt gets stuck waiting for paho to process messages. Traffic logs suggest that TB Edge PE sometimes fails to reply with a PUBACK, after which paho's message queue grows beyond the limit of 5 in-flight packets. After the default 600-second timeout, tb_client retries publishing the messages to paho, and the situation stays stuck.

If the situation persists for too long, TB Edge sends a disconnect request and the IoT Gateway does not try to reconnect, giving this error:

2025-02-18 16:34:08.474 - |DEBUG| - [client.py] - client - _easy_log - 3258 - Sending PUBLISH (d0, q1, r0, m120), 'b'v1/gateway/telemetry'', properties=None, ... (85 bytes)      
2025-02-18 16:34:08.478 - |ERROR| - [tb_device_mqtt.py] - tb_device_mqtt - get - 159 - Error while waiting for publish: Message publish failed: The client is not currently connected.

Image

The IoT Gateway has IP 192.168.0.80 and TB Edge PE has IP 192.168.0.158. The screenshot was taken from Wireshark with the following filter:

(ip.addr == 192.168.0.158 || ip.addr == 192.168.0.80) && mqtt

Error traceback (If available):

2025-02-18 16:30:13.454 - |DEBUG| - [tb_device_mqtt.py] - tb_device_mqtt - _wait_until_current_queued_messages_processed - 803 - Waiting for messages to be processed by paho client, current queue size - 12, max inflight messages: 5
2025-02-18 16:30:13.455 - |DEBUG| - [tb_device_mqtt.py] - tb_device_mqtt - _wait_until_current_queued_messages_processed - 803 - Waiting for messages to be processed by paho client, current queue size - 12, max inflight messages: 5
2025-02-18 16:30:13.455 - |DEBUG| - [tb_device_mqtt.py] - tb_device_mqtt - _wait_until_current_queued_messages_processed - 803 - Waiting for messages to be processed by paho client, current queue size - 12, max inflight messages: 5
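
For illustration only, here is a minimal sketch of the kind of wait loop these log lines suggest. The names `wait_until_drained` and `queue_size` are hypothetical and this is not the actual tb_device_mqtt code. It polls a queue-size callback and returns `False` after a timeout instead of waiting indefinitely, so the caller can decide what to do when PUBACKs never arrive.

```python
import time
from typing import Callable


def wait_until_drained(queue_size: Callable[[], int],
                       max_inflight: int,
                       timeout: float,
                       poll_interval: float = 0.01) -> bool:
    """Return True once queue_size() <= max_inflight, False on timeout.

    queue_size is a hypothetical callback reporting how many messages
    are still waiting to be processed by the MQTT client.
    """
    deadline = time.monotonic() + timeout
    while queue_size() > max_inflight:
        if time.monotonic() >= deadline:
            # The queue did not drain in time; let the caller react
            # (log, reconnect, drop messages, ...) instead of blocking.
            return False
        time.sleep(poll_interval)
    return True
```

With the values from the log (queue size 12, max inflight 5) and no PUBACKs arriving, a call such as `wait_until_drained(lambda: 12, 5, timeout=600)` would return `False` once the timeout expires.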

Versions (please complete the following information):

  • OS: Windows 10 Pro 19045.5487
  • Thingsboard IoT Gateway version 3.7.0
  • Python version 3.12.4

tGallimberti-Kerberos, Feb 18 '25 15:02

Hi @tGallimberti-Kerberos,

Thank you for your interest in the ThingsBoard IoT Gateway and for your investigation.

You are correct—this is a known issue. Unfortunately, we don’t have a proper solution at the moment. Removing this check would lead to a memory leak, and eventually, when the queue contains messages with all possible identifier values (65535), the client would no longer be able to send new messages.
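
To illustrate the exhaustion described above: MQTT packet identifiers are 16-bit non-zero values, so at most 65535 messages can be awaiting acknowledgement at once. The sketch below uses a hypothetical `MidAllocator` class (not part of paho or the gateway) and only assumes that an identifier stays reserved until its PUBACK is seen.

```python
class MidAllocator:
    """Hypothetical allocator for MQTT packet identifiers (1..65535)."""

    MAX_MID = 65535

    def __init__(self):
        self._in_use = set()   # identifiers still waiting for a PUBACK
        self._last = 0

    def allocate(self):
        if len(self._in_use) >= self.MAX_MID:
            # Every identifier is held by an unacknowledged message.
            raise RuntimeError("no free packet identifiers")
        mid = self._last
        while True:
            mid = mid % self.MAX_MID + 1   # wraps 65535 -> 1, never 0
            if mid not in self._in_use:
                break
        self._in_use.add(mid)
        self._last = mid
        return mid

    def release(self, mid):
        """Call when the PUBACK for this identifier arrives."""
        self._in_use.discard(mid)
```

If `release` is never called because PUBACKs are missed, the 65536th call to `allocate` raises, which is the "can no longer send new messages" situation.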

We are currently exploring potential solutions to address this issue. If you have any suggestions, feel free to share them.

imbeacon, Feb 19 '25 05:02

Looking at how things work internally, it seems the paho library sometimes misses the PUBACK from the MQTT broker, or the broker doesn't respond to published messages. For some reason that I'm still investigating, paho does not retry sending MQTT messages with QoS 1, but simply pushes them to its internal queue. Have you ever opened an issue about this with paho?

tGallimberti-Kerberos, Feb 19 '25 08:02

Just found the problem inside the paho.mqtt.python library. I've sent a pull request.

tGallimberti-Kerberos, Feb 19 '25 16:02

Hi @tGallimberti-Kerberos,

You are right, there is an issue in the paho library. We have temporarily made a mirror of the paho library, tb-paho-mqtt-client, until it is fixed in paho itself. Also, I suggest adding a lock to your fix to avoid issues with the dict changing during iteration, like the following:

with self._out_message_mutex:
    return self._update_inflight()

instead of

return self._update_inflight()
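
As a self-contained illustration of why the lock matters: iterating over a dict while another thread inserts into it can raise `RuntimeError: dictionary changed size during iteration`. The sketch below uses a hypothetical `Outbox` class, not paho's internals, and holds the same mutex for every read and write of the shared dict.

```python
import threading


class Outbox:
    """Hypothetical store of unacknowledged messages keyed by message id."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._messages = {}

    def add(self, mid, payload):
        with self._mutex:
            self._messages[mid] = payload

    def ack(self, mid):
        with self._mutex:
            self._messages.pop(mid, None)

    def pending_mids(self):
        # Iterate only while holding the lock, so add()/ack() running in
        # another thread cannot resize the dict mid-iteration.
        with self._mutex:
            return [mid for mid in self._messages]
```

Returning a new list from `pending_mids` means the caller can process the snapshot after the lock is released.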

imbeacon, Apr 07 '25 06:04