[BUG] Gateway v3.7.5 on Windows crashes and disconnects after a successful Modbus data read
Describe the bug
The ThingsBoard Gateway (v3.7.5) running on Windows connects successfully to both a Modbus TCP PLC and the ThingsBoard Cloud server. The gateway successfully polls the PLC and receives the correct data values. However, immediately after the first successful data poll, the gateway's connection to the ThingsBoard MQTT broker is severed. The gateway then enters a continuous loop of reconnecting and disconnecting every few seconds.
This issue appears to be a fatal crash/bug in the gateway's core logic that occurs when it tries to process or publish the first valid data packet received from the Modbus connector.
Steps to Reproduce
1. Install ThingsBoard Gateway v3.7.5 on a Windows machine using pip.
2. Configure the main thingsboard.json (or .yaml) file to connect to a ThingsBoard server (e.g., ThingsBoard Cloud) using an access token.
3. Configure a Modbus connector to poll a valid timeseries register from a working Modbus TCP device, ensuring the gateway reads at least one telemetry value that is known to be correct.
4. Start the gateway.
Expected behavior
The gateway should connect to the PLC, read the data, successfully publish the telemetry to ThingsBoard, and remain connected, continuing to poll at the specified interval.
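For reference, step 2 used a general gateway configuration along these lines (a minimal sketch of the relevant section only; host, port, and token are placeholders, and exact keys may vary by gateway version):

```json
{
  "thingsboard": {
    "host": "mqtt.eu.thingsboard.cloud",
    "port": 8883,
    "security": {
      "accessToken": "YOUR_ACCESS_TOKEN"
    }
  }
}
```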
Actual behavior
The gateway starts, connects to the PLC, and connects to the ThingsBoard server. The logs confirm a successful Modbus read. Immediately after this first successful read, the MQTT connection drops with reason code None. The gateway then enters a connect-disconnect loop, making it unusable.
A representative log snippet of the loop:
...
|INFO| - [slave.py] - slave - connect - 209 - Connected to [Your PLC Name]
...
|INFO| - [tb_client.py] - tb_client - _on_connect - 313 - MQTT client connected to platform [Your TB Host]
...
// Data is successfully read from PLC at this point
...
|WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _on_disconnect - 568 - MQTT client was disconnected with reason code None (Description not found.)
...
// Reconnect attempts begin
Environment
- OS: Windows
- ThingsBoard IoT Gateway version: 3.7.5
- Python version: 3.13
- Installation method: pip install thingsboard-gateway
- Connector: Modbus TCP
- ThingsBoard platform: ThingsBoard Cloud (mqtt.eu.thingsboard.cloud)
Troubleshooting Steps Performed (to rule out common issues)
This issue was troubleshot extensively. The following potential causes were eliminated:
- Local/Network Firewall: The issue persists even with the local Windows Firewall completely disabled. The PLC Isolation Test (using an invalid IP) proved the gateway stays connected to ThingsBoard when not talking to the PLC, ruling out a general network block.
- Access Token Conflict: The access token was regenerated, with no change in behavior.
- Keep-Alive/Timeout: The disconnect is instant (<1 second after data read), not a timeout. keepaliveSeconds was configured with no effect.
- Invalid Modbus Address/Data: I confirmed that the gateway receives the correct numerical data from the PLC before the crash. The issue persists even when simplifying the read from a 32float to a 16uint.
- reportStrategy: The crash occurs both with and without the reportStrategy configured in the main config file.
The logical conclusion is that a bug exists in the gateway's core code that causes a crash when handling a valid, successfully polled data packet from the Modbus connector.
Hi @dechambrier,
Thank you for your interest in ThingsBoard IoT Gateway.
Could you please confirm whether you used the “restart” service RPC on the gateway? We were only able to reproduce this behavior after invoking that RPC. It appears to be caused by the execv implementation in CPython on Windows, which spawns a new process instead of replacing the current one.
Your confirmation will help us determine whether we’ve correctly identified the root cause.
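For context, here is a minimal sketch of the restart mechanism in question (a hypothetical simplification, not the gateway's actual restart code):

```python
# On POSIX systems, os.execv replaces the current process image, so
# exactly one gateway process exists across a restart. On Windows,
# CPython implements execv by spawning a new process; as described
# above, the old process may keep running, so two clients end up
# sharing one set of MQTT credentials and the broker keeps dropping
# the older session.
import os
import sys

def restart_self() -> None:
    # Relaunch the same entry point with the original arguments.
    os.execv(sys.executable, [sys.executable] + sys.argv)
```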
Hi @imbeacon, no, I have not used the restart RPC. But I have moved from the Python implementation to the containerised implementation, and I observe the same behaviour. I also added more telemetry keys to see whether the problem might come from the PLC's memory or the message size.
Also for info:
- I'm using TB UI to add devices.
- The PLC is a Wago 750-881
I get the data, so it's not critical, but it doesn't feel clean! I'm going to perform the install on another computer today; we'll see if the same behaviour occurs.
Here are some logs.
2025-06-20 05:56:12.012 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398972001, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:12.915 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __read_data_from_storage - 1474 - Telemetry dp count: ANONYMIZED_COUNT_1 and attributes dp count: 0. Counting took: 0 milliseconds.
2025-06-20 05:56:12.975 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _on_disconnect - 568 - MQTT client was disconnected with reason code None (Description not found.)
2025-06-20 05:56:12.979 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _wait_for_rate_limit_released - 928 - Waiting for connection to be established before sending data to ThingsBoard!
2025-06-20 05:56:13.004 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398973003, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:14.015 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398974005, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False}), TelemetryEntry(ts=1750398974006, metadata={}, values={DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:15.017 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398975008, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:16.019 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398976010, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0}), TelemetryEntry(ts=1750398976011, metadata={}, values={DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:17.022 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398977013, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:18.024 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398978016, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:18.179 - |INFO| - [tb_client.py] - tb_client - _on_connect - 313 - MQTT client connected to platform mqtt.eu.thingsboard.cloud: 8883
2025-06-20 05:56:18.179 - |INFO| - [tb_device_mqtt.py] - tb_device_mqtt - _on_connect - 577 - MQTT client <paho.mqtt.client.Client object at ANONYMIZED_MEM_ADDRESS> - Connected!
2025-06-20 05:56:18.180 - |INFO| - [tb_gateway_mqtt.py] - tb_gateway_mqtt - gw_subscribe_to_attribute - 269 - Subscribed to *|* with id ANONYMIZED_ID_2 for device *
2025-06-20 05:56:18.245 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _on_disconnect - 568 - MQTT client was disconnected with reason code None (Description not found.)
2025-06-20 05:56:18.249 - |WARNING| - [tb_device_mqtt.py] - tb_device_mqtt - _wait_for_rate_limit_released - 928 - Waiting for connection to be established before sending data to ThingsBoard!
2025-06-20 05:56:19.025 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398979018, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX}), TelemetryEntry(ts=1750398979019, metadata={}, values={DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:20.030 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398980021, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:21.032 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398981023, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
2025-06-20 05:56:22.034 - |DEBUG| - [tb_gateway_service.py] - tb_gateway_service - __process_event - 1169 - Data from Wago Connector connector was sent to storage: [ConvertedData(deviceName=PLC_Device_A, deviceType=PLC, telemetry=[TelemetryEntry(ts=1750398982026, metadata={}, values={DatapointKey(key=rSensorLevelA, report_strategy=None): 18XX.XX, DatapointKey(key=rSensorLevelB, report_strategy=None): 16XX.XX, DatapointKey(key=bPumpStatusA, report_strategy=None): False, DatapointKey(key=bPumpStatusB, report_strategy=None): False, DatapointKey(key=rSensorPh, report_strategy=None): 0.0, DatapointKey(key=rSensorTemperature, report_strategy=None): 0.0, DatapointKey(key=rTankLevelC, report_strategy=None): 9X.XX, DatapointKey(key=rTankLevelD, report_strategy=None): 25X.XX, DatapointKey(key=rTankLevelE, report_strategy=None): 52X.XX})], attributes=Attributes(values={}), metadata={})]
Unfortunately, at the moment we don't have a solution for this issue on Windows. We will investigate it, try to find the root cause, and provide a fix. For now, we can only suggest building your own Docker image from sources and running the gateway in a container, which gives you more control. To do this, follow these steps:
- Download sources and make changes if required.
- From the root of the repository run the following commands:
cp docker/Dockerfile .
docker build -t tb-gateway . --load
- Download the default docker-compose.yml file from the launch command window (entities menu -> gateways menu -> click on the console icon for the gateway).
- Change the image in docker-compose.yml from
image: thingsboard/tb-gateway:3.7-stable
to
image: tb-gateway
- Run docker compose using the command:
docker compose up -d
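The relevant part of docker-compose.yml should then look roughly like this (illustrative excerpt; keep the rest of the downloaded file unchanged):

```yaml
# Only the image line changes; the service name below is illustrative.
services:
  tb-gateway:
    image: tb-gateway   # was: thingsboard/tb-gateway:3.7-stable
```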
Thanks!
For info, I'm in docker now, and observe the same behaviour.
@dechambrier
Are you sure you don't have another gateway instance running in the background with the same credentials? That is the main suspect: to restart the gateway we use the execv method, which on Linux-based systems replaces the current process, but on Windows it creates a new process and the old one doesn't stop. To confirm, you can capture the MQTT packets from your Windows machine to the server using Wireshark; if you see disconnect messages with the reason code "Session taken over", it means another client is connecting with the same credentials. If you aren't running the same gateway twice, then some background process is using the same credentials.
You may also find a Python process running in the background; if you stop it, the issue should be gone.
If you cannot find the background process in Task Manager, the simplest way to stop it is to restart the machine and then run the gateway in Docker.
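As an alternative to Wireshark, a short standalone probe can surface the broker's disconnect reason directly (a sketch assuming paho-mqtt 2.x; the access token is a placeholder):

```python
# Hypothetical probe (not part of the gateway): connect with the same
# access token the gateway uses and print the reason code the broker
# sends in its DISCONNECT packet. If another client is online with the
# same credentials, MQTT 5 brokers typically report "Session taken over".
import paho.mqtt.client as mqtt

def on_disconnect(client, userdata, flags, reason_code, properties):
    print(f"Disconnected, reason code: {reason_code}")

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, protocol=mqtt.MQTTv5)
client.username_pw_set("YOUR_ACCESS_TOKEN")  # placeholder token
client.tls_set()  # ThingsBoard Cloud uses TLS on port 8883
client.on_disconnect = on_disconnect
client.connect("mqtt.eu.thingsboard.cloud", 8883)
client.loop_forever()
```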
I systematically tested several hypotheses to find the source of the restart loop.
- Network & VPN: ruled out. I analyzed my PC's routing table and confirmed that neither the network path nor Tailscale was the cause.
- Polling speed: ruled out. I increased the polling interval to 5 and 10 seconds, but the disconnect loop persisted, so it is not a timing issue.
- Server-side policies: ruled out. I checked the server-side logs in the ThingsBoard UI and confirmed the server was not actively terminating the connection.
- Execution environment: ruled out. The issue was identical when running on Windows directly and in Docker, so it is not a container-specific problem.
I do not have a gateway running in the background.
I have also completely removed a Telegraf instance that was connecting to the PLC to send data to InfluxDB.
The definitive tests were:
- Disabling the connector: when the "Wago Connector" was disabled in the UI, the gateway connection was completely stable.
- Emptying the timeseries list: when the connector was enabled but had no tags to poll, the connection also remained stable.
Conclusion of the Isolation Process:
The problem is triggered specifically and only when the Modbus connector executes a poll for timeseries data. The gateway's core service is stable, but the action of polling for data causes the entire application to perform a clean shutdown, resulting in the connection loop.
Do you still want me to try with wireshark?
@dechambrier,
Thank you for your investigation. My theory now seems less realistic, but to rule it out completely and find the actual reason, it would help to have the reason code returned by the server in the disconnect packet. Even if it is not a session takeover, it will help us find the root cause. Also, if you see any publish requests to the topic v1/gateway/disconnect, please check their payload; it may contain a disconnect reason code for a device connected through the gateway.
Another possibility is exceeded rate limits. To check, please take a look at your notifications: does your device exceed any of the rate limits?
@imbeacon I am hitting rate limits for the device; sorry I missed that, I was not checking the home page much.
But I don't understand how that's possible: I'm on the Prototype plan, and this is the connector configuration:
"name": "Modbus_PLC_Connector_1",
"id": "e0b1c2d3-f4e5-6789-abcd-1234567890ab",
"master": {
"slaves": [
{
"host": "192.168.X.Y",
"port": 502,
"method": "socket",
"unitId": 1,
"deviceName": "PLC_System_A",
"deviceType": "Industrial_Controller",
"timeout": 35,
"byteOrder": "BIG",
"wordOrder": "LITTLE",
"retries": true,
"retryOnEmpty": true,
"retryOnInvalid": true,
"pollPeriod": 10000,
"connectAttemptTimeMs": 500,
"connectAttemptCount": 5,
"waitAfterFailedAttemptsMs": 30000,
"type": "tcp",
"attributes": [],
"timeseries": [
{
"tag": "levelSensor_A",
"type": "32float",
"address": 12304,
"objectsCount": 2,
"functionCode": 3
},
{
"tag": "columnLevel_B",
"type": "32float",
"address": 12298,
"objectsCount": 2,
"functionCode": 3
},
{
"tag": "pumpEnable_C",
"type": "bits",
"address": 12291,
"objectsCount": 1,
"functionCode": 1,
"bitTargetType": "bool"
},
{
"tag": "pumpEnable_D",
"type": "bits",
"address": 12296,
"objectsCount": 1,
"functionCode": 1,
"bitTargetType": "bool"
},
{
"tag": "phValue_E",
"type": "32float",
"address": 12292,
"objectsCount": 2,
"functionCode": 3
},
{
"tag": "temperatureSensor_F",
"type": "32float",
"address": 12288,
"objectsCount": 2,
"functionCode": 3
},
{
"tag": "storageLevel_G",
"type": "32float",
"address": 12384,
"objectsCount": 2,
"functionCode": 3
},
{
"tag": "tankLevel_H",
"type": "32float",
"address": 12352,
"objectsCount": 2,
"functionCode": 3
},
{
"tag": "productLevel_I",
"type": "32float",
"address": 12392,
"objectsCount": 2,
"functionCode": 3
},
{
"tag": "flowRate_J",
"type": "32float",
"address": 12310,
"objectsCount": 2,
"functionCode": 3
}
],
"attributeUpdates": [],
"rpc": []
}
]
}
}```
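For scale, assuming the configured 10,000 ms pollPeriod is honoured: 10 timeseries keys per poll is about 1 datapoint per second, i.e. roughly 3,600 datapoints per hour for this one device. The DEBUG logs earlier in the thread, though, show polls landing about once per second, which would be closer to 30,000 datapoints per hour and much more likely to trip an hourly limit.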
This behavior can occur when the gateway is restarted. Unfortunately, ThingsBoard currently does not provide an API to retrieve the number of available tokens for rate limits, which may lead to temporary rate limiting issues after a restart.
In most cases, the issue should stabilize after the gateway runs continuously for about 2 hours. If it does not, a workaround is to stop the gateway, wait for 2 hours (to allow the rate limit windows to reset), and then start it again. Since the maximum rate limit on thingsboard.cloud is typically enforced over a 1-hour window, this approach should help avoid further triggering of the limits.
We are aware of this situation and have already reported it to the ThingsBoard product team. Unfortunately, there is no immediate solution available, but it is on their radar for future improvements.
For info, I disconnected the connector for 3+ hours, but it went straight back to the on/off mode!
I mean stop the gateway, not just the connector. The rate-limit issue affects the connection between the gateway and the platform; unfortunately, stopping the connector won't help, because that doesn't interrupt the connection between the gateway and the platform.
I have 2 gateways running; should I stop both? Or is it gateway-specific?
Thanks!