thingsboard-gateway [BUG] Storage: Database is locked

Describe the bug I'm using a pretty default config, with a SQLite storage configuration.

"storage": {
    "type": "sqlite",
    "read_records_count": 100,
    "max_records_count": 100000,
    "data_folder_path": "./data/",
    "max_file_count": 10,
    "max_read_records_count": 10,
    "max_records_per_file": 10000,
    "data_file_path": "/etc/thingsboard-gateway/data/data.db",
    "messages_ttl_check_in_hours": 1,
    "messages_ttl_in_days": 7,
    "ts": 1717605380034
  },

It works OK for a while, but sometimes I start getting this error:

python3[1034]: 2024-06-29 07:26:47 - |ERROR| - [database_connector.py] - database_connector - execute - 66 - database is locked
python3[1034]: Traceback (most recent call last):
python3[1034]:   File "/usr/lib/python3/dist-packages/thingsboard_gateway/storage/sqlite/database_connector.py", line 62, in execute
python3[1034]:     return self.connection.execute(*args)
python3[1034]: sqlite3.OperationalError: database is locked

And what's worse is that the devices that display this behavior start to loop on it in a high frequency. I suspect that it then tries to report the error to the TB server for incrementing the error counter our something like this. But as the frequency is super high, I start to get 10x data volume, eating out my API quota and triggering Rate Limits.

Connector name: Database connector

Error traceback:

python3[1034]: Traceback (most recent call last):
python3[1034]:   File "/usr/lib/python3/dist-packages/thingsboard_gateway/storage/sqlite/database_connector.py", line 62, in execute
python3[1034]:     return self.connection.execute(*args)
python3[1034]: sqlite3.OperationalError: database is locked

Versions:

OS: Ubuntu 22.04.4 LTS
Thingsboard IoT Gateway version 3.4.5
Python version: 3.7

Jun 29 '24 10:06 elgutierrez

Hi @elgutierrez, thanks for your interest in ThingsBoard IoT Gateway! Can you please provide your config files? Because for now, I can't reproduce your issue. Also, could you try using the latest version (3.5.1) and tell us about the result?

Jul 01 '24 07:07 samson0v

Hey @samson0v , thanks for your response. There's nothing fancy with the config

{
  "thingsboard": {
    "host": "thingsboard.cloud",
    "port": 1883,
    "remoteShell": true,
    "remoteConfiguration": true,
    "statistics": {
      "enable": true,
      "statsSendPeriodInSeconds": 3600
    },
    "deviceFiltering": {
      "enable": false,
      "filterFile": "list.json"
    },
    "maxPayloadSizeBytes": 1024,
    "minPackSendDelayMS": 200,
    "minPackSizeToSend": 500,
    "checkConnectorsConfigurationInSeconds": 60,
    "handleDeviceRenaming": true,
    "security": {
      "type": "accessToken",
      "accessToken": "xxxxxxxxxxxxxxxx"
    },
    "qos": 1,
    "checkingDeviceActivity": {
      "checkDeviceInactivity": false,
      "inactivityTimeoutSeconds": 200,
      "inactivityCheckPeriodSeconds": 500
    },
    "ts": 1717695736321
  },
  "storage": {
    "type": "sqlite",
    "read_records_count": 1000,
    "max_records_count": 10000,
    "data_folder_path": "./data/",
    "max_file_count": 10,
    "max_read_records_count": 10,
    "max_records_per_file": 10000,
    "data_file_path": "./data/data.db",
    "messages_ttl_check_in_hours": 1,
    "messages_ttl_in_days": 3,
    "ts": 1717695736321
  },
  "grpc": {
    "enabled": true,
    "serverPort": 9595,
    "keepaliveTimeMs": 10000,
    "keepaliveTimeoutMs": 5000,
    "keepalivePermitWithoutCalls": true,
    "maxPingsWithoutData": 0,
    "minTimeBetweenPingsMs": 10000,
    "minPingIntervalWithoutDataMs": 5000,
    "keepAliveTimeMs": 10000,
    "keepAliveTimeoutMs": 5000,
    "ts": 1703599972698
  },
  "connectors": [
    {
      "type": "bacnet",
      "name": "Bacnet",
      "configuration": "bacnet.json"
    }
  ]
}

It's hard to tell when the issue appears... it starts running OK for a while and then it errors out.

And even though that's a big problem, the worst part is that the gateway enters in a loop and start sending tons of messages, triggering the rate limits.

I can confirm that we had this issue also using 3.5.1

Jul 03 '24 17:07 elgutierrez

@elgutierrez as I can see from the commits history, you need to use the version from the master branch.

Jul 08 '24 10:07 samson0v

So it's not released yet? Do you have an ETA for the release?

Jul 08 '24 12:07 elgutierrez

@elgutierrez on this week

Jul 08 '24 12:07 samson0v

On another note, I realized that having an error in the gateway (could be this issue with the DB, or maybe something else), triggers a behavior where the gateway starts looping very fast, retrying the failed command and getting the same issue. In those cases, the API is bombarded with tens/hundreds of requests per second, leading to a rate limit issue. I suspect this is caused by the way the gateway reports errors. Is there a related open issue that you know of?

Jul 08 '24 14:07 elgutierrez

Hi @elgutierrez,

Yes, we know about situations like this and it relates to the implementation of base library, at the moment we are trying to find the solution for this issue. We can just suggest to disable remote logging, to avoid a lot of messages. At the same time please describe when did you encounter the issue?

Jul 10 '24 09:07 imbeacon

Remote logging is already disabled, it was the 1st thing I did. I guess it's more likely to be related with the error counters like Bacnet_ERRORS_COUNT, storage_ERRORS_COUNT, etc

Jul 18 '24 15:07 elgutierrez

I have adjusted the logger, to send errors count every 5 seconds instead of on each error message. You can try to use a version from the master branch.

Jul 30 '24 05:07 imbeacon

Hi @elgutierrez, any updates?

Aug 06 '24 06:08 samson0v

Hey, unfortunately I can't update my field devices with code from master branch, so I'm waiting for the new release so I can update it. Do we have an updated ETA?

Aug 19 '24 13:08 elgutierrez