Lost connection to PostgreSQL database
Issue Summary
Some time after launching Redash, the connection to the database is lost.
If I open the connection test, the request just hangs and never returns an answer.
It cannot be a problem with the database itself: I checked, and after restarting the service the database connects again and works for a while before the problem returns.
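To make that check concrete, here is a minimal connectivity-test sketch (placeholder host and credentials, not my actual settings) that can be run from inside the Redash network, e.g. via `docker-compose exec server python`:

```python
# Minimal connectivity check; host/credentials below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-postgres-host",   # placeholder
    port=5432,
    dbname="mydb",             # placeholder
    user="redash_reader",      # placeholder
    password="secret",         # placeholder
    connect_timeout=5,         # fail fast instead of hanging
    # TCP keepalives help when an idle connection is silently dropped
    # by a firewall/NAT between the VPS and the database.
    keepalives=1,
    keepalives_idle=30,
    keepalives_interval=10,
    keepalives_count=3,
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())      # (1,) means the database itself responds
conn.close()
```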
Steps to Reproduce
Right after launch: (screenshot)
After 15 minutes: (screenshot)
Also, the table schema (PostgreSQL 12) never loads: (screenshot)
I don't see any errors in the logs.
Technical details:
- Redash Version: 10.1.0
- Browser/OS: Ubuntu 20.04.6 LTS
- How did you install Redash: setup.sh
Interesting. Is this a brand new Redash installation that you could recreate using the latest preview image instead?
Yes, everything is the same on the preview version.
Could this be related to the characteristics of the system itself (I am using a plain VPS)? Where can I see logs for this problem?
Yeah, it sounds like something on the VPS setup itself is dropping the connection.
Which hosting company is the VPS with?
Where can I see logs for this problem?
Good question. My first thought is it would probably be one of the worker docker containers showing the problem. It's also weird (to me) that it's not automatically re-establishing the dropped PostgreSQL connection(s) either. :frowning:
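For context, automatic re-establishment usually looks like a connection pool that pings each connection before handing it out. This is an illustrative SQLAlchemy sketch only, not Redash's actual configuration:

```python
# Illustrative only -- not Redash's actual settings. With pool_pre_ping the pool
# issues a lightweight "ping" before handing out a connection and transparently
# replaces it if the server has dropped it, so idle disconnects don't surface
# as hung or failing requests.
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://user:password@my-postgres-host/mydb",  # placeholder URL
    pool_pre_ping=True,   # test each pooled connection before use
    pool_recycle=300,     # recycle connections idle for more than 5 minutes
)

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```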
VPS
VPS - reg.ru
- Number of vCPU cores: 6
- RAM: 6 GB (up to 48 GB available when selecting more vCPU cores)
- Disk capacity: 60 GB
I have now found an error in the worker log; this is probably the problem:
worker-1 | [2024-07-16 16:34:16,109][PID:419][INFO][rq.worker] periodic: Job OK (305c0cae0a196ae96915fd2b6f81001c435aad65)
worker-1 | [2024-07-16 16:34:16,110][PID:419][INFO][rq.worker] Result is kept for 600 seconds
worker-1 | [2024-07-16 16:34:16,121][PID:8][INFO][rq.worker] periodic: 59a84668e68687338646f735965213c58e814b32
worker-1 | [2024-07-16 16: 34: 16,154] [PID: 420] [RQ.Worker] [Job 59A846868733864646464646F735965213C58E8142]: Excection Raidion While Execution (Redash.TASS. queries.maintenance.remove_ghost_locks)
worker-1 | Traceback (most recent call last):
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/worker.py", line 1431, in perform_job
worker-1 | rv = job.perform()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
worker-1 | self._result = self._execute()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
worker-1 | result = self.func(*self.args, **self.kwargs)
worker-1 | File "/app/redash/tasks/queries/maintenance.py", line 148, in remove_ghost_locks
worker-1 | jobs = list(rq_job_ids())
worker-1 | File "/app/redash/monitor.py", line 64, in rq_job_ids
worker-1 | started_jobs = [StartedJobRegistry(queue=q).get_job_ids() for q in queues]
worker-1 | File "/app/redash/monitor.py", line 64, in <listcomp>
worker-1 | started_jobs = [StartedJobRegistry(queue=q).get_job_ids() for q in queues]
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/registry.py", line 152, in get_job_ids
worker-1 | self.cleanup()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/registry.py", line 234, in cleanup
worker-1 | job = self.job_class.fetch(job_id, connection=self.connection, serializer=self.serializer)
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 591, in fetch
worker-1 | job.refresh()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 991, in refresh
worker-1 | data = self.connection.hgetall(self.key)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/commands/core.py", line 4911, in hgetall
worker-1 | return self.execute_command("HGETALL", name)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1269, in execute_command
worker-1 | return conn.retry.call_with_retry(
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
worker-1 | return do()
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1270, in <lambda>
worker-1 | lambda: self._send_command_parse_response(
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1246, in _send_command_parse_response
worker-1 | return self.parse_response(conn, command_name, **options)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1286, in parse_response
worker-1 | response = connection.read_response()
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 882, in read_response
worker-1 | response = self._parser.read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 349, in read_response
worker-1 | result = self._read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 393, in _read_response
worker-1 | response = [
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 394, in <listcomp>
worker-1 | self._read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 401, in _read_response
worker-1 | response = self.encoder.decode(response)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 130, in decode
worker-1 | value = value.decode(self.encoding, self.encoding_errors)
worker-1 | UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
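For what it's worth, one way this exact `UnicodeDecodeError` can arise is when a redis-py client created with `decode_responses=True` reads binary data such as a pickled RQ job payload (pickle streams start with byte `0x80`). A standalone sketch, not Redash code, and not necessarily what happened here:

```python
# Standalone reproduction sketch (not Redash code): reading a binary, pickled
# value through a redis-py client that decodes every reply as UTF-8.
import pickle
import redis

raw = redis.Redis()                             # leaves replies as bytes
decoding = redis.Redis(decode_responses=True)   # decodes every reply as UTF-8

# Store a hash value the way RQ stores job payloads: as pickled bytes.
# Pickle streams begin with the protocol opcode 0x80.
raw.hset("demo:job", "data", pickle.dumps({"func": "example"}))

print(raw.hgetall("demo:job"))                  # fine: values stay as bytes

try:
    decoding.hgetall("demo:job")
except UnicodeDecodeError as exc:
    # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0
    print(exc)
```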
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Yeah, that seems weird. :frowning:
Excection Raidion
That's a weird error message too. Haven't seen that before either.
As far as I understand, this error is not related to my attempt to connect to the database.
Here is the log from right after I press the connection test button:
worker-1 | [2024-07-16 17:44:57,772][PID:7][INFO][rq.worker] default: ba60d510-d236-4ed0-9ad1-eb0695431875
worker-1 | [2024-07-16 17:44:57,832][PID:316][INFO][rq.worker] default: Job OK (ba60d510-d236-4ed0-9ad1-eb0695431875)
worker-1 | [2024-07-16 17:44:57,832][PID:316][INFO][rq.worker] Result is kept for 500 seconds
worker-1 | [2024-07-16 17:44:59,163][PID:7][INFO][rq.worker] default: e25614f4-7cb8-4ee3-843f-1e06856d5db4
worker-1 | [2024-07-16 17:44:59,233][PID:317][INFO][rq.worker] default: Job OK (e25614f4-7cb8-4ee3-843f-1e06856d5db4)
worker-1 | [2024-07-16 17:44:59,234][PID:317][INFO][rq.worker] Result is kept for 500 seconds
worker-1 | 2024/07/16 17:45:01 [worker_healthcheck] Received TICK_60 event from supervisor
worker-1 | 2024/07/16 17:45:01 [worker_healthcheck] Performing `RQ Worker Healthcheck` check for process name worker-0
worker-1 | 2024/07/16 17:45:01 [worker_healthcheck] WorkerHealthcheck: Worker rq:worker:f2ca6d09926b45d4b88ca2708f114801 healthcheck: Is busy? False. Seen lately? True (1 seconds ago). Has nothing to do? True (0 jobs in watched queues). ==> Is healthy? True
worker-1 | 2024/07/16 17:45:01 [worker_healthcheck] `RQ Worker Healthcheck` check succeeded for process worker-0
worker-1 | RESULT 2
worker-1 | OKREADY
worker-1 | [2024-07-16 17:45:08,380][PID:7][INFO][rq.worker] periodic: 305c0cae0a196ae96915fd2b6f81001c435aad65
worker-1 | [2024-07-16 17:45:08,394][PID:318][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.refresh_queries job.id=305c0cae0a196ae96915fd2b6f81001c435aad65 Refreshing queries...
worker-1 | [2024-07-16 17:45:08,462][PID:318][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.refresh_queries job.id=305c0cae0a196ae96915fd2b6f81001c435aad65 Done refreshing queries: {'started_at': 1721151908.3942385, 'outdated_queries_count': 0, 'last_refresh_at': 1721151908.4607894, 'query_ids': '[]'}
worker-1 | [2024-07-16 17:45:08,465][PID:318][INFO][rq.worker] periodic: Job OK (305c0cae0a196ae96915fd2b6f81001c435aad65)
worker-1 | [2024-07-16 17:45:08,466][PID:318][INFO][rq.worker] Result is kept for 600 seconds
worker-1 | [2024-07-16 17:45:30,711][PID:7][INFO][rq.worker] default: d51faf22-0da7-427b-8628-ef1410d1f329
worker-1 | [2024-07-16 17:45:30,774][PID:319][INFO][rq.worker] default: Job OK (d51faf22-0da7-427b-8628-ef1410d1f329)
worker-1 | [2024-07-16 17:45:30,774][PID:319][INFO][rq.worker] Result is kept for 500 seconds
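To dig further, the queue state can also be inspected from inside the worker container; here is a rough sketch (the Redis URL and queue names are assumptions for a default docker-compose install, not taken from this thread):

```python
# Rough diagnostic sketch; REDIS_URL and queue names are assumptions for a
# default docker-compose setup.
import redis
from rq import Queue
from rq.registry import FailedJobRegistry, StartedJobRegistry

conn = redis.Redis.from_url("redis://redis:6379/0")

for name in ("default", "periodic", "queries", "schemas"):
    q = Queue(name, connection=conn)
    print(
        name,
        "queued:", len(q),
        "started:", StartedJobRegistry(queue=q).get_job_ids(),
        "failed:", FailedJobRegistry(queue=q).get_job_ids(),
    )
```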
Any status update on this?
@MrCirca
How about using the latest Redash version (25.8 or later), if you are not already on it? It would also help the investigation if you could provide the error message you got and the docker worker container log.