
Lost connection to PostgreSQL database

Open jekeam opened this issue 1 year ago • 10 comments

Issue Summary

Some time after launching Redash, the connection to the database is lost.

If I open the connection test, the request just hangs and never returns a response.

The database itself cannot be the problem: I checked it, and after restarting the service the connection is re-established and works for a while again.
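A minimal sketch of the kind of check that rules the database itself out (host, credentials and database name below are placeholders, not the real ones):

```python
# Minimal connectivity check against the same PostgreSQL data source.
# Host, credentials and database name are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="db.example.internal",
    port=5432,
    dbname="mydb",
    user="redash_ro",
    password="...",
    connect_timeout=5,  # fail fast instead of hanging like the UI test does
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())  # (1,) if the database answers
conn.close()
```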

Steps to Reproduce

Right after launch: Screenshot_73

After 15 minutes: Screenshot_74

Also, the table schema (PostgreSQL 12) is never loaded: (screenshot)

I don't see any errors in the logs.

Technical details:

  • Redash Version: 10.1.0
  • Browser/OS: Ubuntu 20.04.6 LTS
  • How did you install Redash: setup.sh

jekeam avatar Jul 16 '24 14:07 jekeam

Interesting. Is this a brand new Redash installation that you could recreate using the latest preview image instead?

justinclift avatar Jul 16 '24 15:07 justinclift

Interesting. Is this a brand new Redash installation that you could recreate using the latest preview image instead?

Yes, everything is the same on the preview version.

(screenshot)

jekeam avatar Jul 16 '24 15:07 jekeam

Could this be related to some characteristic of the system (I am using a plain VPS)? Where can I see the logs for this problem?

jekeam avatar Jul 16 '24 15:07 jekeam

Yeah, it sounds like something on the VPS setup itself is dropping the connection.

Which hosting company is the VPS with?
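If it does turn out that the VPS (or a firewall/NAT in front of it) is silently dropping idle TCP connections, client-side keepalives on the data source connection can keep it from sitting idle long enough to be dropped, or at least surface the drop quickly instead of hanging. A hedged sketch with psycopg2; host and credentials are placeholders:

```python
# Enable client-side TCP keepalives on the data source connection so idle
# connections are probed instead of silently timing out.  This is only a
# sketch: host and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(
    "host=db.example.internal dbname=mydb user=redash_ro password=...",
    keepalives=1,            # turn TCP keepalives on
    keepalives_idle=60,      # seconds of idleness before the first probe
    keepalives_interval=10,  # seconds between probes
    keepalives_count=5,      # failed probes before the socket is declared dead
)
```

These are ordinary libpq connection parameters, so they are not specific to this driver.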

justinclift avatar Jul 16 '24 16:07 justinclift

Where can I see logs for this problem?

Good question. My first thought is it would probably be one of the worker docker containers showing the problem. It's also weird (to me) that it's not automatically re-establishing the dropped PostgreSQL connection(s) either. :frowning:

justinclift avatar Jul 16 '24 16:07 justinclift

VPS

The VPS is with reg.ru:

  • vCPU cores: 6
  • RAM: 6 GB (up to 48 GB available when selecting more vCPU cores)
  • Disk capacity: 60 GB

I have now found an error in the worker log; this is probably the problem:

worker-1 | [2024-07-16 16:34:16,109][PID:419][INFO][rq.worker] periodic: Job OK (305c0cae0a196ae96915fd2b6f81001c435aad65)
worker-1 | [2024-07-16 16:34:16,110][PID:419][INFO][rq.worker] Result is kept for 600 seconds
worker-1 | [2024-07-16 16:34:16,121][PID:8][INFO][rq.worker] periodic: 59a84668e68687338646f735965213c58e814b32
worker-1 | [2024-07-16 16:34:16,154][PID:420][ERROR][rq.worker] [Job 59a84668e68687338646f735965213c58e814b32]: exception raised while executing (redash.tasks.queries.maintenance.remove_ghost_locks)
worker-1 | Traceback (most recent call last):
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/worker.py", line 1431, in perform_job
worker-1 | rv = job.perform()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
worker-1 | self._result = self._execute()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
worker-1 | result = self.func(*self.args, **self.kwargs)
worker-1 | File "/app/redash/tasks/queries/maintenance.py", line 148, in remove_ghost_locks
worker-1 | jobs = list(rq_job_ids())
worker-1 | File "/app/redash/monitor.py", line 64, in rq_job_ids
worker-1 | started_jobs = [StartedJobRegistry(queue=q).get_job_ids() for q in queues]
worker-1 | File "/app/redash/monitor.py", line 64, in <listcomp>
worker-1 | started_jobs = [StartedJobRegistry(queue=q).get_job_ids() for q in queues]
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/registry.py", line 152, in get_job_ids
worker-1 | self.cleanup()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/registry.py", line 234, in cleanup
worker-1 | job = self.job_class.fetch(job_id, connection=self.connection, serializer=self.serializer)
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 591, in fetch
worker-1 | job.refresh()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 991, in refresh
worker-1 | data = self.connection.hgetall(self.key)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/commands/core.py", line 4911, in hgetall
worker-1 | return self.execute_command("HGETALL", name)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1269, in execute_command
worker-1 | return conn.retry.call_with_retry(
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
worker-1 | return do()
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1270, in <lambda>
worker-1 | lambda: self._send_command_parse_response(
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1246, in _send_command_parse_response
worker-1 | return self.parse_response(conn, command_name, **options)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1286, in parse_response
worker-1 | response = connection.read_response()
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 882, in read_response
worker-1 | response = self._parser.read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 349, in read_response
worker-1 | result = self._read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 393, in _read_response
worker-1 | response = [
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 394, in <listcomp>
worker-1 | self._read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 401, in _read_response
worker-1 | response = self.encoder.decode(response)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 130, in decode
worker-1 | value = value.decode(self.encoding, self.encoding_errors)
worker-1 | UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
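That 0x80 in position 0 is what a pickled payload starts with, so one way to hit exactly this error is a Redis client created with decode_responses=True reading RQ's binary job data. A minimal sketch (key name and payload are illustrative; assumes a local Redis):

```python
# Sketch of how the same UnicodeDecodeError can be produced: RQ stores pickled
# (binary) job data in the job hash, and pickled data starts with byte 0x80.
# A Redis client created with decode_responses=True fails to UTF-8-decode it.
# The key name and payload are illustrative only.
import pickle
import redis

raw = redis.Redis()                           # returns raw bytes (RQ's normal mode)
decoded = redis.Redis(decode_responses=True)  # tries to UTF-8-decode every value

raw.hset("rq:job:example", "data", pickle.dumps({"args": []}))

print(raw.hgetall("rq:job:example"))  # fine: values come back as bytes
decoded.hgetall("rq:job:example")     # UnicodeDecodeError: ... byte 0x80 in position 0
```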

jekeam avatar Jul 16 '24 16:07 jekeam

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Yeah, that seems weird. :frowning:

justinclift avatar Jul 16 '24 17:07 justinclift

exception raised while executing (redash.tasks.queries.maintenance.remove_ghost_locks)

That's a weird error message too. Haven't seen that before either.

justinclift avatar Jul 16 '24 17:07 justinclift

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Yeah, that seems weird. 😦

As far as I understand, this error is not related to my attempt to connect to the database.

jekeam avatar Jul 16 '24 17:07 jekeam

Here is the log from after I press the connection test button:

worker-1  | [2024-07-16 17:44:57,772][PID:7][INFO][rq.worker] default: ba60d510-d236-4ed0-9ad1-eb0695431875
worker-1  | [2024-07-16 17:44:57,832][PID:316][INFO][rq.worker] default: Job OK (ba60d510-d236-4ed0-9ad1-eb0695431875)
worker-1  | [2024-07-16 17:44:57,832][PID:316][INFO][rq.worker] Result is kept for 500 seconds
worker-1  | [2024-07-16 17:44:59,163][PID:7][INFO][rq.worker] default: e25614f4-7cb8-4ee3-843f-1e06856d5db4
worker-1  | [2024-07-16 17:44:59,233][PID:317][INFO][rq.worker] default: Job OK (e25614f4-7cb8-4ee3-843f-1e06856d5db4)
worker-1  | [2024-07-16 17:44:59,234][PID:317][INFO][rq.worker] Result is kept for 500 seconds
worker-1  | 2024/07/16 17:45:01 [worker_healthcheck] Received TICK_60 event from supervisor
worker-1  | 2024/07/16 17:45:01 [worker_healthcheck] Performing `RQ Worker Healthcheck` check for process name worker-0
worker-1  | 2024/07/16 17:45:01 [worker_healthcheck] WorkerHealthcheck: Worker rq:worker:f2ca6d09926b45d4b88ca2708f114801 healthcheck: Is busy? False. Seen lately? True (1 seconds ago). Has nothing to do? True (0 jobs in watched queues). ==> Is healthy? True
worker-1  | 2024/07/16 17:45:01 [worker_healthcheck] `RQ Worker Healthcheck` check succeeded for process worker-0
worker-1  | RESULT 2
worker-1  | OKREADY
worker-1  | [2024-07-16 17:45:08,380][PID:7][INFO][rq.worker] periodic: 305c0cae0a196ae96915fd2b6f81001c435aad65
worker-1  | [2024-07-16 17:45:08,394][PID:318][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.refresh_queries job.id=305c0cae0a196ae96915fd2b6f81001c435aad65 Refreshing queries...
worker-1  | [2024-07-16 17:45:08,462][PID:318][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.refresh_queries job.id=305c0cae0a196ae96915fd2b6f81001c435aad65 Done refreshing queries: {'started_at': 1721151908.3942385, 'outdated_queries_count': 0, 'last_refresh_at': 1721151908.4607894, 'query_ids': '[]'}
worker-1  | [2024-07-16 17:45:08,465][PID:318][INFO][rq.worker] periodic: Job OK (305c0cae0a196ae96915fd2b6f81001c435aad65)
worker-1  | [2024-07-16 17:45:08,466][PID:318][INFO][rq.worker] Result is kept for 600 seconds
worker-1  | [2024-07-16 17:45:30,711][PID:7][INFO][rq.worker] default: d51faf22-0da7-427b-8628-ef1410d1f329
worker-1  | [2024-07-16 17:45:30,774][PID:319][INFO][rq.worker] default: Job OK (d51faf22-0da7-427b-8628-ef1410d1f329)
worker-1  | [2024-07-16 17:45:30,774][PID:319][INFO][rq.worker] Result is kept for 500 seconds
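The log shows the jobs finishing ("Job OK"), so the hang looks like it is somewhere between the worker and the browser rather than in the job itself. One way to double-check is to fetch the job by the id printed above from inside a worker container; the Redis URL here is an assumption about the default docker-compose setup:

```python
# Fetch the finished "test connection" job directly from RQ to confirm it
# really completed.  The job id is copied from the log above; the Redis URL
# is an assumption.  This only works while the result is still kept
# (the log says 500 seconds).
from redis import Redis
from rq.job import Job

conn = Redis.from_url("redis://redis:6379/0")
job = Job.fetch("ba60d510-d236-4ed0-9ad1-eb0695431875", connection=conn)
print(job.get_status())  # e.g. "finished"
print(job.ended_at)      # when the worker finished it
```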

jekeam avatar Jul 16 '24 17:07 jekeam

Any status update on this?

MrCirca avatar Oct 19 '25 10:10 MrCirca

@MrCirca

How about using the latest Redash version (25.8 or later), if you are not already? It would also help the investigation if you could provide the error message you got and the docker worker container log.

yoshiokatsuneo avatar Oct 19 '25 15:10 yoshiokatsuneo