invidious icon indicating copy to clipboard operation
invidious copied to clipboard

[Bug] (DB::PoolRetryAttemptsExceeded) -- self hosted using docker-compose.yaml config and containers

Open zestysoft opened this issue 2 years ago • 4 comments

Hey all, since this issue is closed, I'm creating a new issue in an attempt to further help to troubleshoot this.

Describe the bug Unhandled exception in spawn: (DB::PoolRetryAttemptsExceeded) from lib/db/src/db/query_methods.cr:46:7 in 'query:args' from lib/db/src/db/query_methods.cr:61:12 in 'begin' from src/invidious/jobs.cr:37:15 in '->' from /usr/share/crystal/src/fiber.cr:146:11 in 'run' from ???

Steps to Reproduce Start the latest invidious container

Logs Postgresql:

2023-12-30 09:00:50.534 UTC [1] LOG:  database system is ready to accept connections
2023-12-30 09:00:59.868 UTC [40] LOG:  connection received: host=[local]
2023-12-30 09:00:59.869 UTC [40] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:00:59.873 UTC [40] LOG:  disconnection: session time: 0:00:00.005 user=kemal database=invidious host=[local]
2023-12-30 09:01:09.877 UTC [47] LOG:  connection received: host=[local]
2023-12-30 09:01:09.877 UTC [47] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:01:09.879 UTC [47] LOG:  disconnection: session time: 0:00:00.002 user=kemal database=invidious host=[local]
2023-12-30 09:01:19.869 UTC [54] LOG:  connection received: host=[local]
2023-12-30 09:01:19.870 UTC [54] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:01:19.871 UTC [54] LOG:  disconnection: session time: 0:00:00.002 user=kemal database=invidious host=[local]
2023-12-30 09:01:29.868 UTC [62] LOG:  connection received: host=[local]
2023-12-30 09:01:29.869 UTC [62] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:01:29.870 UTC [62] LOG:  disconnection: session time: 0:00:00.001 user=kemal database=invidious host=[local]
2023-12-30 09:01:39.865 UTC [69] LOG:  connection received: host=[local]
2023-12-30 09:01:39.866 UTC [69] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:01:39.867 UTC [69] LOG:  disconnection: session time: 0:00:00.001 user=kemal database=invidious host=[local]
2023-12-30 09:01:46.953 UTC [70] LOG:  connection received: host=172.17.140.112 port=48126
2023-12-30 09:01:46.961 UTC [70] LOG:  connection authenticated: identity="kemal" method=scram-sha-256 (/var/lib/postgresql/data/pg_hba.conf:100)
2023-12-30 09:01:46.961 UTC [70] LOG:  connection authorized: user=kemal database=invidious application_name=crystal
2023-12-30 09:01:46.978 UTC [71] LOG:  connection received: host=172.17.140.112 port=48132
2023-12-30 09:01:46.979 UTC [72] LOG:  connection received: host=172.17.140.112 port=48140
2023-12-30 09:01:46.979 UTC [73] LOG:  connection received: host=172.17.140.112 port=48150
2023-12-30 09:01:46.981 UTC [74] LOG:  connection received: host=172.17.140.112 port=48160
2023-12-30 09:01:47.214 UTC [73] LOG:  connection authenticated: identity="kemal" method=scram-sha-256 (/var/lib/postgresql/data/pg_hba.conf:100)
2023-12-30 09:01:47.214 UTC [73] LOG:  connection authorized: user=kemal database=invidious application_name=crystal
2023-12-30 09:01:47.218 UTC [72] LOG:  connection authenticated: identity="kemal" method=scram-sha-256 (/var/lib/postgresql/data/pg_hba.conf:100)
2023-12-30 09:01:47.218 UTC [72] LOG:  connection authorized: user=kemal database=invidious application_name=crystal
2023-12-30 09:01:47.221 UTC [71] LOG:  connection authenticated: identity="kemal" method=scram-sha-256 (/var/lib/postgresql/data/pg_hba.conf:100)
2023-12-30 09:01:47.221 UTC [71] LOG:  connection authorized: user=kemal database=invidious application_name=crystal
2023-12-30 09:01:47.225 UTC [74] LOG:  connection authenticated: identity="kemal" method=scram-sha-256 (/var/lib/postgresql/data/pg_hba.conf:100)
2023-12-30 09:01:47.225 UTC [74] LOG:  connection authorized: user=kemal database=invidious application_name=crystal
2023-12-30 09:01:47.227 UTC [74] LOG:  disconnection: session time: 0:00:00.246 user=kemal database=invidious host=172.17.140.112 port=48160
2023-12-30 09:01:47.231 UTC [72] LOG:  disconnection: session time: 0:00:00.252 user=kemal database=invidious host=172.17.140.112 port=48140
2023-12-30 09:01:49.871 UTC [80] LOG:  connection received: host=[local]
2023-12-30 09:01:49.872 UTC [80] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:01:49.873 UTC [80] LOG:  disconnection: session time: 0:00:00.001 user=kemal database=invidious host=[local]
2023-12-30 09:01:59.870 UTC [88] LOG:  connection received: host=[local]
2023-12-30 09:01:59.871 UTC [88] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:01:59.872 UTC [88] LOG:  disconnection: session time: 0:00:00.001 user=kemal database=invidious host=[local]
2023-12-30 09:02:05.139 UTC [70] LOG:  disconnection: session time: 0:00:18.185 user=kemal database=invidious host=172.17.140.112 port=48126
2023-12-30 09:02:09.871 UTC [95] LOG:  connection received: host=[local]
2023-12-30 09:02:09.872 UTC [95] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:02:09.873 UTC [95] LOG:  disconnection: session time: 0:00:00.001 user=kemal database=invidious host=[local]
2023-12-30 09:02:19.874 UTC [102] LOG:  connection received: host=[local]
2023-12-30 09:02:19.874 UTC [102] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:02:19.876 UTC [102] LOG:  disconnection: session time: 0:00:00.001 user=kemal database=invidious host=[local]
2023-12-30 09:02:29.870 UTC [110] LOG:  connection received: host=[local]
2023-12-30 09:02:29.871 UTC [110] LOG:  connection authorized: user=kemal database=invidious application_name=pg_isready
2023-12-30 09:02:29.872 UTC [110] LOG:  disconnection: session time: 0:00:00.001 user=kemal database=invidious host=[local]

Additional context The last suggestion was to increase the max numbers of connections, which I did (100 to 300), however the problem persisted. So I cranked up the logging in postgresql, and it doesn't look like a connection exhaustion issue, or really any DB exhaustion issue.

From what I can see, the client makes 5 connections, two of which close in less than a second, the third closes in 18 seconds, and the remaining two connections persist.

This all happens right after startup. DB activity drops to almost nothing after around 18 seconds (right around when the third session terminates). At that point it's mostly the keepalives, and the occasional select statement. Nonetheless, This happens after:

2023-12-30 09:32:06 UTC [error] RefreshChannelsJob: UCDlfXUX_wYgyfiXbfQh96Eg :
2023-12-30 09:32:06 UTC [error] RefreshChannelsJob: UCDlfXUX_wYgyfiXbfQh96Eg fiber : backing off for 00:02:00s
Unhandled exception in spawn:  (DB::PoolRetryAttemptsExceeded)
  from lib/db/src/db/query_methods.cr:46:7 in 'query:args'
  from lib/db/src/db/query_methods.cr:61:12 in 'begin'
  from src/invidious/jobs.cr:37:15 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???
2023-12-30 09:34:07 UTC [error] RefreshChannelsJob: UC6107grRI4m0o2-emgoDnAA :
2023-12-30 09:34:07 UTC [error] RefreshChannelsJob: UC6107grRI4m0o2-emgoDnAA fiber : backing off for 00:04:00s
2023-12-30 09:38:09 UTC [error] RefreshChannelsJob: UCHnyfMqiRRG1u-2MsSQLbXA :
2023-12-30 09:38:09 UTC [error] RefreshChannelsJob: UCHnyfMqiRRG1u-2MsSQLbXA fiber : backing off for 00:08:00s
2023-12-30 09:46:10 UTC [error] RefreshChannelsJob: UC6Om9kAkl32dWlDSNlDS9Iw :
2023-12-30 09:46:10 UTC [error] RefreshChannelsJob: UC6Om9kAkl32dWlDSNlDS9Iw fiber : backing off for 00:16:00s
2023-12-30 10:01:47 UTC [info] jobs: running ClearExpiredItems job
2023-12-30 10:01:48 UTC [info] jobs: ClearExpiredItems failed. Retrying in 10 minutes.
2023-12-30 10:02:11 UTC [error] RefreshChannelsJob: UCJ6q9Ie29ajGqKApbLqfBOg :
2023-12-30 10:02:11 UTC [error] RefreshChannelsJob: UCJ6q9Ie29ajGqKApbLqfBOg fiber : backing off for 00:32:00s

At this point, there are no further calls from the client -- only the keepalives. While the system was in this state, I queried the db to see which sessions are still open and what state they're in:

-[ RECORD 1 ]-----+-----------------------------------------
Database          | invidious
Process ID        | 73
User Name         | kemal
Application       | crystal
Client Address    | 172.17.140.112
Client Hostname   |
Client Port       | 48150
Backend Start     | 2023-12-30 09:01:46.979732+00
Transaction Start |
Query Start       | 2023-12-30 09:01:47.225026+00
Last State Change | 2023-12-30 09:01:47.225166+00
State             | idle
Wait Event Type   | Client
Wait Event        | ClientRead
Current Query     | LISTEN "notifications"
-[ RECORD 2 ]-----+-----------------------------------------
Database          | invidious
Process ID        | 71
User Name         | kemal
Application       | crystal
Client Address    | 172.17.140.112
Client Hostname   |
Client Port       | 48132
Backend Start     | 2023-12-30 09:01:46.978656+00
Transaction Start |
Query Start       | 2023-12-30 09:32:05.13837+00
Last State Change | 2023-12-30 09:32:05.138605+00
State             | idle
Wait Event Type   | Client
Wait Event        | ClientRead
Current Query     | SELECT id FROM channels ORDER BY updated

Both are in an idle state waiting for the client to read the results.

zestysoft avatar Dec 30 '23 10:12 zestysoft

I made some adjustments to the code and re-compiled the docker image -- so far no errors -- I'm going to let things run for a while to see if this actually made a difference or not and if so, will submit a pr

zestysoft avatar Dec 30 '23 12:12 zestysoft

I made some adjustments to the code

What adjustments exactly? You might have better luck in avoiding this issue by using a connection pooler like pgbouncer.

unixfox avatar Dec 31 '23 11:12 unixfox

You might have better luck in avoiding this issue by using a connection pooler like pgbouncer.

After all of the evidence I posted here, why would you still think this is a connection issue?

zestysoft avatar Dec 31 '23 17:12 zestysoft

@unixfox Just wanted to ping ya that I still have yet to see these errors after applying that pr to my local copy.

zestysoft avatar Jan 06 '24 05:01 zestysoft