
[Triage] Sources page times out before loading

Open gteltschik opened this issue 1 year ago • 13 comments

Describe the bug

Loading the sources page times out. The error log shows:

10:39:06.164 [error] Exqlite.Connection (#PID<0.2343.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.6605.0> timed out because it queued and checked out the connection for longer than 15000ms

#PID<0.6605.0> was at location:

(exqlite 0.23.0) lib/exqlite/sqlite3.ex:137: Exqlite.Sqlite3.multi_step/3
(exqlite 0.23.0) lib/exqlite/sqlite3.ex:175: Exqlite.Sqlite3.try_fetch_all/3
(exqlite 0.23.0) lib/exqlite/sqlite3.ex:169: Exqlite.Sqlite3.fetch_all/3
(exqlite 0.23.0) lib/exqlite/connection.ex:675: Exqlite.Connection.get_rows/2
(exqlite 0.23.0) lib/exqlite/connection.ex:621: Exqlite.Connection.execute/4
(db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
(db_connection 2.7.0) lib/db_connection.ex:1558: DBConnection.run_execute/5

To Reproduce

Steps to reproduce the behavior:

  1. Navigate to the URL/sources page
  2. The page will time out.

Expected behavior

It should display the sources page.

Screenshots

If applicable, add screenshots to help explain your problem.

N/A, unless you'd like to see a page timeout in a browser.

Diagnostic info

  • App Version: 2025.1.27
  • yt-dlp Version: 2025.01.26
  • Apprise Version: v1.9.2
  • System Architecture: x86_64-pc-linux-gnu
  • Timezone: America/Chicago

Additional context

I think I may have found the limits of the application. I have 1 media profile, 560 sources, 26,747 downloads, 314 pending, and a total of 12.86TB downloaded. The database has over 1M items in it and consumes 12GB of disk space (on a local NVMe). This behavior has been intermittent over the past month and has not changed with newer builds. It's difficult to tell whether the error predates the change to the sources page (pagination versus a single list), but the timing is close. I seem to recall that the page was slow previously but usually succeeded, whereas now it fails more often than it succeeds. The main home page is slow but usually returns; other pages return fine.

The server fluctuates between 3-20% CPU, runs stable at 50% memory (about 64GB used), and is not experiencing iowait issues. The container seems to consume 1GB of memory and reports 5-80% CPU. The beam.smp task frequently reports 100-110% CPU and 4GB of memory in use. Ergo, I suspect the problem lies in the database.
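For a rough sense of how much of a db file that size is dead weight, SQLite can report allocated vs. free pages directly; a sketch using the sqlite3 CLI (the path is an assumption based on a typical container layout — adjust to your install):

```shell
# Free pages * page size is a lower bound on what VACUUM could reclaim
DB=pinchflat/config/db/pinchflat.db
sqlite3 "$DB" 'PRAGMA page_size; PRAGMA page_count; PRAGMA freelist_count;'
```

The three numbers print one per line; a large freelist_count relative to page_count suggests the file is mostly reclaimable space.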

pinchflat-logs-2025-01-29.txt

gteltschik avatar Jan 29 '25 17:01 gteltschik

Hey there! Thanks for the report (:

I think you're right! I have ~200k items locally, which is noticeably slower but not the end of the world. But I can imagine that around 1MM+ would result in some annoying behaviours.

I should be able to get those numbers back down to earth. I've worked on SQLite projects with 200MM rows that can still perform complex queries in under 20ms; I just need to know where to start. If you'd be willing to share your database through some secure mechanism, I'd love to have it as a test case. Otherwise I can try bumping my numbers locally and seeing if I can replicate this.
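For anyone following along, the usual starting point with the query planner is EXPLAIN QUERY PLAN; something like this (the table and column names here are hypothetical — substitute a real query from the app's debug logs):

```shell
# Shows whether the query does a full table SCAN or an indexed SEARCH
sqlite3 pinchflat.db "EXPLAIN QUERY PLAN
  SELECT count(id) FROM media_items WHERE source_id = 1;"
```

A plan line containing SCAN over a million-row table is the kind of thing that turns a 20ms query into a 15-second one.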

kieraneglin avatar Jan 29 '25 19:01 kieraneglin

Happy to share my database. What's the best mechanism to do so (email)? I can easily send you a dropbox link, google, or whatever works best for you.

I was going to investigate tools to give it a cleaning to see if that might help.

gteltschik avatar Jan 29 '25 19:01 gteltschik

Email would be great! Soft preference for Google Drive but really anything works

kieraneglin avatar Jan 29 '25 19:01 kieraneglin

Invite to google drive sent. The files have finished uploading.

gteltschik avatar Jan 29 '25 20:01 gteltschik

Still working on this one - it's not the quick slam-dunk I had hoped so it's been a bit of a dive into the query planner

kieraneglin avatar Feb 05 '25 19:02 kieraneglin

Take your time, get it right. I'm guessing you may need to optimize the queries (their order or scope) or even normalize the database structure to align with the queries. I'm not a SQLite expert (nor even a DBA), but let me know if I can assist.
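If it does come down to query optimization, the classic first lever is an index matching the hot filter; a sketch with hypothetical table/column names (not necessarily what Pinchflat's schema uses):

```shell
# Hypothetical names -- turns a full scan of the media table into an
# index lookup when filtering by source
sqlite3 pinchflat.db 'CREATE INDEX IF NOT EXISTS media_items_source_id_index
                        ON media_items (source_id);'
```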

gteltschik avatar Feb 10 '25 16:02 gteltschik

I often get timeouts too, and have to wait a long time for the sources page to load. However, I don't have a large number of records.

  • Media Profiles: 1
  • Sources: 68
  • Downloads: 1,136
  • Library Size: 421.69 GB

Example errors:

2025-04-21 18:42:51.551 [error] | Exqlite.Connection (#PID<0.2346.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.25916.0> timed out because it queued and checked out the connection for longer than 15000ms
#PID<0.25916.0> was at location:
    (exqlite 0.23.0) lib/exqlite/sqlite3.ex:137: Exqlite.Sqlite3.multi_step/3
    (exqlite 0.23.0) lib/exqlite/sqlite3.ex:175: Exqlite.Sqlite3.try_fetch_all/3
    (exqlite 0.23.0) lib/exqlite/sqlite3.ex:177: Exqlite.Sqlite3.try_fetch_all/3
    (exqlite 0.23.0) lib/exqlite/sqlite3.ex:169: Exqlite.Sqlite3.fetch_all/3
    (exqlite 0.23.0) lib/exqlite/connection.ex:675: Exqlite.Connection.get_rows/2
    (exqlite 0.23.0) lib/exqlite/connection.ex:621: Exqlite.Connection.execute/4
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
2025-04-21 18:44:04.084 [error] | GenServer {Oban.Registry, {Oban, Oban.Stager}} terminating
** (Exqlite.Error) Database busy
UPDATE "oban_jobs" AS o0 SET "state" = ? WHERE (o0."id" IN (?,?,?,?,?))
    (ecto_sql 3.12.0) lib/ecto/adapters/sql.ex:1078: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.12.0) lib/ecto/adapters/sql.ex:976: Ecto.Adapters.SQL.execute/6
    (oban 2.18.3) lib/oban/engines/lite.ex:132: Oban.Engines.Lite.stage_jobs/3
    (oban 2.18.3) lib/oban/engine.ex:244: anonymous fn/4 in Oban.Engine.stage_jobs/3
    (oban 2.18.3) lib/oban/engine.ex:339: anonymous fn/3 in Oban.Engine.with_span/4
    (telemetry 1.3.0) /app/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3
    (oban 2.18.3) lib/oban/stager.ex:71: anonymous fn/1 in Oban.Stager.stage_and_notify/2
    (ecto_sql 3.12.0) lib/ecto/adapters/sql.ex:1382: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 287ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
See DBConnection.start_link/2 for more information
    (ecto_sql 3.12.0) lib/ecto/adapters/sql.ex:1078: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.12.0) lib/ecto/adapters/sql.ex:976: Ecto.Adapters.SQL.execute/6
    (ecto 3.12.3) lib/ecto/repo/queryable.ex:232: Ecto.Repo.Queryable.execute/4
    (oban 2.18.3) lib/oban/engines/lite.ex:110: Oban.Engines.Lite.fetch_jobs/3
    (oban 2.18.3) lib/oban/engine.ex:234: anonymous fn/4 in Oban.Engine.fetch_jobs/3
    (oban 2.18.3) lib/oban/engine.ex:339: anonymous fn/3 in Oban.Engine.with_span/4
    (telemetry 1.3.0) /app/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3
    (oban 2.18.3) lib/oban/queue/producer.ex:253: Oban.Queue.Producer.start_jobs/1
Last message: :dispatch
2025-04-21 18:42:14.717 [error] | Exqlite.Connection (#PID<0.2345.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.25907.0> timed out because it queued and checked out the connection for longer than 15000ms
#PID<0.25907.0> was at location:
    (exqlite 0.23.0) lib/exqlite/sqlite3.ex:137: Exqlite.Sqlite3.multi_step/3
    (exqlite 0.23.0) lib/exqlite/sqlite3.ex:175: Exqlite.Sqlite3.try_fetch_all/3
    (exqlite 0.23.0) lib/exqlite/sqlite3.ex:169: Exqlite.Sqlite3.fetch_all/3
    (exqlite 0.23.0) lib/exqlite/connection.ex:675: Exqlite.Connection.get_rows/2
    (exqlite 0.23.0) lib/exqlite/connection.ex:621: Exqlite.Connection.execute/4
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1558: DBConnection.run_execute/5
2025-04-21 18:42:14.718 [error] | Exqlite.Connection (#PID<0.2346.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.25913.0> timed out because it queued and checked out the connection for longer than 15000ms
#PID<0.25913.0> was at location:
2025-04-21 18:40:55.924 [error] | Exqlite.Connection (#PID<0.2344.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.25881.0> timed out because it queued and checked out the connection for longer than 15000ms
#PID<0.25881.0> was at location:
    (exqlite 0.23.0) lib/exqlite/connection.ex:622: Exqlite.Connection.execute/4
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1558: DBConnection.run_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:1653: DBConnection.run/6
    (db_connection 2.7.0) lib/db_connection.ex:829: DBConnection.execute/4
    (ecto_sqlite3 0.17.2) lib/ecto/adapters/sqlite3/connection.ex:90: Ecto.Adapters.SQLite3.Connection.query/4
    (ecto_sql 3.12.0) lib/ecto/adapters/sql.ex:1038: Ecto.Adapters.SQL.execute!/5
warning: conflicting layouts found. A layout has been set with format, such as:
    put_layout(conn, html: {PinchflatWeb.Layouts, :app})
But also without format:
    put_layout(conn, {PinchflatWeb.Layouts, :app})
In this case, the layout without format will always win.
Passing the layout without a format is currently soft-deprecated.
If you use layouts with formats, make sure that they are
used everywhere. Also remember to configure your controller
to use layouts with formats:
    use Phoenix.Controller, layouts: [html: {PinchflatWeb.Layouts, :app}]
  (phoenix 1.7.17) lib/phoenix/controller.ex:1036: Phoenix.Controller.assigns_layout/3
  (phoenix 1.7.17) lib/phoenix/controller.ex:1016: Phoenix.Controller.prepare_assigns/4
  (phoenix 1.7.17) lib/phoenix/controller.ex:973: Phoenix.Controller.render_and_send/4
  (phoenix 1.7.17) lib/phoenix/endpoint/render_errors.ex:86: Phoenix.Endpoint.RenderErrors.instrument_render_and_send/5
  (phoenix 1.7.17) lib/phoenix/endpoint/render_errors.ex:63: Phoenix.Endpoint.Ren

hozza avatar Apr 21 '25 18:04 hozza

I’m at 950ish sources now and still having the same behavior. I’ve contemplated moving my container to a faster server just to see if that changes anything.

gteltschik avatar Apr 21 '25 19:04 gteltschik

I’m at 950ish sources now and still having the same behavior. I’ve contemplated moving my container to a faster server just to see if that changes anything.

@gteltschik I recently upgraded from an older server (which had the same issues) and, although it's not the beefiest server, it doesn't seem to make much of a difference. I've still got plenty of resources left and other more intensive applications running on SQLite without issue.

I did have most sources set to fast indexing (via RSS) and since slowing these down to daily/weekly/monthly it may have improved? I'm not entirely sure though.

I find that if I restart the container every few days, it seems to temporarily fix the issue (and it starts downloading pendings at a faster rate also).

hozza avatar Apr 21 '25 19:04 hozza

I’m at 950ish sources now and still having the same behavior. I’ve contemplated moving my container to a faster server just to see if that changes anything.

@gteltschik I recently upgraded from an older server (which had the same issues) and, although it's not the beefiest server, it doesn't seem to make much of a difference. I've still got plenty of resources left and other more intensive applications running on SQLite without issue.

I did have most sources set to fast indexing (via RSS) and since slowing these down to daily/weekly/monthly it may have improved? I'm not entirely sure though.

I find that if I restart the container every few days, it seems to temporarily fix the issue (and it starts downloading pendings at a faster rate also).

Most of my sources are set to monthly, except about 10 on daily. I did that after I noticed several set to daily or weekly didn’t seem to ever be running. Once I went through and switched them, everything started downloading as expected.

My system isn’t pegged, but the overall system response is slower than on an identical server; I can see it just by opening an SSH session. Earlier I had pinchflat running on a beefier server, but the problem then was I/O wait times due to an external drive array. I have a new server to build, and I’m thinking an experiment might be in order; it won’t have any of the previous limitations.

gteltschik avatar Apr 22 '25 01:04 gteltschik

An update that may or may not be helpful.

I moved my pinchflat instance to a new server based on an 8845HS processor which at least made the sources page load, albeit still slow. It was still timing out the sources page intermittently, and eventually even the main page would time out (the error log showed the database query timing out). On a hunch, I cleaned up the database:

    sudo docker stop pinchflat
    cd pinchflat/config/db
    sqlite3 pinchflat.db 'VACUUM;'
    sudo docker start pinchflat

My database lost 6GB (from 15GB to 9GB) and everything is much more responsive. It's still slow, but nowhere near as glacial. I can now consistently bring up the sources page, even with 1,100 sources (and counting).

Might be worth introducing a monthly cleanup routine to keep the database optimized. Not sure what was causing the database to get so large. And yes, I do still have a copy of the pre-vacuumed database (or a version of it pre-move).

gteltschik avatar Jun 03 '25 15:06 gteltschik

I encounter the same problem: 19 sources, 82 downloads, 241 GB. It's not that much, yet in the logs I can see it disconnecting from the SQLite db:

2025-06-20 08:22:47.001 request_id=GEqsCqq-_kq1X6EAAZqB [debug] | QUERY OK source="settings" db=0.1ms idle=447.0ms
SELECT s0."id", s0."onboarding", s0."pro_enabled", s0."yt_dlp_version", s0."apprise_version", s0."apprise_server", s0."youtube_api_key", s0."route_token", s0."extractor_sleep_interval_seconds", s0."download_throughput_limit", s0."restrict_filenames", s0."video_codec_preference", s0."audio_codec_preference" FROM "settings" AS s0 LIMIT 1 []
2025-06-20 08:22:47.001 request_id=GEqsCqq-_kq1X6EAAZqB [debug] | QUERY OK source="media_profiles" db=0.0ms idle=447.0ms
SELECT count(m0."id") FROM "media_profiles" AS m0 []
2025-06-20 08:22:47.002 request_id=GEqsCqq-_kq1X6EAAZqB [debug] | QUERY OK source="sources" db=0.1ms idle=442.8ms
SELECT count(s0."id") FROM "sources" AS s0 []
2025-06-20 08:22:59.553 [error] | Exqlite.Connection (#PID<0.2318.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.4432.0> timed out because it queued and checked out the connection for longer than 15000ms

#PID<0.4432.0> was at location:

    (exqlite 0.31.0) lib/exqlite/sqlite3.ex:227: Exqlite.Sqlite3.multi_step/3
    (exqlite 0.31.0) lib/exqlite/sqlite3.ex:265: Exqlite.Sqlite3.try_fetch_all/3
    (exqlite 0.31.0) lib/exqlite/sqlite3.ex:259: Exqlite.Sqlite3.fetch_all/3
    (exqlite 0.31.0) lib/exqlite/connection.ex:686: Exqlite.Connection.get_rows/2
    (exqlite 0.31.0) lib/exqlite/connection.ex:632: Exqlite.Connection.execute/4
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1558: DBConnection.run_execute/5

2025-06-20 08:23:02.002 [error] | Exqlite.Connection (#PID<0.2320.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.4459.0> timed out because it queued and checked out the connection for longer than 15000ms

#PID<0.4459.0> was at location:

    (exqlite 0.31.0) lib/exqlite/sqlite3.ex:227: Exqlite.Sqlite3.multi_step/3
    (exqlite 0.31.0) lib/exqlite/sqlite3.ex:265: Exqlite.Sqlite3.try_fetch_all/3
    (exqlite 0.31.0) lib/exqlite/sqlite3.ex:259: Exqlite.Sqlite3.fetch_all/3
    (exqlite 0.31.0) lib/exqlite/connection.ex:686: Exqlite.Connection.get_rows/2
    (exqlite 0.31.0) lib/exqlite/connection.ex:632: Exqlite.Connection.execute/4
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1558: DBConnection.run_execute/5

2025-06-20 08:23:20.206 [info] | {"source":"oban","duration":4084154,"event":"plugin:stop","plugin":"Oban.Plugins.Pruner","pruned_count":0}
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
2025-06-20 08:23:55.541 [info] | Checking permissions for /config
2025-06-20 08:23:55.556 [info] | Permissions OK
2025-06-20 08:23:55.556 [info] | Checking permissions for /downloads
2025-06-20 08:23:55.556 [info] | Permissions OK
2025-06-20 08:23:55.556 [info] | Checking permissions for /etc/yt-dlp
2025-06-20 08:23:55.556 [info] | Permissions OK
2025-06-20 08:23:55.556 [info] | Checking permissions for /etc/yt-dlp/plugins
2025-06-20 08:23:55.556 [info] | Permissions OK
2025-06-20 08:23:55.556 [info] | Checking permissions for /tmp/pinchflat/data
2025-06-20 08:23:55.556 [info] | Permissions OK
2025-06-20 08:23:55.556 [info] | Checking permissions for /config/extras
2025-06-20 08:23:55.557 [info] | Permissions OK
2025-06-20 08:23:55.557 [info] | Checking permissions for /config/metadata
2025-06-20 08:23:55.557 [info] | Permissions OK
2025-06-20 08:23:55.557 [info] | Checking permissions for /config/extras/elixir_tz_data
2025-06-20 08:23:55.557 [info] | Permissions OK
Setting umask to 022
2025-06-20 08:23:57.031 [debug] | Exqlite: loading extension `/app/lib/pinchflat-2025.6.6/priv/repo/extensions/sqlean-linux-x86/sqlean`
2025-06-20 08:23:57.041 [debug] | Exqlite: loading extension `/app/lib/pinchflat-2025.6.6/priv/repo/extensions/sqlean-linux-x86/sqlean`
2025-06-20 08:23:57.469 [info] | Migrations already up

Mozart409 avatar Jun 20 '25 06:06 Mozart409

An update that may or may not be helpful.

I moved my pinchflat instance to a new server based on an 8845HS processor which at least made the sources page load, albeit still slow. It was still timing out the sources page intermittently, and eventually even the main page would time out (the error log showed the database query timing out). On a hunch, I cleaned up the database:

    sudo docker stop pinchflat
    cd pinchflat/config/db
    sqlite3 pinchflat.db 'VACUUM;'
    sudo docker start pinchflat

My database lost 6GB (from 15GB to 9GB) and everything is much more responsive. It's still slow, but nowhere near as glacial. I can now consistently bring up the sources page, even with 1,100 sources (and counting).

Might be worth introducing a monthly cleanup routine to keep the database optimized. Not sure what was causing the database to get so large. And yes, I do still have a copy of the pre-vacuumed database (or a version of it pre-move).

Can I just say thank you - this sqlite3 VACUUM command was extremely beneficial for me. The pinchflat application had been running very slowly for a long time, and actions like clicking on the sources page could result in the app taking ~30 seconds to load it. Immediately after I ran this, everything is back to loading almost instantly, and general performance is much, much better.

aaronpkelly avatar Jul 06 '25 10:07 aaronpkelly