
Dashboard takes forever to load

Open babecassis opened this issue 2 years ago • 16 comments

⚠️ Please verify that this bug has NOT been raised before.

  • [X] I checked and didn't find similar issue

🛡️ Security Policy

Description

The dashboard sometimes loads with no monitored entities. This happens often; when it does, I see the trace below. Average load on my system is

```
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
    at runNextTicks (internal/process/task_queues.js:60:5)
    at listOnTimeout (internal/timers.js:526:9)
    at processTimers (internal/timers.js:500:7)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:570:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:556:22)
    at async Function.calcUptime (/app/server/model/monitor.js:590:22)
    at async Function.sendUptime (/app/server/model/monitor.js:650:24) {
  sql:
    SELECT
        -- SUM all duration, also trim off the beat out of time window
        SUM(
            CASE
                WHEN (JULIANDAY(time) - JULIANDAY(?)) * 86400 < duration
                THEN (JULIANDAY(time) - JULIANDAY(?)) * 86400
                ELSE duration
            END
        ) AS total_duration,

        -- SUM all uptime duration, also trim off the beat out of time window
        SUM(
            CASE
                WHEN (status = 1)
                THEN
                    CASE
                        WHEN (JULIANDAY(time) - JULIANDAY(?)) * 86400 < duration
                        THEN (JULIANDAY(time) - JULIANDAY(?)) * 86400
                        ELSE duration
                    END
            END
        ) AS uptime_duration
    FROM heartbeat
    WHERE time > ?
    AND monitor_id = ?,
  bindings: [ '2022-02-15 19:32:42', '2022-02-15 19:32:42', '2022-02-15 19:32:42', '2022-02-15 19:32:42', '2022-02-15 19:32:42', 20 ]
}
    at process.<anonymous> (/app/server/server.js:1553:13)
    at process.emit (events.js:400:28)
    at processPromiseRejections (internal/process/promises.js:245:33)
    at processTicksAndRejections (internal/process/task_queues.js:96:32)
    at runNextTicks (internal/process/task_queues.js:64:3)
    at listOnTimeout (internal/timers.js:526:9)
    at processTimers (internal/timers.js:500:7)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
```
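For context, this error is Knex giving up after waiting for a free connection from its pool; it happens when queries (here, the uptime calculation over the heartbeat table) hold connections longer than the acquisition timeout. A minimal sketch of where those knobs live in a generic Knex + SQLite setup — the file path and numbers are illustrative placeholders, not Uptime Kuma's actual configuration:

```js
// Generic Knex + SQLite configuration (a sketch, not Uptime Kuma's real config)
const knex = require("knex")({
    client: "sqlite3",
    connection: { filename: "./data/kuma.db" }, // hypothetical path
    useNullAsDefault: true,
    // How long Knex waits for a free pool connection before throwing
    // "KnexTimeoutError: Timeout acquiring a connection" (default: 60s):
    acquireConnectionTimeout: 60000,
    pool: { min: 1, max: 10 },
});
```

Raising the timeout only hides slow I/O; the queries still queue up behind the disk.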

👟 Reproduction steps

I do not know what triggers this

👀 Expected behavior

I see my monitored services

😓 Actual Behavior

Dashboard loads w/ no monitored services

🐻 Uptime-Kuma Version

1.12.1

💻 Operating System and Arch

Raspbian / Raspberry Pi 3

🌐 Browser

MS Edge

🐋 Docker Version

20.10.12, build e91ed57

🟩 NodeJS Version

No response

📝 Relevant log output

No response

babecassis avatar Mar 21 '22 14:03 babecassis

It is usually related to read/write. Since you are using a Pi, make sure the SD card is fast and do not use a network drive as the volume.

louislam avatar Mar 22 '22 02:03 louislam

> It is usually related to read/write. Since you are using a Pi, make sure the SD card is fast and do not use a network drive as the volume.

I can confirm this on DSM 6.2 Docker, Uptime Kuma version 1.15.1.

I had put the ./data folder on an HDD; when I got those KnexTimeoutErrors, I noticed high disk usage on that HDD (99% or 100% for quite a long period). So I moved the ./data folder to an SSD RAID 1 yesterday, and all those errors were gone. The usage of the SSD RAID is less than 5%, with write/read IOs at about 50/40.

FYI, I have set up 37 monitors in total: 2 HTTPS monitors, 1 incoming monitor, 1 DNS monitor, and ping monitors for the rest. Most of these monitors are triggered every minute.

halfu avatar May 18 '22 01:05 halfu

I'm seeing the same issue and error in my own installations, and some PikaPods.com users have reported it.

The workaround is to limit history days (to 14 or 30) and clear the history; a sketch of what that amounts to at the SQL level is below. As others have noticed, the issue starts at around 1 GB DB size.
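For illustration only, trimming history to 14 days amounts to deleting old rows from the heartbeat table seen in the trace above. The built-in retention setting does this for you, so treat this as a sketch of the idea rather than a recommended manual step:

```sql
-- Sketch: what limiting history to 14 days boils down to.
-- (Use the "keep history" setting in the UI instead of running this by hand.)
DELETE FROM heartbeat
WHERE time < DATETIME('now', '-14 days');
```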

It's not a network drive, but by default we only assign 0.25 CPU cores, which gives about 3 MB/s read/write speed from what I see. That may be too slow, and maybe we need more cores for more history.

[Screenshot: Screen Shot 2022-07-02 at 10.21.26]

m3nu avatar Jul 02 '22 06:07 m3nu

Seeing the same on the latest Docker image.

Aterfax avatar Jul 16 '22 00:07 Aterfax

I have the same issue on a Synology Diskstation.

kalpik avatar Nov 11 '22 09:11 kalpik

Had this happen to me; solved it by doing a backup/export of the config, deleting all the appdata, then re-importing.

Aterfax avatar Nov 12 '22 16:11 Aterfax

> Had this happen to me; solved it by doing a backup/export of the config, deleting all the appdata, then re-importing.

Yes, but it keeps happening again and again after a while.

kalpik avatar Nov 12 '22 16:11 kalpik

I'm guessing the solution would be to use a real database if one has "real" data, or to limit the data, like I'm doing right now: https://github.com/louislam/uptime-kuma/issues/1397#issuecomment-1172847138

m3nu avatar Nov 12 '22 17:11 m3nu

> I'm guessing the solution would be to use a real database if one has "real" data, or to limit the data, like I'm doing right now: #1397 (comment)

Yep, but a real database isn't supported, unfortunately. Even limiting data to 14 days is not enough.

kalpik avatar Nov 12 '22 17:11 kalpik

Running on a Docker host, Ubuntu 22.04.1 LTS, latest and greatest. Uptime Kuma has been getting slower lately: opening the page shows 0 monitors, and only after 20-30 seconds does it start populating.

Server is almost idle. Uptime Kuma is backed by NVMe SSD on ZFS.

Database is indeed quite big:

[Screenshot: database file size]

hyperbart avatar Nov 16 '22 14:11 hyperbart

I hit the Shrink Database button and nothing seemed to happen, then hit the Clear all Statistics button. Meanwhile, NetData monitoring showed that Uptime Kuma was very busy; I don't know whether the load came from the shrink or from clearing the statistics.

[Screenshot: NetData monitoring]

The interface feels snappy now; I lowered the days to keep monitoring history to 14.
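If I understand the Shrink Database button right, it boils down to SQLite's VACUUM, which rewrites the database file to reclaim pages freed by deleted rows; that would explain the burst of activity. This is an assumption on my part, sketched below:

```sql
-- Assumed behaviour of "Shrink Database": rebuild the file to reclaim free pages.
VACUUM;

-- The resulting file size is roughly page_count * page_size:
PRAGMA page_count;
PRAGMA page_size;
```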

hyperbart avatar Nov 16 '22 14:11 hyperbart

I have the same problem. I have 600+ monitors on Uptime Kuma.

Immediately after starting Uptime Kuma, the dashboard appears right away, but as time passes the display slows down or stops appearing at all. When that happens, the CPU is near 100%.

Checking the Network tab in Chrome Developer Tools, it appears that while the dashboard is open there is a huge amount of websocket traffic flowing to the browser.

It also appears that all data is read when the dashboard is displayed (data from all monitors and events, including monitors and events that are not shown on screen). I assume this results in increased disk I/O and excessive load on the DB.

I think this problem could be solved by adding lazy loading and pagination; a sketch of the idea follows. Please consider this.
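For illustration, pagination at the query level would mean fetching only one page of beats per monitor instead of everything. A sketch against the heartbeat columns visible in the trace above (not the project's actual query):

```sql
-- Sketch: send the browser only the newest page of beats for one monitor.
SELECT time, status, duration
FROM heartbeat
WHERE monitor_id = ?
ORDER BY time DESC
LIMIT 100 OFFSET 0;
```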

tabimoba avatar Nov 25 '22 16:11 tabimoba

> I have the same problem. I have 600+ monitors on Uptime Kuma.
>
> Immediately after starting Uptime Kuma, the dashboard appears right away, but as time passes the display slows down or stops appearing at all. When that happens, the CPU is near 100%.
>
> Checking the Network tab in Chrome Developer Tools, it appears that while the dashboard is open there is a huge amount of websocket traffic flowing to the browser.
>
> It also appears that all data is read when the dashboard is displayed (data from all monitors and events, including monitors and events that are not shown on screen). I assume this results in increased disk I/O and excessive load on the DB.
>
> I think this problem could be solved by adding lazy loading and pagination. Please consider this.

The number of monitors is not the issue; the data layer (SQLite) just locks up after a while. I only have about 15 monitors, and that locks up SQLite as well. The only solution IMO is running a proper DB, which as I understand it is difficult to do because of the way Uptime Kuma is built. So at this point, I'm mostly looking for alternatives :(

kalpik avatar Nov 25 '22 17:11 kalpik

> The only solution IMO is running a proper DB, which as I understand it is difficult to do because of the way Uptime Kuma is built

Why is it difficult to use another database, @louislam? I just looked through the code, and in most places it already uses an ORM (Redbean) that also supports MySQL. Here's an example of how it's used.

The only place that's more SQLite-specific is the migrations here. I'm assuming Redbean can also change DB tables, so that's solvable.
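For reference, redbean-node's setup call already takes a database type, so pointing the ORM at MySQL looks roughly like this. This is a sketch based on the redbean-node README; the connection details are placeholders:

```js
const { R } = require("redbean-node");

// Embedded SQLite setup (roughly what Uptime Kuma uses today):
// R.setup("sqlite", { filename: "./data/kuma.db" });

// The same ORM against MySQL (placeholder credentials):
R.setup("mysql", {
    host: "127.0.0.1",
    user: "kuma",
    password: "secret",
    database: "kuma",
});
```

The migrations would still need porting, since they use SQLite-specific SQL.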

Since we already host a few Uptime-Kuma instances on PikaPods and Louis hasn't claimed any revenue share, I'd like to offer a bounty of US$1500 to implement MySQL support for this project under the same open license. This is much more than we can afford in relation to what our users pay, but I see it as an investment, since the rest of Uptime Kuma is great.

m3nu avatar Dec 04 '22 12:12 m3nu

> I'd like to offer a bounty of US$1500 to implement MySQL support for this project under the same open license.

Personally, I would lean toward PostgreSQL support, but I think the most ideal and beneficial option is support for multiple database systems: PostgreSQL, MySQL, SQLite, etc.

JacksonChen666 avatar Dec 25 '22 11:12 JacksonChen666

I'd vote postgres just because it is already running on my server :)

christopherpickering avatar Dec 26 '22 16:12 christopherpickering

Please retest with the latest Uptime Kuma release.

Saibamen avatar Aug 17 '23 18:08 Saibamen

> Please retest with the latest Uptime Kuma release.

For me the issue still exists in the latest version: running 1.23.1 on Kubernetes with Longhorn backed by SSDs.

aessing avatar Sep 08 '23 19:09 aessing

@aessing your issue is possibly unrelated to the issue you are posting in. Longhorn uses iSCSI or NFS under the hood, as I understand it ⇒ uptime-kuma contains a database ⇒ you are running a database on a network share ⇒ possibly the added latency of reads/writes is killing the database performance, and not https://github.com/louislam/uptime-kuma/pull/3515

Note that running on an NFS-style filesystem has soundness bugs with SQLite databases due to faulty file locking, which may lead to corrupted databases. Please run uptime-kuma on a local volume instead. See https://github.com/louislam/uptime-kuma/wiki/%F0%9F%94%A7-How-to-Install#-docker and https://www.sqlite.org/howtocorrupt.html#_filesystems_with_broken_or_missing_lock_implementations
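For reference, the linked wiki page keeps the data on a local named Docker volume; its install command looks like this (copied from that page, so check it for the current tag):

```bash
# Install command from the linked wiki page: a named volume on local storage.
docker run -d --restart=always -p 3001:3001 \
  -v uptime-kuma:/app/data \
  --name uptime-kuma louislam/uptime-kuma:1
```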

CommanderStorm avatar Sep 08 '23 20:09 CommanderStorm

Thanks @CommanderStorm for your quick response.

Indeed, Longhorn uses iSCSI and NFS. In my case iSCSI, as the volume is ReadWriteOnce; if you go for ReadWriteMany, Longhorn will use NFS.

Local storage unfortunately isn't possible in a multi-node cluster; otherwise HA/DR will not work.

The SQLite DB is getting slower and slower as more data gets into it. It could be a latency issue, but other DBs like MySQL and InfluxDB work fine and fast on Longhorn.

aessing avatar Sep 08 '23 21:09 aessing

HA will not work with uptime-kuma. Please don't run multiple instances of the same Docker container, as this may corrupt the database.

V2 includes the option to connect to external databases (or to continue with the embedded MariaDB/SQLite).

In the meantime, choose a lower retention to hide this issue.

CommanderStorm avatar Sep 08 '23 21:09 CommanderStorm

@CommanderStorm HA with K8s does not mean running two containers; it means that when a node fails, another node will restart the container.

It seems we have to wait and see what v2 looks like. Is there an ETA or a preview?

aessing avatar Sep 08 '23 22:09 aessing

> Is there an ETA

No, but you can watch the progress here: https://github.com/louislam/uptime-kuma/milestone/24

> or preview

We are not currently in beta ⇒ no public preview is available.

Some test tags can be found at https://hub.docker.com/r/louislam/uptime-kuma/tags, but be aware that they are by definition undefined behaviour and not kept up to date automatically ⇒ they can do anything, including making daemons fly out of your nose (read: don't create issues for things you find in them, as they are meant for internal testing ^^)

CommanderStorm avatar Sep 09 '23 00:09 CommanderStorm