
Distributed mode

Open mabed-fr opened this issue 3 years ago • 9 comments

⚠️ Please verify that this feature request has NOT been suggested before.

  • [X] I checked and didn't find similar feature request

🏷️ Feature Request Type

New Monitor

🔖 Feature description

Is it possible to have several instances of uptime-kuma controlled by a central point? In distributed mode? Connected by wirguard ?

Regards,

✔️ Solution

Is it possible to have several instances of uptime-kuma controlled by a central point? In distributed mode? Connected by wirguard ?

❓ Alternatives

No response

📝 Additional Context

Congratulations for this project that I will support if one of my skills can help you.

mabed-fr avatar Feb 04 '22 23:02 mabed-fr

I like the idea of a "distributed mode" or HA mode (high availability mode), multi instance mode, multi hosts mode, fail safe mode, etc. (a few keywords so that this ticket can be found easily).

But what is "wirguard"? If you mean WireGuard: you don't need a VPN tunnel to achieve something like that. Additional instances could be added via a private token (similar to how nodes are added to a Kubernetes cluster).
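To make the token idea a bit more concrete, here is a minimal sketch of how a central instance might accept new instances that present a private token. This is not how Uptime Kuma works today; `Registry`, `new_join_token`, and `join` are hypothetical names made up for the illustration.

```python
import hmac
import secrets


def new_join_token() -> str:
    """Generate a random token that an operator copies to each new instance."""
    return secrets.token_urlsafe(32)


class Registry:
    """Hypothetical central registry of instances that have joined."""

    def __init__(self, join_token: str) -> None:
        self._join_token = join_token
        self.nodes = set()  # names of instances that presented a valid token

    def join(self, node_name: str, presented_token: str) -> bool:
        # Compare as bytes in constant time so the token is not leaked
        # through timing differences.
        expected = self._join_token.encode("utf-8")
        presented = presented_token.encode("utf-8")
        if not hmac.compare_digest(expected, presented):
            return False
        self.nodes.add(node_name)
        return True
```

A real design would also need transport encryption (plain HTTPS would do, no VPN required) and a way to revoke or rotate tokens.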

mamiu avatar Feb 08 '22 04:02 mamiu

A distributed install definitely makes sense for something that monitors uptime for other software. Would not want it to go down along with the other apps.

adyanth avatar Mar 05 '22 20:03 adyanth

I would love this as a feature, if I could have a small instance running on a site and relaying to a master instance somewhere.

For example, say you are an MSP and you have a few line-of-business applications you want to monitor inside the network of each customer, without exposing the endpoints directly or using VPNs. The local instance then relays or reports the stats to a central instance. Each client site may have an internal status page, but the MSP could have those status pages published centrally for all sites and customers.

jptechnical avatar May 26 '22 06:05 jptechnical

It's kind of strange that an application to monitor other applications wouldn't support running in high availability, but maybe that's not part of the scope of this project. Uptime Kuma would need to support an external DB for data and something like Redis for the session cache. Also, I'm not aware whether Uptime Kuma writes anything else to disk, but if so, that would need to be changed as well to run HA.

onedr0p avatar Aug 07 '22 23:08 onedr0p

The project is brand new compared to what is on the market; it takes time to develop.

The main idea on my side was to have satellites in several countries, but HA is also possible.

If you want this functionality do not hesitate to comment.

mabed-fr avatar Aug 12 '22 04:08 mabed-fr

Yes!

officiallymarky avatar Sep 06 '22 17:09 officiallymarky

We just started using uptime-kuma and it's awesome! Thank you so much for creating this and making it available!

Like many others in this thread, the thought naturally arose of "who will watch the watchers"? A distributed/high-availability configuration would be the Bee's Knees.

Until then we're thinking about having uptime-kuma monitored by BetterUptime or healthchecks.io, which given that it's a single service should fit in the free tier.

snth avatar Nov 10 '22 12:11 snth

This would be awesome! It would also be great if the nodes could agree that a certain instance is down before sending the notification.

MaxamServices avatar Dec 12 '22 17:12 MaxamServices

It would be great! And if possible, it would be better to add a config option that sends the notification only "if 2/3 of the deployments detect downtime".

Just yesterday I had a case where Kuma kept sending notifications non-stop (a timeout every minute), but when I accessed the application (which is hosted on AWS and has CloudWatch), it was completely fine. I guess there was some routing issue in between. Only 2 out of the 50 applications monitored by Kuma had this issue... but it kept me awake from 5 in the morning.
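The "2/3 of the deployments" rule could be sketched roughly like this. It is only an illustration under assumptions: `should_notify` and its parameters are hypothetical names, and nothing like this exists in Uptime Kuma as far as this thread shows.

```python
def should_notify(reports: dict, numerator: int = 2, denominator: int = 3) -> bool:
    """Decide whether enough instances agree that a target is down.

    `reports` maps an instance name to True if that instance currently
    sees the target as down. The default threshold is 2/3 of all
    reporting instances; integer arithmetic avoids float rounding.
    """
    if not reports:
        return False  # nobody reported, so there is nothing to alert on
    down = sum(1 for is_down in reports.values() if is_down)
    return down * denominator >= numerator * len(reports)
```

With three instances, a single one that times out because of a routing problem would not trigger a notification, while two would.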

cheuklam avatar Dec 18 '22 02:12 cheuklam

I agree, it would be great

wokawoka avatar Mar 23 '23 00:03 wokawoka

Just going to link #84 here as it looks similar

Computroniks avatar Mar 23 '23 07:03 Computroniks

It would be great. I have 1 server and 1 NAS. If I could install 2 Uptime Kuma instances in HA, it would be awesome!

simcmoi avatar Oct 21 '23 21:10 simcmoi

Distributing across availability zones may be a difficult task, but I think we can do it in a simple way. I am asking for distributed mode for 3 reasons:

  1. When the current Uptime Kuma (UK) node goes down, it reports all my monitored sites as down and then up again once the UK service is back online, which doesn't look good: it was not the sites that were down but the UK service.
  2. We want to use UK because we want to ensure every service is up and running, and we have emergency plans for outages. When the monitoring service itself is down, our alerts are gone. We can make the website HA / multi-AZ but not the monitoring service, which is a bit weird, I would say.
  3. Network issues can make a site unreachable in part of the world. Sometimes, due to a CDN service or network operator, the site may be available in Europe but not working in the US. Personally, I run some lowest-tier VMs in different cloud regions (using the free tier) to check for such cases.

We can fix the above issues by running multiple instances on different servers, but then the data is not unified. That's why I am thinking of the following suggestions, which should be very simple to implement and fix all the above issues:

  • Multi-instance data sync. Steps:
      a) Allow an instance name for each instance.
      b) Add one more column to the report table so that, besides the status change, it records which instance the record came from.
      c) In the notification channels, add one more option, "Uptime Kuma", so we can send status changes to other UK instances.
      d) When the service comes back up, check with the other "allied instances" and migrate any missing data. (Not very important, but good to have.)

  • Client-only deployment: a tiny Node.js / Python piece of code that asks the primary UK instance for the list of checks to run and returns the results. We could run this code on Lambda / function-based cloud services or Docker, so it can be deployed at very low or no cost to address issue 3 mentioned above.
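The client-only deployment could look roughly like the sketch below. It is only an illustration: `run_checks`, `CheckResult`, and the `probe` callable are hypothetical names, and fetching the list from the primary instance or sending results back is left out because this thread does not confirm any such API in Uptime Kuma. Each result carries the instance name, in the spirit of steps a) and b).

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CheckResult:
    instance: str      # name of the probe that ran the check
    target: str        # what was checked, e.g. a URL
    up: bool           # True if the probe considered the target reachable
    checked_at: float  # Unix timestamp of the check


def run_checks(
    instance: str,
    targets: Iterable[str],
    probe: Callable[[str], bool],
    now: Callable[[], float] = time.time,
) -> List[CheckResult]:
    """Run `probe` against every target and tag results with the instance name."""
    results = []
    for target in targets:
        try:
            up = bool(probe(target))
        except Exception:
            # Any error raised by the probe (timeout, DNS failure, ...) counts as down.
            up = False
        results.append(CheckResult(instance, target, up, now()))
    return results
```

The `probe` argument is where an actual HTTP or ping check would plug in; keeping it as a parameter keeps the agent small and easy to run in a function-based cloud service.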

P.S. HA mode sounds fancy, but it is hard to do HA across multiple AZs without a lot of virtual IPs and SD-WAN, which involve a lot of infrastructure work. I think the method I mentioned above can minimize the dependency on network infrastructure yet fix the issues I listed. An HA setup only keeps the service up and running; I don't think we need to make things too complicated, as a DB cluster and a heartbeat service together would already be more complicated than the whole project. I like UK for its simplicity while still achieving its purpose.

cheuklam avatar Oct 22 '23 03:10 cheuklam

It's really not difficult, all the commercial services do this. You have multiple agents that report back, and only when x number agents fail do you report a failure.

officiallymarky avatar Oct 22 '23 03:10 officiallymarky

It's really not difficult, all the commercial services do this. You have multiple agents that report back, and only when x number agents fail do you report a failure.

This is not difficult, but it is also not HA: once the main service is down, the clients have nowhere to report. But as I mentioned in solution point 2, it does solve some other issues.

cheuklam avatar Oct 22 '23 03:10 cheuklam

I would also really like this feature because I just had my node with Uptime on it go down the other day and while most of my things don't require HA, it would be good to have that in a monitoring solution.

I don't know much about high availability setups or Uptime's internal architecture, but can't you push the difficult distributed consensus problem into some other component? For example, whatever your underlying storage layer is (Redis, Postgres, SQLite, ...), there are usually already high availability solutions available, so can't you perhaps leverage that?

snth avatar Oct 23 '23 15:10 snth

I thought about this again and I think it might really not be that difficult, at least a basic High Availability mode that would be sufficient for my purposes.

Since uptime-kuma already comes with a docker-compose.yml file, my HA setup would be:

Since GlusterFS says it's fully POSIX compliant that should work fine. If a node goes down, Docker Swarm should redeploy uptime on another node and the data backend should be available there thanks to GlusterFS.

WDYT?


It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage, this will have to do.

snth avatar Nov 10 '23 14:11 snth

It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage

Actually, v2 does support MariaDB (external and internal) next to SQLite, and therefore also more complex setups like MariaDB Galera. See the progress here: https://github.com/louislam/uptime-kuma/milestone/24

For Postgres as a data backend see https://github.com/louislam/uptime-kuma/issues/959

CommanderStorm avatar Nov 10 '23 14:11 CommanderStorm

Thanks @CommanderStorm . That's great to hear.

Where can I read more about the SQLite setup? Is the connection string for that configurable? If so, I could probably just use Dqlite for the backend. That would be great because I would really like to avoid the GlusterFS route if possible.

snth avatar Nov 10 '23 14:11 snth

I don't know what you need. The SQLite database is stored at db/kuma.db. SQLite does not really have a connection string that I know of... you just point at the file and go.
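For illustration, this is roughly what "point at the file and go" looks like with Python's standard `sqlite3` module. It is a sketch only: Uptime Kuma itself is a Node.js application, `open_db` and `list_tables` are hypothetical helpers, and only the `db/kuma.db` path comes from the comment above.

```python
import sqlite3
from pathlib import Path


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) a SQLite database given nothing but a file path."""
    db_path = Path(path)
    # SQLite creates the file on demand, but not missing parent directories.
    db_path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(str(db_path))


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Something like `list_tables(open_db("db/kuma.db"))` would then list the tables; there is no host, port, or credentials to configure, only the path.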

We have never looked into whether Dqlite is a possibility or whether it is something we should support (currently, I would argue that MariaDB is enough, but I am not a maintainer). => It is currently not officially supported. => We won't consider changes to this part of the system to be breaking changes.

Here is our contribution guide https://github.com/louislam/uptime-kuma/blob/5b6522a54edad9737fccf195f9eaa25c6fb9d0f6/CONTRIBUTING.md

CommanderStorm avatar Nov 10 '23 15:11 CommanderStorm

I thought about this again and I think it might really not be that difficult, at least a basic High Availability mode that would be sufficient for my purposes.

Since uptime-kuma already comes with a docker-compose.yml file, my HA setup would be:

Since GlusterFS says it's fully POSIX compliant that should work fine. If a node goes down, Docker Swarm should redeploy uptime on another node and the data backend should be available there thanks to GlusterFS.

WDYT?

It would be nicer to have a storage backend like HA Postgres or CockroachDB but since uptime-kuma currently only seems to support file system storage, this will have to do.

Unless it is located geographically on a different Internet connection it really doesn’t improve the situation much.

officiallymarky avatar Nov 10 '23 22:11 officiallymarky

Hello,

Any news on that?

babytof avatar Mar 18 '24 10:03 babytof

There has not been any news in the last four months. We are still working out the kinks of v2.0.

CommanderStorm avatar Mar 18 '24 11:03 CommanderStorm

I would love to see this feature. It would be great if multiple nodes of Uptime Kuma could be linked, with an option on each check to select which nodes it should run on. The nodes could also be used as a fail condition, as in "report if all fail" or "report if N fail". Syncing the tasks would be better, because this way each node can keep running in standalone mode if another is down. That makes it a kind of distributed network of individual instances that can work standalone as well as cooperate, rather than, for example, workers that still depend on a master being online.

This way I would add Uptime Kuma on many of my geographically separated servers and simply make sure my checks work on all of them, without having to configure many different individual instances.
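A rough sketch of what syncing tasks between peers might look like, assuming each check definition carries a version number that is bumped on every edit. `MonitorDef`, `merge_monitors`, and `checks_for` are hypothetical names, not part of Uptime Kuma.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MonitorDef:
    monitor_id: str
    url: str
    nodes: frozenset  # names of the nodes that should run this check
    version: int      # assumed to increase on every edit


def merge_monitors(
    local: Dict[str, MonitorDef], remote: Dict[str, MonitorDef]
) -> Dict[str, MonitorDef]:
    """Union of both sets of checks, keeping the higher version of each."""
    merged = dict(local)
    for monitor_id, definition in remote.items():
        current = merged.get(monitor_id)
        if current is None or definition.version > current.version:
            merged[monitor_id] = definition
    return merged


def checks_for(node: str, monitors: Dict[str, MonitorDef]) -> List[str]:
    """IDs of the checks that a given node is selected to run, sorted."""
    return sorted(m.monitor_id for m in monitors.values() if node in m.nodes)
```

Because every node holds the full merged list, each one can keep running its own checks when its peers are unreachable. A plain version counter does not resolve two nodes editing the same check at the same time, so a real design would need a tie-breaking rule.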

JaneX8 avatar May 05 '24 19:05 JaneX8

@JaneX8 You can subscribe to https://github.com/louislam/uptime-kuma/issues/84 for updates. Currently, our priorities are on different items such as #4500 and refactoring the monitoring items for better maintainability.

CommanderStorm avatar May 05 '24 21:05 CommanderStorm