uptime-kuma

Alert for high server `network`/`ram`/`disk`/`cpu` usage "heartbeat" monitoring

Open OryonMax opened this issue 2 years ago • 45 comments

🏷️ Feature Request Type

New Monitor

🔖 Feature description

Please add Heartbeat Monitoring just like in HetrixTools.

✔️ Solution

Add a new monitor type that shows a server's network, RAM, disk and CPU usage and sends an alert when usage gets close to 90%, so people know it's time to upgrade or add a new node.

❓ Alternatives

HetrixTools

📝 Additional Context

No response

👀 Have you spent some time to check if this feature request has been raised before?

  • [X] I checked and didn't find similar feature request

OryonMax, Oct 28 '21 07:10

I feel this is out of scope for UK (for now at least). For remote servers, if the usage stats are that important, something more tuned to metrics tracking (Prometheus/Grafana etc.) should likely be employed.

(You could look into making a push monitor and writing a script yourself like this if you really need this feature for something).
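A minimal sketch of the push-monitor-plus-script approach, in bash like the scripts later in this thread. It assumes a push monitor already exists in UK and that its push URL is exported as `UK_PUSH_URL` (a hypothetical variable name); the `status`, `msg` and `ping` query parameters are the ones used in the push examples further down. It reports "down" when a value reaches the threshold, so UK's normal notifications fire.

```shell
#!/usr/bin/env bash
# Sketch: report one metric to an Uptime Kuma push monitor as up/down.
# UK_PUSH_URL is a hypothetical variable holding the monitor's push URL,
# e.g. https://<UK host>:3001/api/push/<token>. If it is unset, the script
# only prints the query it would send (dry run).

status_for() {  # args: value threshold (integers) -> prints up or down
  if (( $1 >= $2 )); then echo down; else echo up; fi
}

push_metric() {  # args: name value threshold
  local name="$1" value="$2" threshold="$3" status query
  status=$(status_for "$value" "$threshold")
  # Spaces and % are hand-encoded (%20, %25); keep the name a single word.
  query="status=${status}&msg=${name}%20at%20${value}%25&ping=${value}"
  if [[ -z "${UK_PUSH_URL:-}" ]]; then
    echo "DRY RUN: ?${query}"
  else
    curl -fsS --max-time 10 "${UK_PUSH_URL}?${query}" >/dev/null
  fi
}

# Run only when executed with exactly three arguments, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 3 ]]; then
  push_metric "$1" "$2" "$3"
fi
```

For example, `./push_metric.sh disk "$(df -P / | awk 'NR == 2 { print $5 + 0 }')" 90` from cron once a minute. The monitor's heartbeat interval should be longer than the cron interval, or UK will mark it down for a missing heartbeat.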

deefdragon, Oct 28 '21 09:10

Everybody needs that nowadays, and most status pages are paid. I hope to see this feature in Uptime Kuma.

OryonMax, Oct 28 '21 11:10

As @deefdragon said, this might be out of scope for what UK does. However, you could use a push monitor with command-line utilities like "mpstat" and "free", and if the UK developers allow users to change the units and description of what's monitored, as I asked for in #749, then UK could give you individual graphs of CPU and memory utilization. The only catch is that you won't have one page with all the metrics together.
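A sketch of the mpstat/free idea, assuming sysstat's `mpstat` (whose last column is `%idle`) and a procps-ng `free` recent enough to print an `available` column. Each value would go to its own push monitor; the `UK_CPU_PUSH_URL` and `UK_MEM_PUSH_URL` variables are hypothetical, and sending the percentage in the `ping` field is a workaround, so the chart will still label it as milliseconds until something like #749 exists.

```shell
#!/usr/bin/env bash
# Sketch: turn `mpstat` and `free` output into whole-number percentages that
# can be pushed to Uptime Kuma, one push monitor per metric.
# Assumes sysstat's mpstat (last column = %idle) and procps-ng free with an
# "available" column.

cpu_used_pct_from_mpstat() {  # stdin: output of `mpstat 1 1`
  awk '/^Average:/ && $2 == "all" { printf "%d\n", 100 - $NF }'
}

mem_used_pct_from_free() {  # stdin: output of `free -m`
  # used = total - available, so reclaimable page cache is not counted.
  awk '/^Mem:/ { printf "%d\n", ($2 - $7) * 100 / $2 }'
}

push_value() {  # args: push_url value
  # The value goes in the ping field, as in the push examples in this thread;
  # UK will still label it as milliseconds.
  if [[ -z "$1" ]]; then
    echo "DRY RUN: ping=$2"
  else
    curl -fsS --max-time 10 "$1?status=up&msg=OK&ping=$2" >/dev/null
  fi
}

main() {
  local cpu mem
  if ! command -v mpstat >/dev/null || ! command -v free >/dev/null; then
    echo "needs mpstat (sysstat) and free (procps)" >&2
    return 1
  fi
  # LC_ALL=C keeps the "Average:" label and decimal points untranslated.
  cpu=$(LC_ALL=C mpstat 1 1 | cpu_used_pct_from_mpstat)
  mem=$(LC_ALL=C free -m | mem_used_pct_from_free)
  # UK_CPU_PUSH_URL and UK_MEM_PUSH_URL are hypothetical variables.
  push_value "${UK_CPU_PUSH_URL:-}" "$cpu"
  push_value "${UK_MEM_PUSH_URL:-}" "$mem"
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main
fi
```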

markdesilva, Oct 28 '21 15:10

Not allowed in UK?

OryonMax, Oct 28 '21 16:10

I didn't say it's not allowed, but as mentioned, it's not in the scope of what UK does. So even if the developer decides to put it in, it probably will not be a priority, unless you want to code that portion yourself and then make a pull request for your work to be included in a future release.

markdesilva, Oct 28 '21 16:10

@OrefaSol, to provide some insight into why this is not really in scope:

A typical "status" service does nothing other than ping a server: basically "Hey, you there?", and it responds or it doesn't. Yes or no. This ping does not provide data on CPU, RAM, storage or anything else; all you get is "yes, I'm alive, and it took this long".

Because of this, Uptime Kuma is a permissionless service, meaning I don't need to approve Uptime Kuma talking to my website, or to any website really.

If Louis were to try and implement something like this, it would require a separate program/script that runs on the server you want to test and sends extra info over to Uptime Kuma. So it requires direct access to every service you want to get this data on.

HetrixTools offers various services, one being a status service, and another being a server monitoring service.

Uptime Kuma (In its current form at least) is strictly a status service.

ImmaZoni, Oct 28 '21 18:10

Nope, HetrixTools has Heartbeat Monitoring under its Uptime Monitoring product.

OryonMax, Oct 31 '21 12:10

Sounds like maybe you should be using HetrixTools then as you’re obviously a fan.

markdesilva, Oct 31 '21 12:10

What you are asking here is out of scope (perhaps for now), full stop.

Unless you are willing to code it yourself, wait until the UK developer does it.

Surely you understand that everyone who develops and contributes to a free and open-source product dedicates their free time to do so. The feature that you are asking for is from a paid product, and there is a reason why it's paid: the money goes to a developer for their hard work.

rihards-simanovics, Oct 31 '21 15:10

Everyone relax🐻.

Just follow one rule. If you love the suggestion, give a 👍.

Ignore it if you don't like it.

louislam, Oct 31 '21 15:10

I use https://www.netdata.cloud/

zimbres, Dec 31 '21 04:12

@zimbres, this is amazing and open source. Hmm, I think I might have this running for my client reports... I will keep using UK for internal services, as those don't require generated reports. Thanks for sharing the tool name!

PS: Perhaps UK could use some of its source code or take inspiration from that tool, as it looks quite nice.

rihards-simanovics, Dec 31 '21 05:12

This should work the way HetrixTools does, so that you can also get stats about RAM, CPU, disk, network, etc. displayed in a graph on the status page.

ririko5834, Apr 22 '22 12:04

Here we go again with Hetrix tools.

I wonder if all these folks suggesting UK work like Hetrixtools just want the HT functions cos they want the unlimited HT functionality without paying for it.

Sounds like it doesn’t it? 🤷🏻‍♂️

markdesilva, Apr 22 '22 12:04

@ririko5834 the basic answer to your request is "maybe in the future".

UptimeKuma is a relatively simple uptime monitoring application running on Node.js. I'm not saying that Node.js is a bad language; I am merely pointing out that a different language would be more favourable given the performance requirements of what you are asking for.

As @markdesilva pointed out, and I agree with them, if you favour HetrixTools, you should support the developer by getting a paid plan. UptimeKuma may be an open-source project for now, but I'm sure that when the time comes, the author will also want to offer their own paid plans alongside the open-source version, for people who don't want the hassle of setting one up themselves.

That being said, keep in mind that, as pointed out by @ImmaZoni, to know the server's hardware status the author would need to develop, and ask users to install, a separate "companion" app on the server, which would push the CPU, RAM, etc. information to UptimeKuma. If the last stable release (v1.14.0) is anything to go by, the author wants this application to just run without any additional hoops to jump through (see the Cloudflare proxy functionality).

EDIT: Almost forgot, @zimbres also noted that there is another open-source tool called netdata that you can use to monitor server hardware status.

rihards-simanovics, Apr 22 '22 13:04

Hey, you don't have to install anything on the server; Uptime Kuma would just need the option to connect via SSH to every server, get this info and then parse it onto the status page.

I have made a bash script that sends details like this to my email once disk usage passes 75%, and the same goes for RAM and CPU.

Now I just need to find the right way to send the data to Uptime Kuma so it can forward it to me via Telegram.
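The script itself isn't posted here. A minimal sketch of just the threshold part, assuming the percentages are collected elsewhere and passed in as `name=value` arguments; `THRESHOLD` and `ALERT_EMAIL` are hypothetical settings, and mailing via `mail -s` assumes a configured mailx/mailutils.

```shell
#!/usr/bin/env bash
# Sketch: given name=value usage percentages, report every one at or above a
# threshold (75% by default, matching the comment above).
# ALERT_EMAIL is a hypothetical variable; if it is set and `mail` exists, the
# report is mailed, otherwise it is printed.

check_usage() {  # args: threshold name=value...
  local threshold="$1" pair name value
  shift
  for pair in "$@"; do
    name="${pair%%=*}"
    value="${pair#*=}"
    if (( value >= threshold )); then
      echo "${name} usage is ${value}% (threshold ${threshold}%)"
    fi
  done
}

main() {
  local report
  report=$(check_usage "${THRESHOLD:-75}" "$@")
  [[ -z "$report" ]] && return 0   # nothing at or over the threshold
  if [[ -n "${ALERT_EMAIL:-}" ]] && command -v mail >/dev/null; then
    printf '%s\n' "$report" | mail -s "Usage alert on $(hostname)" "$ALERT_EMAIL"
  else
    printf '%s\n' "$report"
  fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
```

For example: `./check_usage.sh "disk=$(df -P / | awk 'NR == 2 { print $5 + 0 }')" ram=60 cpu=91`.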

InSelfControll, Dec 19 '22 18:12

@InSelfControll

Do you even realise how dangerous this is? Openly allowing an application (of all things) to access a server via SSH? It's almost as if security holes don't exist. Instead of hacking six of my servers, a hacker would now only need to hack one of them to get SSH access to all the others.

The best and most secure way is to have a dedicated client application that receives a request (be it via a web URL or otherwise), processes it and sends a JSON response to an API on the UK side, or alternatively just sends the JSON data at an interval, so there is only one-way communication from the server to UK.
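A rough sketch of the one-way option, using UK's existing push URL rather than a JSON API (the push endpoint takes query parameters, as the examples later in this thread show): the server only makes outbound requests on a timer and never accepts connections. `UK_PUSH_URL` is a hypothetical variable; without it, the loop just prints.

```shell
#!/usr/bin/env bash
# Sketch of the one-way design: the monitored server only makes outbound
# HTTPS requests to Uptime Kuma at a fixed interval and accepts no inbound
# connections. UK_PUSH_URL (the push monitor URL) is a hypothetical variable;
# if it is unset, the loop only prints what it would send.

collect() {
  # Root filesystem usage in percent; swap in whatever metric you need.
  df -P / | awk 'NR == 2 { print $5 + 0 }'
}

push_loop() {  # args: iterations (0 = run forever) interval_seconds
  local remaining="$1" interval="$2" value query
  while :; do
    value=$(collect)
    query="status=up&msg=disk%20${value}%25&ping=${value}"
    if [[ -z "${UK_PUSH_URL:-}" ]]; then
      echo "DRY RUN: ${query}"
    else
      # --max-time stops an unreachable UK instance from stalling the loop.
      curl -fsS --max-time 10 "${UK_PUSH_URL}?${query}" >/dev/null ||
        echo "push failed" >&2
    fi
    if (( remaining > 0 )); then
      (( --remaining == 0 )) && break
    fi
    sleep "$interval"
  done
}

# Only start when executed with an iteration count, e.g. `./push_loop.sh 0 60`.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
  push_loop "$1" "${2:-60}"
fi
```

A cron job that pushes once a minute gives the same one-way pattern with less code; the loop form mainly suits a long-running service.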

What you've proposed breaks security best practices on so many levels.

UK - UptimeKuma

rihards-simanovics, Dec 20 '22 05:12

@rihards-simanovics, this user doesn't need any permissions except for the df -h and free -m commands. You can always restrict a user to only one or two commands, or give the user limited SSH access that only allows these commands; you don't have to give full SSH login access, so there are no security issues. Today I have it sent automatically to my email / Telegram from each server.
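For reference, the "limited SSH access" idea is usually done with OpenSSH roughly like this: a forced command in `authorized_keys` plus a wrapper that only runs an allowlist. The script path and key in the comment are placeholders. It limits what the key can run; it does not remove the exposure rihards-simanovics objects to in the reply.

```shell
#!/usr/bin/env bash
# Sketch of an allowlist wrapper for OpenSSH's forced-command feature.
# A line like this in the monitoring user's ~/.ssh/authorized_keys makes sshd
# run this script for every login with that key (path and key are placeholders):
#   restrict,command="/usr/local/bin/stats-only.sh" ssh-ed25519 AAAA... monitor
# sshd puts the command the client asked for in SSH_ORIGINAL_COMMAND.

run_allowed() {  # arg: the requested command line
  # Exact string match, so "df -h; rm -rf ~" does not match "df -h".
  case "$1" in
    "df -h")   df -h ;;
    "free -m") free -m ;;
    *)
      echo "command not allowed: $1" >&2
      return 1
      ;;
  esac
}

# With no requested command (an interactive login), nothing runs and the session ends.
if [[ "${BASH_SOURCE[0]}" == "$0" && -n "${SSH_ORIGINAL_COMMAND:-}" ]]; then
  run_allowed "$SSH_ORIGINAL_COMMAND"
fi
```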

InSelfControll, Dec 20 '22 05:12

I did think of restricting the user's commands; that being said, it is still a very janky solution (hence why I didn't mention it). Besides, it's already been mentioned in this discussion that much better paid applications are available. If you have enough servers to warrant an advanced system like that, perhaps it's time to get the wallet out?

Again, it's almost as if security holes don't exist. You are playing an extremely dangerous game by even allowing a potential hacker to log in. Look, I'm no security expert, but I can guarantee you that gaining access as a "limited user" is the first step to a full-blown hack, so let's not.

rihards-simanovics, Dec 20 '22 05:12

The other option is to fix the passive push monitor and let users send custom messages with it.

Right now the only message it sends is "ok", which is kind of useless.

I want to send the script output via curl into the passive push monitor instead of just receiving the "ok" message.

Currently each of my VMs runs the script continuously, and if disk usage is more than 85% I receive an email with the status and details about the disk usage.

InSelfControll, Dec 21 '22 18:12

@InSelfControll

Maybe I don't quite understand your description, but the msg parameter of the push URL can carry any message. It won't accept a raw string with spaces like "Service is up", but it will take URL-encoded spaces (%20), as in Service%20is%20up. E.g.:

```bash
attl=$(/usr/bin/ping -c 1 <UK server IP> | tail -1 | /usr/bin/cut -d"/" -f 5)

/usr/bin/curl -k "https://<UK server IP>:3001/api/push/XXXXXXXXX?msg=Service%20is%20up,%20ping%20time%20is%20$attl&ping=$attl"
```

As you can see, you can even pass variables (in this case the ping time to the UK server).
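Hand-encoding spaces as `%20` gets awkward once the message is arbitrary script output (colons, slashes, `%` signs, newlines). A small helper, sketched here in plain bash plus `od`, can percent-encode any string for the `msg` parameter; the function name `urlencode` is just illustrative.

```shell
#!/usr/bin/env bash
# Sketch: percent-encode arbitrary text for the msg= parameter of a push URL.
# RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) are kept; every other
# byte, including UTF-8 bytes and newlines, becomes %XX.

urlencode() {  # arg: text to encode
  local out="" b c
  # od prints each byte of the input as two lowercase hex digits.
  for b in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$b" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        printf -v c "\\x$b"   # unreserved: turn the hex back into the character
        out+="$c" ;;
      *)
        out+="%${b^^}" ;;     # everything else: %XX with uppercase hex
    esac
  done
  printf '%s\n' "$out"
}

if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
  urlencode "$1"
fi
```

For example, `msg=$(urlencode "$(df -h / | tail -1)")`, then append `&msg=${msg}` to the push URL.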

Then your UK will show this:

[Screenshot: uk-push-service-msg]

If you're using a Linux machine, normal users (non-root, no sudo) have access to cat /proc/cpuinfo and cat /proc/meminfo, as well as df, and can use cut, sed, awk or grep to extract whatever info they need and pass it into the status message. There's no need to give UK SSH access or sudo, so there are no security concerns. For Windows, I think there are equivalent PowerShell commands normal users can use to get the values for CPU usage, memory usage and disk usage (Get-Volume).
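A sketch of the no-root approach for memory and disk on Linux, using only `/proc/meminfo`, `df` and `awk`. It assumes a kernel new enough to report `MemAvailable` (3.14 or later) and counts memory as used when it is not available, which excludes reclaimable cache.

```shell
#!/usr/bin/env bash
# Sketch: memory and disk usage as an ordinary user (no root, no sudo, no SSH),
# using only /proc/meminfo, df and awk. Assumes Linux 3.14+ for MemAvailable.

mem_used_pct() {  # optional arg: meminfo file, defaults to /proc/meminfo
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { if (t > 0) printf "%d\n", (t - a) * 100 / t }' "${1:-/proc/meminfo}"
}

disk_used_pct() {  # arg: mount point, e.g. /
  # -P forces one line per filesystem, so usage is always field 5 of line 2.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  echo "memory $(mem_used_pct)% used, / $(disk_used_pct /)% used"
fi
```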

Hope it works for you.

Cheers!

markdesilva, Dec 22 '22 09:12

@markdesilva this seems like a better solution, but would this generate a notification? @InSelfControll needs this to see what the status of the hardware is on their telegram.

rihards-simanovics, Dec 22 '22 10:12

With push notifications I get it directly in my Teams / Telegram, as it should be. I'll keep testing it and update.

The issue now is that the monitor gets the heartbeat but never sends the message more than once.

Example (for the test, the script sends a critical alert if disk usage is higher than 1%):

```bash
#!/bin/bash

# Get disk usage (percentage without the % sign) for the "fedora" volume
disk_usage=$(/usr/bin/df -h | /usr/bin/grep "fedora" | /usr/bin/awk 'END {print $5}' | /usr/bin/tr -d "%")
disk_usage1=$(/usr/bin/df -h | /usr/bin/grep "fedora" | /usr/bin/awk 'END {print $5}')

# Test threshold: alert when disk usage is higher than 1% (85% in normal use)
if [ "$disk_usage" -gt 1 ]; then
  # Send push notification
  /usr/bin/curl -k "http://1.1.1.1:3001/api/push/********?msg=Disk%20Usage%20is%20high:${disk_usage}%25"
fi
```

Look at the script and the picture.

InSelfControll, Dec 22 '22 10:12

@InSelfControll

Hi so sorry for the late response, I was out. Let me take a look at the script and what you want to do and get back to you.

markdesilva, Dec 22 '22 16:12

@InSelfControll,

Hi, so from what I understand, UK only reports whether the status is up or down. Once a status is reported, UK will not report again unless the status changes (from up to down or down to up). UK works that way for all reports. The idea behind this is that UK won't spam you multiple times via email, Telegram, etc. while you're away from your system and can't rectify the error.

The only way to keep reporting the critical error is to keep flipping the status between up and down.

When you first report it down, store that somewhere. When the script next runs, check the previous status, and if it's "down", change your URL status to "up" and replace the stored status. The next time, the stored status will be "up", so the script will change the status to "down", and so on. For your code:

```bash
#!/bin/bash

# Get disk usage (percentage without the % sign) for the "extmedia2" volume
disk_usage=$(/usr/bin/df -h | /usr/bin/grep "extmedia2" | /usr/bin/awk 'END {print $5}' | /usr/bin/tr -d "%")

# Flip the stored status on every run so UK sees a state change each time
if [ -f /tmp/du.status ]; then
   if [ "$(cat /tmp/du.status)" = "up" ]; then
        echo "down" > /tmp/du.status
   else
        echo "up" > /tmp/du.status
   fi
else
   echo "up" > /tmp/du.status
fi

udstatus=$(cat /tmp/du.status)

# Test threshold: alert when disk usage is higher than 1% (85% in normal use)
if [ "$disk_usage" -gt 1 ]; then
  # Send push notification with the flipped status
  /usr/bin/curl -k "https://1.1.1.1:3001/api/push/**********?status=$udstatus&msg=Disk%20Usage%20is%20high:${disk_usage}%25"
fi
```

Your UK will look like this:

[Screenshot: uk-push_flipstatus]

Take note, this will keep spamming you until you pause the monitor or disable the cron for the script.
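One way to limit the spamming, sketched as a variant of the script above: flip only while usage is still above the threshold, and push a plain "up" once it recovers. Pushing "up" on every healthy run should also keep the monitor from going down with "No heartbeat in the time window" while nothing is wrong. `STATE_FILE`, `THRESHOLD` and `UK_PUSH_URL` are hypothetical settings.

```shell
#!/usr/bin/env bash
# Sketch: a variant of the flip-status script that flips only while disk usage
# stays above the threshold and settles on "up" once it recovers.
# STATE_FILE, THRESHOLD and UK_PUSH_URL are hypothetical settings.

STATE_FILE="${STATE_FILE:-/tmp/du.status}"
THRESHOLD="${THRESHOLD:-85}"

next_status() {  # args: previous_status usage threshold
  if (( $2 <= $3 )); then
    echo up      # healthy: report up, no further alerts
  elif [[ "$1" == down ]]; then
    echo up      # still high: flip so the next "down" is a new state change
  else
    echo down    # high and previously up (or first run): alert
  fi
}

main() {
  local usage prev status query
  usage=$(df -P / | awk 'NR == 2 { print $5 + 0 }')
  prev=$(cat "$STATE_FILE" 2>/dev/null || echo up)
  status=$(next_status "$prev" "$usage" "$THRESHOLD")
  echo "$status" > "$STATE_FILE"
  query="status=${status}&msg=Disk%20usage%20is%20${usage}%25"
  if [[ -z "${UK_PUSH_URL:-}" ]]; then
    echo "DRY RUN: ?${query}"
  else
    curl -fsS --max-time 10 "${UK_PUSH_URL}?${query}" >/dev/null
  fi
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main
fi
```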

Honestly I think the default of only sending the message once is the right way to go. Hope this helps.

@louislam if you have time and you think it is useful, maybe there can be an option in the monitor to allow for "down" status to keep repeating until admin intervenes.

[EDIT: @louislam please ignore my comment, I found the option in the monitor, thanks!]

Cheers!

markdesilva, Dec 22 '22 17:12

Hi, thanks for your reply. I think UK should have an option to repeat the alert every X checks if the issue isn't fixed.

Let's say it repeats 4 times, once a minute: after 3 times the report changes to critical, and the fourth time it is sent via email by the script. UK would keep reporting every time until the issue is fixed, but only for issues.
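One reading of that escalation, as a cron-driven sketch: a counter of consecutive failed checks kept in a file, with the 3rd failure treated as critical and the 4th and later handed to e-mail. `COUNT_FILE`, `THRESHOLD` and the action names are hypothetical, and actually sending the alert is left out.

```shell
#!/usr/bin/env bash
# Sketch of the escalation described above: count consecutive failed checks
# (run once a minute from cron), treat the 3rd as critical and hand the 4th
# and later ones to e-mail. COUNT_FILE, THRESHOLD and the action names are
# hypothetical; sending the alert itself is left out.

COUNT_FILE="${COUNT_FILE:-/tmp/du.count}"
THRESHOLD="${THRESHOLD:-85}"

next_count() {  # args: previous_count failing (1 or 0)
  if (( $2 )); then echo $(( $1 + 1 )); else echo 0; fi
}

action_for() {  # arg: consecutive failures
  if (( $1 >= 4 )); then echo email
  elif (( $1 == 3 )); then echo critical
  elif (( $1 >= 1 )); then echo warning
  else echo ok
  fi
}

main() {
  local usage prev count failing=0
  usage=$(df -P / | awk 'NR == 2 { print $5 + 0 }')
  if (( usage > THRESHOLD )); then failing=1; fi
  prev=$(cat "$COUNT_FILE" 2>/dev/null || echo 0)
  count=$(next_count "$prev" "$failing")
  echo "$count" > "$COUNT_FILE"
  echo "disk ${usage}%, ${count} consecutive failures -> $(action_for "$count")"
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main
fi
```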

InSelfControll, Dec 22 '22 17:12

@InSelfControll

Silly me, there is already an option in the monitor for sending messages on consecutive heartbeats missed.

[Screenshot: uk_retries]

This will send the msg to your telegram every minute, but it will NOT reflect in the status on UK multiple times, only once. The only way I can find to have it send to telegram and to show on the UK status multiple times is what I said in my previous post.

markdesilva, Dec 22 '22 17:12

Right, so you can sort of do this by setting "Resend Notification if Down X times consecutively" to 4. But like I said, it will not update the UK status; it will only keep sending to your alert channel (Telegram, etc.).

markdesilva, Dec 22 '22 17:12

The issue is that if you mark it as down in the URL, it doesn't send the correct message; it just sends "No heartbeat" instead of my message.

InSelfControll, Dec 22 '22 17:12

Yes, you are right. In the alert message (e.g. Telegram), it will only say "No heartbeat in the time window". For your own message to appear in the alert, you will need to use my modifications to your script, though the status will then keep flipping between up and down.

markdesilva, Dec 22 '22 17:12