lobsters-ansible

Monitoring

Open pushcx opened this issue 6 years ago • 7 comments

It would be good to have basic graphs of cpu/ram/disk usage to more quickly spot things like the backup cron job's bzip2 pinning the cpu (and io?), causing nginx to throw 500s while it's running. I don't know which tools are good for this, so I'm opening this to bikeshedding. Looking for something small and reliable rather than a significant moving part.

pushcx, Feb 13 '18 01:02

Off the top of my head, I remember https://github.com/firehol/netdata being pretty lightweight and easy to install and set up. It has no big analytics features, but it's still enough to correlate some things and get a global overview. The other solution would be to sign up for a free account on something like Datadog.

jstoja, Feb 13 '18 11:02

Looks neat! @alanpost popped up in chat to say that prgmr has an internal nagios instance he can hook us into, so that's a nice solution. I'm going to keep this issue open to track the implementation.

pushcx, Feb 13 '18 13:02

This is indeed a great solution. Nagios would have been a no-go if it had to be installed on the same server, but hosted elsewhere like that it would be perfect.

jstoja, Feb 13 '18 13:02

Hey @pushcx, did you have time to check with @alanpost what can be done? Since Nagios is a pull-based monitoring system, the checks wouldn't be present here.

jstoja, Mar 30 '18 16:03

I've had the impression he's been quite busy lately and his recent comment reflects that. With the backup job niced and ioniced there's no regular problem, so I'm not in a rush.

pushcx, May 02 '18 10:05

Okay, that's a good thing then :) If you ever have issues showing up in the logs of the Rails app and want to be notified instead of grepping around, I found logwatch(8) to be pretty simple and useful.

As for Datadog, the free version has 1-day data retention and no alerts, so it wouldn't be that useful. Contacting them to sponsor the community might solve this, though, if that's the way we'd want to go.

jstoja, May 02 '18 11:05

Hey, have you had time to look into it? Some recent comments mentioned it, and I remembered that this issue wasn't closed.

jstoja, Sep 04 '18 20:09