
Grafana Dashboards

Open • devopstales opened this issue 6 years ago • 7 comments

I think it would be helpful if the benji project shared a generic Grafana dashboard.json file to help users discover the key things they should be monitoring via Prometheus.
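As a rough illustration of what such a shared file could contain, here is a minimal sketch that generates a Grafana dashboard JSON with a single time-series panel. The metric name `benji_backup_duration_seconds`, its `namespace` label and the dashboard `uid` are hypothetical placeholders, not metrics benji is known to export; the panel fields follow Grafana's dashboard JSON model only loosely.

```python
import json

# Hypothetical metric name: benji's real exported metrics may differ.
DURATION_METRIC = "benji_backup_duration_seconds"


def timeseries_panel(panel_id, title, expr, x=0, y=0, w=12, h=8):
    """Return a minimal Grafana time-series panel backed by one PromQL query."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "gridPos": {"x": x, "y": y, "w": w, "h": h},
        "targets": [{"refId": "A", "expr": expr}],
    }


def build_dashboard():
    """Assemble a small dashboard dict that can be serialised to dashboard.json."""
    return {
        "title": "Benji backups (sketch)",
        "uid": "benji-sketch",  # hypothetical UID
        "time": {"from": "now-7d", "to": "now"},
        "panels": [
            timeseries_panel(
                1,
                "Backup duration by namespace",
                f"sum by (namespace) ({DURATION_METRIC})",
            ),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_dashboard(), indent=2))
```

Redirecting the output to `dashboard.json` gives a file that can be imported through Grafana's UI; keeping it generated from code makes it easier to swap in the real metric names once they are settled.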

devopstales • Jun 27 '19

Yes, that would be helpful. But to be honest, I'm still a bit of a novice with all things Prometheus and Grafana, and I'm not even sure whether the currently exposed metrics are actually useful. It will be some time until I look into this more deeply, but I'd accept, review and merge contributions in the meantime.

elemental-lf • Jul 10 '19

Hi, I'm interested in helping with this one. I just installed benji and it seems to be working well; now I'd like to keep it that way with some monitoring and alerting.

I'm using benji via the helm chart, so the way that makes sense for me to think about this is:

  • Align on the right set of Prometheus metrics (if needed, anything in addition to the existing cron job metrics)
  • Sketch out some example Prometheus alert rules (again, if needed to supplement the cron job alerts)
  • Make it easy to integrate with the Prometheus Operator by adding Helm chart options that create ServiceMonitor and PrometheusRule resources
  • Make a basic Grafana dashboard
  • Make it easy to include Grafana dashboards in the Helm chart via ConfigMaps
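To make the alerting and ConfigMap items a bit more concrete, here is a minimal sketch that builds a PrometheusRule (the prometheus-operator CRD) and a dashboard ConfigMap as plain dicts, which a Helm template would normally render instead. The metric `benji_backup_last_success_timestamp_seconds`, its `pvc` label, the alert name and the 26-hour threshold are all hypothetical. The `grafana_dashboard` label is the one the Grafana dashboard sidecar commonly watches by default, but it depends on how Grafana is deployed. A ServiceMonitor is left out because it depends on how the metrics end up being exposed.

```python
import json

# Hypothetical metric: Unix time of the last successful backup per PVC.
LAST_SUCCESS_METRIC = "benji_backup_last_success_timestamp_seconds"


def prometheus_rule(namespace, max_age_hours=26):
    """Build a PrometheusRule that fires when a PVC has no recent successful backup."""
    expr = (
        f"time() - max by (namespace, pvc) ({LAST_SUCCESS_METRIC})"
        f" > {max_age_hours * 3600}"
    )
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": "benji-alerts", "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": "benji",
                    "rules": [
                        {
                            "alert": "BenjiBackupTooOld",  # hypothetical name
                            "expr": expr,
                            "for": "15m",
                            "labels": {"severity": "warning"},
                            "annotations": {
                                # Plain strings keep Prometheus' {{ }} templates literal.
                                "summary": "No successful benji backup of "
                                "{{ $labels.namespace }}/{{ $labels.pvc }} "
                                f"in the last {max_age_hours}h",
                            },
                        }
                    ],
                }
            ]
        },
    }


def dashboard_configmap(namespace, dashboard):
    """Wrap a dashboard dict in a ConfigMap for a Grafana dashboard sidecar."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "benji-dashboard",
            "namespace": namespace,
            # Adjust to whatever label your Grafana sidecar is configured to watch.
            "labels": {"grafana_dashboard": "1"},
        },
        "data": {"benji.json": json.dumps(dashboard)},
    }
```

Both dicts can be dumped with `json.dumps` and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.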

If that makes sense, would you prefer we track these items in separate issues or just use this one?

allenporter • Aug 25 '21

I wrote up some initial thoughts in a document and would love feedback.

allenporter • Aug 31 '21

@allenporter I'm sorry that it took me so long to respond. My Grafana knowledge is very limited, but I have a few comments about your text:

  • Re backup target metrics: Plotting backup duration per PVC, and not only per namespace, might also be useful. Perhaps some type of graph could combine a per-namespace summary and the individual durations in one panel.
  • Re storage metrics: For the storage as a whole, the shared value won't be useful because it will always be zero. The values are explained in the documentation at https://benji-backup.me/statistics.html.
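One way to get the summary and the individual durations into a single graph is to give one panel two queries: one broken down by PVC and one aggregated over the namespace. The sketch below assumes a hypothetical metric `benji_backup_duration_seconds` with `namespace` and `pvc` labels; in a real dashboard the namespace argument would more likely be a Grafana template variable such as `$namespace`.

```python
# Hypothetical metric name; the namespace and pvc labels are assumed.
DURATION_METRIC = "benji_backup_duration_seconds"


def duration_panel(namespace):
    """One Grafana panel with per-PVC durations plus a per-namespace total."""
    # Doubled braces produce a literal PromQL label selector: metric{namespace="..."}
    selector = f'{DURATION_METRIC}{{namespace="{namespace}"}}'
    return {
        "type": "timeseries",
        "title": f"Backup duration: {namespace}",
        "targets": [
            {
                "refId": "A",
                "expr": f"max by (pvc) ({selector})",
                "legendFormat": "{{pvc}}",  # Grafana legend template, one series per PVC
            },
            {
                "refId": "B",
                "expr": f"sum({selector})",
                "legendFormat": "total",
            },
        ],
    }
```

Leaving stacking off keeps the "total" series visibly above the per-PVC lines, so outliers and the namespace-wide trend stay readable in the same panel.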

Have you made any progress based on your document in the meantime?

elemental-lf • Jan 27 '22

Hi, I think the first step after this was the issue related to the key group. I was waiting to see how the review of that PR was received before proceeding.

I had forgotten all about this by now, but I can re-review it and see whether there are ideas here worth pursuing.

Is this a project where you'd be active in giving feedback and reviewing if I were to push a little on it? (Basically, I'm curious how active you'll be going forward. I appreciate you triaging everything from the last five months today.)

allenporter • Jan 27 '22

@elemental-lf do you have any docs or a short write-up about setting up a benji development environment, running tests, etc.? It could save me some time getting started, as the project isn't set up the way I'm used to when jumping into Python projects. (Not that I'm an expert; I'm just noticing basic things like the tests needing to be run in a specific way, and I can't tell what's missing.)

allenporter • Feb 21 '22

@allenporter an example of how to set up the tests can be found in this project's GitHub Actions workflow:

https://github.com/elemental-lf/benji/blob/ea0a37de66b813c6f2f35ed146e4e08dafefd5d0/.github/workflows/all-in-one.yaml#L90-L144

Some tests can be enabled or disabled by setting an environment variable. See:

https://github.com/elemental-lf/benji/blob/ea0a37de66b813c6f2f35ed146e4e08dafefd5d0/.github/workflows/all-in-one.yaml#L12-L16
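For anyone new to this pattern, here is a minimal sketch of how a test can be switched on or off by an environment variable using the standard `unittest` module. The variable name `BENJI_TEST_S3`, the `env_flag` helper and the test class are hypothetical illustrations; the real variable names are the ones defined in the workflow file linked above.

```python
import os
import unittest


def env_flag(name, default=False):
    """Interpret an environment variable such as "1", "true" or "yes" as a boolean."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


# Hypothetical variable name; check the workflow file for the real ones.
RUN_S3_TESTS = env_flag("BENJI_TEST_S3")


class S3StorageTest(unittest.TestCase):
    @unittest.skipUnless(RUN_S3_TESTS, "set BENJI_TEST_S3=1 to enable")
    def test_round_trip(self):
        # Placeholder body: a real test would write a block and read it back.
        self.assertTrue(True)
```

Running `BENJI_TEST_S3=1 python -m unittest` would then include the gated test, while a plain `python -m unittest` reports it as skipped.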

elemental-lf • Mar 03 '22