Add more Metrics to GitOps
The introduction of events seems to have removed some log lines. We also need more metrics to troubleshoot large Fleet deployments with a huge number of GitRepos.
- [ ] Add metrics such as:
  - number of jobs created

Here are a few things that could be of interest. The number of times:

- a repo has been updated (git pull)
- we have checked for a new commit

But also the time it takes to:

- get the most recent commit of a repository
- download/update a git repository
- create a bundle
- create a bundle deployment
System Information
| Rancher Version | Fleet Version |
|---|---|
| v2.12-90de58cccdcc8254854a5517d73010cf638d1655-head | 107.0.0+up0.13.0-alpha.3 |
Metrics to be checked
- List of metrics found here: https://github.com/rancher/fleet/issues/3668#issue-3055783493
  - `gitjobs_created_success_total`
  - `gitjobs_created_failure_total`
  - `gitjob_duration_seconds`
  - `gitrepo_fetch_latest_commit_success_total`
  - `gitrepo_fetch_latest_commit_failure_total`
  - `gitrepo_fetch_latest_commit_duration_seconds` (histogram, exposed with `_bucket`, `_sum`, and `_count` suffixes)
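The `_bucket`/`_sum`/`_count` note follows the usual Prometheus histogram exposition: one `_bucket` series per upper bound (`le` label), plus `_sum` (total of all observed values) and `_count` (number of observations). Dividing `_sum` by `_count` gives a mean duration. A minimal sketch, assuming bash, awk, and a scrape saved in Prometheus text format (such as `metrics_to_be_checked.txt`); `mean_duration` is a hypothetical helper, not part of Fleet, and it sums over all label combinations (for example all GitRepos, if the metric is labeled per GitRepo, which is an assumption):

```shell
#!/usr/bin/env bash
# mean_duration FILE METRIC (hypothetical helper): print METRIC_sum / METRIC_count
# from a Prometheus text-format dump, summed over all label combinations.
mean_duration() {
  local file="$1" metric="$2"
  awk -v m="$metric" '
    # Sample lines look like "name{labels} value" or "name value";
    # comment lines ("# HELP", "# TYPE") never match because $1 is "#".
    {
      name = $1
      sub(/\{.*/, "", name)                # drop the label set, if any
      if (name == (m "_sum"))   sum   += $NF
      if (name == (m "_count")) count += $NF
    }
    END {
      if (count > 0) printf "%.6f\n", sum / count
      else print "no observations"
    }
  ' "$file"
}
```

For example, `mean_duration metrics_to_be_checked.txt gitrepo_fetch_latest_commit_duration_seconds` prints the average time to fetch the latest commit. Because `_sum` and `_count` are cumulative since the process started, this is a lifetime average; comparing two scrapes taken some time apart gives the average over that window instead.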
The following steps were performed:

- Create 2-3 `GitRepo`s and wait for them to be ready.
- Create another `GitRepo` whose repository doesn't exist (to produce the failure-total metrics).
- Go to `Services` and find:
  - `monitoring-gitjob`
  - `monitoring-fleet-controller`
- To check the metrics above, port-forward the `monitoring-gitjob` service to `localhost`:
  - First, download the `kubeconfig` file of the cluster where the services above are available (in my case, the `local` cluster) to the local machine, then run:
    `kubectl --kubeconfig <kubeconfig> -n cattle-fleet-system port-forward svc/monitoring-gitjob 8081:8081`
  - On the local machine, use `curl` to fetch the metrics and store them in `metrics_to_be_checked.txt`:
    `curl http://localhost:8081/metrics > metrics_to_be_checked.txt`
- Check that the metrics above are present in `metrics_to_be_checked.txt`.
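The final check step can be scripted. A minimal sketch, assuming bash and the text file produced by the `curl` command; `check_metrics` is a hypothetical helper, not part of Fleet. It also accepts a name with an extra prefix such as `fleet_`, in case the exposed names are namespaced (an assumption; the issue lists them unprefixed):

```shell
#!/usr/bin/env bash
# check_metrics FILE (hypothetical helper): report whether each metric listed in
# this issue has at least one sample line in FILE. Returns 1 if any is missing.
check_metrics() {
  local file="$1" name missing=0
  for name in \
    gitjobs_created_success_total \
    gitjobs_created_failure_total \
    gitjob_duration_seconds \
    gitrepo_fetch_latest_commit_success_total \
    gitrepo_fetch_latest_commit_failure_total \
    gitrepo_fetch_latest_commit_duration_seconds; do
    # A sample line starts with the name (histograms add _bucket, _sum or
    # _count), followed by a label set "{...}" or a space before the value.
    # The optional leading "<prefix>_" group is an assumption (e.g. "fleet_").
    if grep -Eq "^([a-zA-Z_:][a-zA-Z0-9_:]*_)?${name}(_bucket|_sum|_count)?[{ ]" "$file"; then
      echo "found:   ${name}"
    else
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_metrics metrics_to_be_checked.txt` prints one `found:` or `missing:` line per metric. A failure counter may not appear until at least one failure has been recorded, which is why creating a `GitRepo` for a nonexistent repository is part of the steps.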
See the videos below for the metrics available in gitjob.
Part 1: Set up the local machine to get metrics
https://github.com/user-attachments/assets/1d42543b-c5fd-4205-987e-de376eced64e
Part 2: Check the fetched metrics
https://github.com/user-attachments/assets/e25f92f3-6237-4fb9-a9da-46eddb524647