flux2
metrics to show the current GitRepository
Describe the bug
People may flip GitRepository::spec.ref.branch to a dev branch to test or verify something and then forget to revert it afterwards.
It would be great if flux2 could export a metric showing the current GitRepository::spec.ref.branch.
We could then alert (warn) whenever GitRepository::spec.ref.branch is not the main/master/prod branch.
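To make the requested check concrete, here is a minimal sketch in Python of the condition the alert would evaluate. The `ALLOWED_BRANCHES` set and the `off_branch` helper are hypothetical names used for illustration only; they are not part of Flux. The input is assumed to be a GitRepository object already parsed into a dict.

```python
# Hypothetical allow-list; the issue names main/master/prod as the expected branches.
ALLOWED_BRANCHES = {"main", "master", "prod"}


def off_branch(git_repository: dict, allowed=ALLOWED_BRANCHES) -> bool:
    """Return True when spec.ref.branch is set and is not in the allow-list.

    Objects that set no branch (for example a ref given as a tag, semver
    range, or commit) return False in this sketch.
    """
    ref = git_repository.get("spec", {}).get("ref") or {}
    branch = ref.get("branch")
    return branch is not None and branch not in allowed


if __name__ == "__main__":
    repo = {"spec": {"ref": {"branch": "dev"}}}
    print(off_branch(repo))
```

Whether a ref without a branch should count as a violation is a policy choice this sketch leaves open.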
Steps to reproduce
N/A
Expected behavior
N/A
Screenshots and recordings
No response
OS / Distro
N/A
Flux version
N/A
Flux check
N/A
Git provider
No response
Container Registry provider
No response
Additional context
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
The GitRepository manifest should rather itself live in Git and be protected by e.g. PRs or MRs.
I think we're suggesting that you use static analysis to catch this configuration before it reaches the cluster. I don't know whether that means we shouldn't expose this as a metric; there are a lot of different things we could expose as metrics.
This one is very specific to the GitRepository kind. The branch name is not something I think we would usually consider as a metric to export.
More examples of metrics we might collect:
- How many cross-namespace access references are in use
- How many gitrepos are using submodules
- How many sources are verified by cryptography
- How many are suspended
- How many use the go-git backend / how many are using libgit2
Maybe we can provide a more generic way to determine which metrics are exported, or maybe we should just spend some focus time on this and build the right metrics in so all use cases are happy 👍
Or, we could build some kind of external reporter that monitors the Flux objects through GitOps Toolkit API and provides those metrics on-demand. That would be a really good use of the Flux APIs!
> The GitRepository manifest should rather itself live in Git and be protected by e.g. PRs or MRs.
In our current setup, terraform is managing the GitRepository manifest on the cluster with DataFluxSync during bootstrap. I don't see any immediate issues with moving away from doing that and having terraform push the GitRepository manifest into git instead.
My only concern is losing the ability to do pre-merge testing. As part of our workflow we often flip a cluster to a custom branch to test out changes. That workflow could be changed to opening a PR and merging it to update the ref, but then we still have the issue of folks forgetting to flip that ref back to the original one.
> Or, we could build some kind of external reporter that monitors the Flux objects through GitOps Toolkit API and provides those metrics on-demand. That would be a really good use of the Flux APIs!
This is what I was thinking about doing too, but I wanted to see what others thought about adding the ref as a label or as its own metric first.
And not to get off-topic, but since our GitRepository is managed outside of Git right now, I was actually thinking about building a deployment pipeline to roll out changes in a staggered fashion using Spinnaker (which we use for app deploys). That pipeline would take the new git ref, slowly roll it out, and check metrics along the way. I could build a service to do automatic commits to Git too, but that would be a bit more work than using the tooling I have access to today.
I'm not in favor of adding the ref to metrics. A ref in spec can be a semver range or a commit SHA. If we report the resolved ref, our metrics will suffer from high cardinality: each commit will produce a unique time series, breaking our dashboards and all the alerts people currently use. Also, having a different set of metrics for Git means we would need to drop our generic metric and come up with dedicated ones for each custom resource kind and all their combined fields.
I was thinking this could be implemented as an additional metric and not added as a label to existing metrics because of all the reasons you listed (and I 100% agree).
kube and istio do this with the kube_node_info and istio_build metrics which have all the version information.
kube_node_info{container_runtime_version="containerd://1.4.13", internal_ip="10.3.67.211", job="kube-state-metrics", kernel_version="5.4.202+", kubelet_version="v1.21.14-gke.2100", kubeproxy_version="v1.21.14-gke.2100", node="gke-kube-1", os_image="Container-Optimized OS from Google", pod_cidr="10.3.73.0/26", provider_id="gce://kube-1", system_uuid="1faaf414-4b03-bc45"}
e.g. we could have something like gotk_gitrepository_info
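As a rough illustration of that idea, the sketch below renders one sample in the Prometheus text exposition format for a GitRepository parsed into a dict. The metric name `gotk_gitrepository_info` is the one floated above; the label set (`name`, `namespace`, `branch`) and the helper names are assumptions made for this example, not something Flux exports. The label comes from the declared spec.ref.branch rather than the resolved commit, so the series does not change on every commit.

```python
def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def gitrepository_info(obj: dict) -> str:
    """Render a hypothetical gotk_gitrepository_info sample for one object."""
    meta = obj.get("metadata", {})
    ref = obj.get("spec", {}).get("ref") or {}
    # Assumed label set; only the declared branch is reported, never the resolved SHA.
    labels = {
        "name": meta.get("name", ""),
        "namespace": meta.get("namespace", ""),
        "branch": ref.get("branch", ""),
    }
    body = ",".join(f'{k}="{escape_label(str(v))}"' for k, v in labels.items())
    # Info-style metrics carry their data in labels and always have the value 1.
    return f"gotk_gitrepository_info{{{body}}} 1"


if __name__ == "__main__":
    repo = {
        "metadata": {"name": "flux-system", "namespace": "flux-system"},
        "spec": {"ref": {"branch": "dev"}},
    }
    print(gitrepository_info(repo))
    # gotk_gitrepository_info{name="flux-system",namespace="flux-system",branch="dev"} 1
```

With a series shaped like this, an alert could match on something like `gotk_gitrepository_info{branch!~"main|master|prod"}`, assuming those are the branches you consider safe.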
Why not add the current SHA-1 (not the ref), so that it is always the same kind of value?
Please refer to:
- https://github.com/fluxcd/flux2/issues/4128
This has been addressed. You can now create any metric you wish: if the information is in the CRD spec or status, you can create a metric to report on it.
See also:
https://fluxcd.io/flux/monitoring/custom-metrics/
Wonderful, will try it soon! Thanks!