metrics to show the current GitRepository

Open qudongfang opened this issue 3 years ago • 5 comments

Describe the bug

People may flip GitRepository::spec.ref.branch to a dev branch to test/verify something and then forget to revert it afterwards.

It would be great if flux2 could export a metric showing the current GitRepository::spec.ref.branch, so that we can alert (warn) when GitRepository::spec.ref.branch is not on the main/master/prod branch.
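As a rough illustration of the check being asked for, here is a minimal sketch that flags GitRepository objects tracking a branch outside an allow-list. It assumes the objects are already available as plain dicts (for example, parsed from the Kubernetes API); the function name `off_allowed_branch` and the `ALLOWED_BRANCHES` set are hypothetical and not part of Flux.

```python
# Hypothetical allow-list; adjust to whichever branches count as "safe".
ALLOWED_BRANCHES = {"main", "master", "prod"}


def off_allowed_branch(git_repositories, allowed=ALLOWED_BRANCHES):
    """Return (namespace, name, branch) for each object on a non-allowed branch.

    Each item is a GitRepository object as a plain dict. Objects that do not
    set spec.ref.branch (for example, ones pinned to a tag, semver range, or
    commit) are skipped rather than flagged.
    """
    flagged = []
    for repo in git_repositories:
        meta = repo.get("metadata", {})
        ref = repo.get("spec", {}).get("ref") or {}
        branch = ref.get("branch")
        if branch is not None and branch not in allowed:
            flagged.append((meta.get("namespace"), meta.get("name"), branch))
    return flagged
```

The returned tuples could feed a warning or a report; how the objects are fetched and how the result is surfaced is left open here.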

Steps to reproduce

N/A

Expected behavior

N/A

Screenshots and recordings

No response

OS / Distro

N/A

Flux version

N/A

Flux check

N/A

Git provider

No response

Container Registry provider

No response

Additional context

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

qudongfang avatar Sep 13 '22 11:09 qudongfang

The GitRepository manifest should rather itself live in Git and be protected by e.g. PRs or MRs.

makkes avatar Sep 13 '22 16:09 makkes

I think we're suggesting that you would use static analysis to prevent this configuration before it reaches the cluster. I don't know if that means we shouldn't expose this as a metric; there are a lot of different things we could expose as metrics.

This one is very specific to the GitRepository kind. The branch name is not something I think we would usually consider as a metric to export.

More examples of metrics we might collect:

  • How many cross-namespace access references are in use
  • How many gitrepos are using submodules
  • How many sources are verified by cryptography
  • How many are suspended
  • How many use the go-git backend / how many are using libgit2

Maybe we can provide a more generic way to determine which metrics are exported, or maybe we should just spend some focus time on this and build the right metrics in so all use cases are happy 👍

Or, we could build some kind of external reporter that monitors the Flux objects through GitOps Toolkit API and provides those metrics on-demand. That would be a really good use of the Flux APIs!

kingdonb avatar Sep 15 '22 17:09 kingdonb

> The GitRepository manifest should rather itself live in Git and be protected by e.g. PRs or MRs.

In our current setup, Terraform is managing the GitRepository manifest on the cluster with DataFluxSync during bootstrap. I don't see any immediate issues with moving away from doing that and having Terraform push the GitRepository manifest into Git instead.

My only concern is losing the ability to do pre-merge testing. As part of our workflow we often flip a cluster to a custom branch to test out changes. That workflow can be changed to opening a PR and merging it to update the ref, but then we still have the issue of folks forgetting to flip that ref back to the original one.

> Or, we could build some kind of external reporter that monitors the Flux objects through GitOps Toolkit API and provides those metrics on-demand. That would be a really good use of the Flux APIs!

This is what I was thinking about doing too, but I wanted to see what the others thought about adding the ref as a label or its own metric first.

And not to get off-topic, but since our GitRepository is managed outside of Git right now, I was actually thinking about building a deployment pipeline to roll out changes in a staggered fashion using Spinnaker (which we use for app deploys). That pipeline would take the new Git ref, slowly roll it out, and check metrics along the way. I could build a service to do automatic commits to Git too, but that would be a bit more work vs. using the tooling I have access to today.

dmichel1 avatar Sep 16 '22 14:09 dmichel1

I'm not for adding the ref to metrics. A ref in spec can be a semver range or a commit SHA; if we report the resolved ref, then our metrics will suffer from high cardinality: each commit will result in a unique metric, breaking our dashboards and all the current alerts people are using. Also, having a different set of metrics for Git means we need to drop our generic metric and come up with dedicated ones for each custom resource kind and all their combined fields.
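To make the cardinality concern concrete, here is a small sketch that counts how many distinct label sets (that is, time series) a metric would produce for a given choice of labels. The helper `distinct_series` and the sample `history` are made up for illustration; the revision strings are synthetic, not real commits.

```python
def distinct_series(samples, label_keys):
    """Count distinct label sets (time series) using only the given label keys."""
    return len({tuple(sample[key] for key in label_keys) for sample in samples})


# Hypothetical history: one GitRepository on one branch, reconciled across
# 100 different commits (revisions are synthetic placeholders).
history = [
    {"name": "app", "branch": "main", "revision": f"sha1:{i:040x}"}
    for i in range(100)
]

print(distinct_series(history, ["name", "branch"]))              # 1
print(distinct_series(history, ["name", "branch", "revision"]))  # 100
```

With only the branch as a label the series count stays flat, while labeling by resolved revision adds a new series for every commit.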

stefanprodan avatar Sep 19 '22 10:09 stefanprodan

I was thinking this could be implemented as an additional metric and not added as a label to existing metrics because of all the reasons you listed (and I 100% agree).

kube and istio do this with the kube_node_info and istio_build metrics which have all the version information.

kube_node_info{container_runtime_version="containerd://1.4.13", internal_ip="10.3.67.211", job="kube-state-metrics", kernel_version="5.4.202+", kubelet_version="v1.21.14-gke.2100", kubeproxy_version="v1.21.14-gke.2100", node="gke-kube-1", os_image="Container-Optimized OS from Google", pod_cidr="10.3.73.0/26", provider_id="gce://kube-1", system_uuid="1faaf414-4b03-bc45"}

e.g. we could have something like gotk_gitrepository_info
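A minimal sketch of what an info-style sample could look like in the Prometheus text exposition format: a constant value of 1 with the details carried in labels, in the same spirit as kube_node_info. The helper names and the label names (`name`, `namespace`, `branch`) are assumptions for illustration; this is not an existing Flux metric.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_info_metric(name, labels):
    """Render one info-style sample: constant value 1, details in labels."""
    pairs = ",".join(
        f'{key}="{escape_label_value(str(val))}"'
        for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} 1"


# Hypothetical usage with the metric name suggested above.
print(render_info_metric(
    "gotk_gitrepository_info",
    {"name": "flux-system", "namespace": "flux-system", "branch": "main"},
))
```

Under these assumed label names, an alert could then match on something like `gotk_gitrepository_info{branch!~"main|master|prod"}`.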

dmichel1 avatar Sep 19 '22 19:09 dmichel1

Why not add the current SHA-1 (not the ref), so that it is always the same kind of value?

mtparet avatar May 05 '23 11:05 mtparet

Please refer to:

  • https://github.com/fluxcd/flux2/issues/4128

This has been addressed. If the information is in the CRD spec or status, you can now create a metric to report on it.

See also:

https://fluxcd.io/flux/monitoring/custom-metrics/

kingdonb avatar Aug 31 '23 17:08 kingdonb

Wonderful, will try it soon! Thanks!

mtparet avatar Aug 31 '23 21:08 mtparet