[SURE-4340] Prometheus Metrics missing

Open ulikl opened this issue 1 year ago • 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

I would like to monitor the fleet functionality via Prometheus metrics. I found that this was already implemented and merged via PR "Add Prometheus metrics to Fleet #769" on May 13, 2022.

However, neither release 0.5.1 (released Jan 23) nor the master branch of the fleet controller includes any Prometheus metrics code.

The code is no longer there; only the modules are still listed in go.mod and go.sum. What happened to this feature?

Expected Behavior

I would expect metrics at the controller endpoint /metrics on port 6060.

Steps To Reproduce

No response

Environment

- Architecture: amd64
- Fleet Version: 0.5.1

Issues

  • [x] #2172
  • [x] #2344
  • [ ] #2355
  • [x] #2315
  • [ ] #2295

ulikl Mar 15 '23 14:03

See SURE-4340

kkaempf Apr 04 '23 13:04

https://github.com/rancher/fleet/pull/769 was superseded by #770. That was partially reviewed and closed.

At first glance I see a few problems with #770, which make this complex:

  • exposing port 6060 of the fleet-controller for /metrics also allows access to /debug/pprof, which is insecure. We probably want to enable pprof only with --debug.
  • as mentioned in the previous review we want to enable metrics only with --enable-metrics.
  • adding .global to the bundledeployment to store the cluster name from Rancher is a smart workaround, but I think we should instead modify fleet to store a fleet cluster identifier (= cleaned-up labels from the live cluster) in the bundledeployment's status, rather than reading bundleDep.Spec.StagedOptions.Helm.Values.Global.Fleet.ClusterLabels[clusterNameLabel]. I'm afraid of side effects with the Helm values related code.

manno May 26 '23 15:05

Build this for the new controller-runtime controllers only. This is a requirement for #1850.

  • https://book.kubebuilder.io/reference/metrics
  • what values are exposed via /metrics?
  • add fleet values, see above
  • no access to pprof/debug
  • fleet-controller and fleet-agent
  • docs, how to expose the port, prometheus deployment for dev

manno Dec 20 '23 14:12