[SURE-4340] Prometheus Metrics missing
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
I would like to monitor the fleet functionality via Prometheus metrics. I found that this was already implemented and merged via PR "Add Prometheus metrics to Fleet #769" on May 13, 2022.
However, neither release 0.5.1 (released Jan 23) nor the master branch of the fleet controller includes any Prometheus metrics code.
The code is no longer there; only the modules remain in go.mod and go.sum. What happened to this feature?
Expected Behavior
I would expect metrics at the controller endpoint /metrics on port 6060
Steps To Reproduce
No response
Environment
- Architecture: amd64
- Fleet Version: 0.5.1
Issues
- [x] #2172
- [x] #2344
- [ ] #2355
- [x] #2315
- [ ] #2295
See SURE-4340
https://github.com/rancher/fleet/pull/769 was superseded by #770. That was partially reviewed and closed.
At first glance I see a few problems with #770, which make this complex:
- exposing port 6060 of the fleet-controller for `/metrics` also allows access to `/debug/pprof`, which is insecure. We probably want to enable pprof only with `--debug`.
- as mentioned in the previous review we want to enable metrics only with `--enable-metrics`.
- adding `.global` to the bundledeployment, to store the cluster name from rancher, is a smart workaround, but I think we should modify fleet to store a fleet cluster identifier (= cleaned up labels from the live cluster) in the bundledeployment's status instead of `bundleDep.Spec.StagedOptions.Helm.Values.Global.Fleet.ClusterLabels[clusterNameLabel]`. I'm afraid of side effects with the helm values related code.
Build this for the new controller-runtime controllers only. This is a requirement for #1850
- https://book.kubebuilder.io/reference/metrics
- what values are exposed via /metrics?
- add fleet values, see above
- no access to /debug/pprof
- fleet-controller and fleet-agent
- docs, how to expose the port, prometheus deployment for dev