
Monitoring Bundle in Modified State

hazem-bouaziz opened this issue 1 year ago · 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Upon upgrading Rancher to version 2.7.9 and updating the rancher-monitoring Helm chart to v102.0.2+up40.1.2, the deployed bundle was marked as modified. Fleet reported the following error, indicating modifications to a servicemonitor.monitoring.coreos.com resource in the kube-system namespace.

To work around this, I applied diff patches, although I am uncertain whether this is the optimal solution. The error message details the specific modifications to the ServiceMonitor's spec, including changes to several relabelings and metricRelabelings entries.

  Modified(1) [Bundle monitoring-monitoring]; servicemonitor.monitoring.coreos.com kube-system/rancher-monitoring-kubelet modified {"spec":{"endpoints":[{"bearerTokenFile":"/var/run/secrets/kubernetes.io/serviceaccount/token","honorLabels":true,"port":"https-metrics","relabelings":[{"sourceLabels":["__metrics_path__"],"targetLabel":"metrics_path"}],"scheme":"https","tlsConfig":{"caFile":"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt","insecureSkipVerify":true}},{"bearerTokenFile":"/var/run/secrets/kubernetes.io/serviceaccount/token","honorLabels":true,"metricRelabelings":[{"action":"drop","regex":"container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)","sourceLabels":["__name__"]},{"action":"drop","regex":"container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)","sourceLabels":["__name__"]},{"action":"drop","regex":"container_memory_(mapped_file|swap)","sourceLabels":["__name__"]},{"action":"drop","regex":"container_(file_descriptors|tasks_state|threads_max)","sourceLabels":["__name__"]},{"action":"drop","regex":"container_spec.*","sourceLabels":["__name__"]},{"action":"drop","regex":".+;","sourceLabels":["id","pod"]}],"path":"/metrics/cadvisor","port":"https-metrics","relabelings":[{"sourceLabels":["__metrics_path__"],"targetLabel":"metrics_path"}],"scheme":"https","tlsConfig":{"caFile":"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt","insecureSkipVerify":true}},{"bearerTokenFile":"/var/run/secrets/kubernetes.io/serviceaccount/token","honorLabels":true,"path":"/metrics/probes","port":"https-metrics","relabelings":[{"sourceLabels":["__metrics_path__"],"targetLabel":"metrics_path"}],"scheme":"https","tlsConfig":{"caFile":"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt","insecureSkipVerify":true}}]}}


Expected Behavior

The deployed resources should match the intended configuration as defined by Fleet. This requires reconciling the current state of resources managed by operators with the desired state described in the Fleet configuration.

Steps To Reproduce

No response

Environment

- Architecture: x86_64
- Fleet Version: v0.8.2
- Cluster:
  - Provider: EKS
  - Options:
  - Kubernetes Version: v1.23

Logs

No response

Anything else?

I also had to add patches with fixed values for some other ServiceMonitors:

  - apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    name: rancher
    namespace: cattle-system
    operations:
    - {"op": "add", "path": "/metadata/labels/app.kubernetes.io~1version", "value": "102.0.2_up40.1.2" }
    - {"op": "add", "path": "/metadata/labels/chart", "value": "rancher-monitoring-102.0.2_up40.1.2" }
    - {"op": "add", "path": "/spec/selector/matchLabels/chart", "value": "rancher-2.7.9"}
  - apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    name: rancher-monitoring-apiserver
    namespace: default
    operations:
    - {"op": "add", "path": "/metadata/labels/app.kubernetes.io~1version", "value": "102.0.2_up40.1.2" }
    - {"op": "add", "path": "/metadata/labels/chart", "value": "rancher-monitoring-102.0.2_up40.1.2" }
  - apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    name: rancher-monitoring-ingress-nginx
    namespace: ingress-nginx
    operations:
    - {"op": "add", "path": "/metadata/labels/app.kubernetes.io~1version", "value": "102.0.2_up40.1.2" }
    - {"op": "add", "path": "/metadata/labels/chart", "value": "rancher-monitoring-102.0.2_up40.1.2" }
  - apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    name: rancher-monitoring-coredns
    namespace: kube-system
    operations:
    - {"op": "add", "path": "/metadata/labels/app.kubernetes.io~1version", "value": "102.0.2_up40.1.2" }
    - {"op": "add", "path": "/metadata/labels/chart", "value": "rancher-monitoring-102.0.2_up40.1.2" }
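
For context, entries like these follow Fleet's `comparePatches` schema and live under the bundle's `diff` section in `fleet.yaml`. A sketch of the surrounding structure (key names per the Fleet bundle-diff docs; values abbreviated from the list above):

```yaml
# fleet.yaml (sketch): the comparePatches entries above sit under
# the top-level diff key of the bundle definition.
diff:
  comparePatches:
  - apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    name: rancher
    namespace: cattle-system
    operations:
    - {"op": "add", "path": "/metadata/labels/chart", "value": "rancher-monitoring-102.0.2_up40.1.2"}
```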

hazem-bouaziz avatar Feb 12 '24 11:02 hazem-bouaziz

It looks like you need to ignore fields that may have been modified by the prometheus operator:

https://fleet.rancher.io/bundle-diffs
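
Per that page, operator-managed fields can also be excluded with `jsonPointers` rather than patching values back in. A minimal sketch (the pointer path is an assumption based on the diff shown in the error above, not a verified fix):

```yaml
# fleet.yaml (sketch): tell Fleet to ignore the endpoints section that
# the prometheus operator rewrites, so the bundle is no longer flagged
# as modified for this ServiceMonitor.
diff:
  comparePatches:
  - apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    name: rancher-monitoring-kubelet
    namespace: kube-system
    jsonPointers:
    - "/spec/endpoints"
```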

manno avatar Apr 24 '24 13:04 manno