
operator: resources violate PodSecurity policy

Open uhthomas opened this issue 1 year ago • 2 comments

I've been trying to install the Grafana Agent Operator and get it set up properly for a while, and have been struggling.

https://grafana.com/docs/grafana-cloud/kubernetes-monitoring/configuration/config-k8s-agent-guide/#configure-grafana-agent-for-metrics

I have applied the exact manifests produced by the Grafana Agent Operator manifest generator, and they do not work. It turns out the DaemonSets violate the cluster's PodSecurity "baseline" level, which isn't that strict.
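The two baseline controls these DaemonSets trip (visible in the events further down) are simple to state as code. The sketch below checks a pod spec, given as a plain Python dict, against only the HostPath Volumes and Privileged Containers controls. The real Pod Security admission controller checks more (host namespaces, host ports, capabilities and so on) and formats its own messages. The function name baseline_violations is hypothetical; the volume and container names in the usage come from the grafana-agent-logs events in this issue, and the host paths are placeholders.

```python
from typing import Any


def baseline_violations(pod_spec: dict[str, Any]) -> list[str]:
    """Check a pod spec against two Pod Security Standards "baseline" controls.

    Only HostPath Volumes and Privileged Containers are modelled here.
    """
    violations: list[str] = []

    # HostPath Volumes: spec.volumes[*].hostPath must be unset.
    host_path_volumes = [
        v["name"] for v in pod_spec.get("volumes", []) if v.get("hostPath") is not None
    ]
    if host_path_volumes:
        names = ", ".join(f'"{n}"' for n in host_path_volumes)
        violations.append(f"hostPath volumes (volumes {names})")

    # Privileged Containers: securityContext.privileged must be unset or false
    # for init, regular and ephemeral containers alike.
    for field in ("initContainers", "containers", "ephemeralContainers"):
        for container in pod_spec.get(field, []):
            if (container.get("securityContext") or {}).get("privileged") is True:
                violations.append(
                    f'privileged (container "{container["name"]}" must not set '
                    "securityContext.privileged=true)"
                )
    return violations


# Pod spec modelled on the grafana-agent-logs rejection (host paths are placeholders).
logs_pod_spec = {
    "volumes": [
        {"name": "varlog", "hostPath": {"path": "/placeholder/varlog"}},
        {"name": "dockerlogs", "hostPath": {"path": "/placeholder/dockerlogs"}},
        {"name": "data", "hostPath": {"path": "/placeholder/data"}},
    ],
    "containers": [
        {"name": "grafana-agent", "securityContext": {"privileged": True}},
    ],
}
for violation in baseline_violations(logs_pod_spec):
    print(violation)
```

Both checks fire for this spec, which matches the two reasons reported by the daemonset-controller.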

❯ k -n grafana-agent get ds
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
grafana-agent-integrations-ds   5         0         0       0            0           <none>          16m
grafana-agent-logs              5         0         0       0            0           <none>          2d3h
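The symptom in that table (DESIRED is 5 while CURRENT stays at 0) can also be detected in a script. This is a minimal sketch, assuming the JSON shape returned by kubectl -n grafana-agent get ds -o json and reading the DaemonSet status fields desiredNumberScheduled and currentNumberScheduled; the function name stalled_daemonsets is hypothetical.

```python
import json
from typing import Any


def stalled_daemonsets(ds_list: dict[str, Any]) -> list[str]:
    """Return "namespace/name" for each DaemonSet with fewer pods created
    (currentNumberScheduled) than it wants (desiredNumberScheduled).

    ds_list is the parsed output of `kubectl get ds -o json`.
    """
    stalled = []
    for ds in ds_list.get("items", []):
        status = ds.get("status", {})
        desired = status.get("desiredNumberScheduled", 0)
        current = status.get("currentNumberScheduled", 0)
        if current < desired:
            meta = ds["metadata"]
            stalled.append(f'{meta["namespace"]}/{meta["name"]}')
    return stalled


# Sample input mirroring the table above (status trimmed to the two fields used).
sample = json.loads("""
{"items": [
  {"metadata": {"namespace": "grafana-agent", "name": "grafana-agent-integrations-ds"},
   "status": {"desiredNumberScheduled": 5, "currentNumberScheduled": 0}},
  {"metadata": {"namespace": "grafana-agent", "name": "grafana-agent-logs"},
   "status": {"desiredNumberScheduled": 5, "currentNumberScheduled": 0}}
]}
""")
print(stalled_daemonsets(sample))
```

Against a live cluster, the input could come from json.loads on the output of subprocess.run(["kubectl", "get", "ds", "-A", "-o", "json"], capture_output=True, text=True).stdout.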

Looking deeper:

❯ k -n grafana-agent describe ds grafana-agent-logs
...
Events:
  Type     Reason        Age   From                  Message
  ----     ------        ----  ----                  -------
  Warning  FailedCreate  46m   daemonset-controller  Error creating: pods "grafana-agent-logs-tzr4t" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
  Warning  FailedCreate  29m   daemonset-controller  Error creating: pods "grafana-agent-logs-xxlrg" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
  Warning  FailedCreate  12m   daemonset-controller  Error creating: pods "grafana-agent-logs-bmp64" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
❯ k -n grafana-agent describe ds grafana-agent-integrations-ds
...
Events:
  Type     Reason        Age                From                  Message
  ----     ------        ----               ----                  -------
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-h8qnr" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-bb2np" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-q9nnz" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-mj95d" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-pc726" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-f6zlb" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-5p6b7" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-nrvqh" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-cdk7w" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  1s (x24 over 10m)  daemonset-controller  (combined from similar events): Error creating: pods "grafana-agent-integrations-ds-nlccd" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
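Most of those events repeat the same rejection for a different pod name. A small sketch that collapses such messages into one summary (policy level, offending hostPath volumes, privileged containers) is shown here; the regular expressions assume the exact message wording seen in this issue and may not cover every variant the admission controller emits. The function and pattern names are hypothetical.

```python
import re

# Patterns for the FailedCreate message wording seen in this issue.
POLICY_RE = re.compile(r'violates PodSecurity "(?P<level>[^"]+)": (?P<details>.*)$')
HOSTPATH_RE = re.compile(r"hostPath volumes \(volumes? ([^)]*)\)")
PRIVILEGED_RE = re.compile(r"privileged \(containers? ([^)]*?) must not set")
QUOTED_RE = re.compile(r'"([^"]+)"')


def summarize_pod_security_events(messages: list[str]) -> dict[str, list[str]]:
    """Collapse repeated PodSecurity rejection messages into one summary."""
    levels: set[str] = set()
    volumes: set[str] = set()
    privileged: set[str] = set()
    for msg in messages:
        match = POLICY_RE.search(msg)
        if match is None:
            continue  # not a PodSecurity rejection
        levels.add(match["level"])
        details = match["details"]
        for group in HOSTPATH_RE.findall(details):
            volumes.update(QUOTED_RE.findall(group))
        for group in PRIVILEGED_RE.findall(details):
            privileged.update(QUOTED_RE.findall(group))
    return {
        "levels": sorted(levels),
        "hostPath volumes": sorted(volumes),
        "privileged containers": sorted(privileged),
    }


events = [
    'Error creating: pods "grafana-agent-logs-tzr4t" is forbidden: violates PodSecurity '
    '"baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), '
    'privileged (container "grafana-agent" must not set securityContext.privileged=true)',
    'Error creating: pods "grafana-agent-logs-xxlrg" is forbidden: violates PodSecurity '
    '"baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), '
    'privileged (container "grafana-agent" must not set securityContext.privileged=true)',
]
print(summarize_pod_security_events(events))
```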

Following on from https://github.com/grafana/agent/issues/3363, some feedback would have gone a long way. There were no logs from the operator or the agent, and no events on the custom resources (LogsInstance, Integration). Even something as simple as "created daemonset <namespace>/<name>" would have been enough to tell me it was actually trying to do something.
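To make the request concrete, here is a hypothetical sketch of that kind of reporting: a reconcile step that diffs desired objects against existing ones and logs one line per action taken. It is not the operator's code (the operator is written in Go); ObjectKey and reconcile are illustrative names, and the actual API calls are left out.

```python
import logging
from dataclasses import dataclass
from typing import Any

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agent-operator-sketch")


@dataclass(frozen=True)
class ObjectKey:
    kind: str
    namespace: str
    name: str

    def __str__(self) -> str:
        return f"{self.kind.lower()} {self.namespace}/{self.name}"


def reconcile(desired: dict[ObjectKey, Any], existing: dict[ObjectKey, Any]) -> list[str]:
    """Diff desired vs. existing objects and log one line per action.

    Only the reporting is shown; creating/updating/deleting is omitted.
    """
    messages = []
    for key, spec in desired.items():
        if key not in existing:
            action = "created"
        elif existing[key] != spec:
            action = "updated"
        else:
            continue  # already up to date, nothing to report
        messages.append(f"{action} {key}")
    for key in sorted(existing.keys() - desired.keys(), key=str):
        messages.append(f"deleted {key}")
    for message in messages:
        log.info(message)
    return messages


logs_ds = ObjectKey("DaemonSet", "grafana-agent", "grafana-agent-logs")
print(reconcile({logs_ds: {"image": "grafana/agent:v0.26.1"}}, {}))
```

A line like this would have confirmed that the DaemonSet was written, leaving the PodSecurity rejection as the only remaining mystery.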

For now, the workaround is to grant the namespace elevated privileges by setting its pod-security.kubernetes.io/enforce label to "privileged":

#NamespaceList: items: [{metadata: labels: "pod-security.kubernetes.io/enforce": "privileged"}]
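An equivalent Namespace object can be generated as JSON, which kubectl apply -f - accepts. As an assumption beyond the original workaround, this sketch also sets the warn and audit labels to baseline, so that relaxing enforcement does not hide the violations entirely: warn and audit only report, while enforce decides what is rejected. The function name pod_security_namespace is hypothetical.

```python
import json


def pod_security_namespace(name: str, enforce: str = "privileged",
                           warn: str = "baseline", audit: str = "baseline",
                           version: str = "latest") -> dict:
    """Build a Namespace manifest carrying Pod Security Admission labels.

    enforce decides what is rejected; warn and audit only report, so
    baseline violations stay visible after enforcement is relaxed.
    """
    labels = {}
    for mode, level in (("enforce", enforce), ("warn", warn), ("audit", audit)):
        labels[f"pod-security.kubernetes.io/{mode}"] = level
        labels[f"pod-security.kubernetes.io/{mode}-version"] = version
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": labels},
    }


# Pipe this into `kubectl apply -f -`.
print(json.dumps(pod_security_namespace("grafana-agent"), indent=2))
```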

[Screenshot: Grafana Agent Operator Manifest Generator]

The generated manifests:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: ${NAMESPACE}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
---
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: logs-secret
  namespace: ${NAMESPACE}
stringData:
  password: "no"
  username: "no"
type: Opaque
---
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: metrics-secret
  namespace: ${NAMESPACE}
stringData:
  password: eyJrIjoiZTUwZTI3YmViNDg2Zjk1MTUwZDM4ZGMyNWE2MGQ4ODI4ZjkzOGY1MSIsIm4iOiJ1aHRob21hcy1lYXN5c3RhcnQtcHJvbS1wdWJsaXNoZXIiLCJpZCI6NDY5NDIyfQ==
  username: "53013"
type: Opaque
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-eventhandler
  namespace: ${NAMESPACE}
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent-operator
rules:
- apiGroups:
  - monitoring.grafana.com
  resources:
  - grafanaagents
  - metricsinstances
  - logsinstances
  - podlogs
  - integrations
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - monitoring.grafana.com
  resources:
  - grafanaagents/finalizers
  - metricsinstances/finalizers
  - logsinstances/finalizers
  - podlogs/finalizers
  - integrations/finalizers
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - monitoring.coreos.com
  resources:
  - podmonitors
  - probes
  - servicemonitors
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - monitoring.coreos.com
  resources:
  - podmonitors/finalizers
  - probes/finalizers
  - servicemonitors/finalizers
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - namespaces
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  - services
  - configmaps
  - endpoints
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent-operator
subjects:
- kind: ServiceAccount
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: ${NAMESPACE}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: grafana-agent-operator
  template:
    metadata:
      labels:
        name: grafana-agent-operator
    spec:
      containers:
      - args:
        - --kubelet-service=default/kubelet
        image: grafana/agent-operator:v0.26.1
        imagePullPolicy: IfNotPresent
        name: grafana-agent-operator
      serviceAccount: grafana-agent-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.5.0
    spec:
      automountServiceAccountToken: true
      containers:
      - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: ${NAMESPACE}
spec:
  image: grafana/agent:v0.26.1
  integrations:
    selector:
      matchLabels:
        agent: grafana-agent
  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent
  metrics:
    externalLabels:
      cluster: ${CLUSTER}
    instanceSelector:
      matchLabels:
        agent: grafana-agent
    scrapeInterval: 15s
  serviceAccountName: grafana-agent
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: Integration
metadata:
  labels:
    agent: grafana-agent
  name: agent-eventhandler
  namespace: ${NAMESPACE}
spec:
  config:
    cache_path: /etc/eventhandler/eventhandler.cache
    logs_instance: ${NAMESPACE}/grafana-agent-logs
  name: eventhandler
  type:
    unique: true
  volumeMounts:
  - mountPath: /etc/eventhandler
    name: agent-eventhandler
  volumes:
  - name: agent-eventhandler
    persistentVolumeClaim:
      claimName: agent-eventhandler
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: Integration
metadata:
  labels:
    agent: grafana-agent
  name: node-exporter
  namespace: ${NAMESPACE}
spec:
  config:
    autoscrape:
      enable: true
      metrics_instance: ${NAMESPACE}/grafana-agent-metrics
    procfs_path: /host/proc
    rootfs_path: /host/root
    sysfs_path: /host/sys
  name: node_exporter
  type:
    allNodes: true
    unique: true
  volumeMounts:
  - mountPath: /host/root
    name: rootfs
  - mountPath: /host/sys
    name: sysfs
  - mountPath: /host/proc
    name: procfs
  volumes:
  - hostPath:
      path: /
    name: rootfs
  - hostPath:
      path: /sys
    name: sysfs
  - hostPath:
      path: /proc
    name: procfs
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  labels:
    agent: grafana-agent
  name: grafana-agent-logs
  namespace: ${NAMESPACE}
spec:
  clients:
  - basicAuth:
      password:
        key: password
        name: logs-secret
      username:
        key: username
        name: logs-secret
    externalLabels:
      cluster: ${CLUSTER}
    url: https://logs-prod-us-central1.grafana.net/loki/api/v1/push
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  labels:
    agent: grafana-agent
  name: grafana-agent-metrics
  namespace: ${NAMESPACE}
spec:
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: primary
  remoteWrite:
  - basicAuth:
      password:
        key: password
        name: metrics-secret
      username:
        key: username
        name: metrics-secret
    url: https://prometheus-us-central1.grafana.net/api/prom/push
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: primary
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-logs
  namespace: ${NAMESPACE}
spec:
  namespaceSelector:
    any: true
  pipelineStages:
  - cri: {}
  relabelings:
  - sourceLabels:
    - __meta_kubernetes_pod_node_name
    targetLabel: __host__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    sourceLabels:
    - __meta_kubernetes_namespace
    targetLabel: namespace
  - action: replace
    sourceLabels:
    - __meta_kubernetes_pod_name
    targetLabel: pod
  - action: replace
    sourceLabels:
    - __meta_kubernetes_container_name
    targetLabel: container
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    sourceLabels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    targetLabel: __path__
  selector:
    matchLabels: {}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: cadvisor-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      replacement: integrations/kubernetes/cadvisor
      targetLabel: job
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: ksm-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics
    port: http-metrics
    relabelings:
    - action: replace
      replacement: integrations/kubernetes/kube-state-metrics
      targetLabel: job
  namespaceSelector:
    matchNames:
    - ${NAMESPACE}
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: kubelet-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      replacement: integrations/kubernetes/kubelet
      targetLabel: job
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
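The three ServiceMonitors filter series with a metricRelabelings keep rule, and the PodLogs object builds __path__ with a default (replace) relabeling. The simplified Python model below shows how those two actions behave under Prometheus relabeling semantics. Source label values are joined with separator (default ";"), and the regex (default "(.*)") must match the whole joined string. A keep rule drops anything that does not match, while a replace rule writes the expanded replacement (default "$1") to targetLabel. This is an illustration, not Grafana Agent's implementation: only keep and replace are modelled, only $N / ${N} group references are expanded, and the keep regex in the usage is a trimmed copy of the one above.

```python
import re
from typing import Dict, Optional


def _expand(template: str, match: "re.Match[str]") -> str:
    """Substitute $N / ${N} group references (simplified Go-style expansion)."""
    return re.sub(r"\$\{?(\d+)\}?", lambda m: match.group(int(m.group(1))) or "", template)


def relabel(labels: Dict[str, str], rule: dict) -> Optional[Dict[str, str]]:
    """Apply one relabel rule; return None when the target/series is dropped."""
    action = rule.get("action", "replace")
    separator = rule.get("separator", ";")
    value = separator.join(labels.get(name, "") for name in rule.get("sourceLabels", []))
    # Prometheus anchors relabel regexes at both ends, hence fullmatch.
    match = re.fullmatch(rule.get("regex", "(.*)"), value)
    if action == "keep":
        return labels if match else None
    if action == "replace":
        if match is None:
            return labels  # no match: labels are left unchanged
        out = dict(labels)
        out[rule["targetLabel"]] = _expand(rule.get("replacement", "$1"), match)
        return out
    raise NotImplementedError(f"action {action!r} is not covered by this sketch")


# Trimmed version of the ServiceMonitor keep rule.
keep_rule = {"action": "keep", "sourceLabels": ["__name__"],
             "regex": "kube_pod_info|container_memory_rss|kube_node.*"}
print(relabel({"__name__": "kube_node_info"}, keep_rule))        # kept
print(relabel({"__name__": "kube_pod_info_extra"}, keep_rule))   # None: regex is anchored

# The PodLogs __path__ rule (action defaults to replace).
path_rule = {"sourceLabels": ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"],
             "separator": "/", "replacement": "/var/log/pods/*$1/*.log", "targetLabel": "__path__"}
target = {"__meta_kubernetes_pod_uid": "1234-abcd", "__meta_kubernetes_pod_container_name": "grafana-agent"}
print(relabel(target, path_rule)["__path__"])  # /var/log/pods/*1234-abcd/grafana-agent/*.log
```

Because the regex is anchored, a metric name has to match one alternative in full; that is why the keep list relies on explicit .* suffixes such as kube_node.* to admit whole families.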

uhthomas, Mar 28 '23 15:03