operator: resources violate PodSecurity policy
I've been struggling for a while to install Grafana Agent Operator and set it up properly, following the Kubernetes Monitoring configuration guide:
https://grafana.com/docs/grafana-cloud/kubernetes-monitoring/configuration/config-k8s-agent-guide/#configure-grafana-agent-for-metrics
I applied the exact manifests produced by the Grafana Agent Operator manifest generator, and they do not work. It turns out the DaemonSets violate the cluster's PodSecurity "baseline" level, which isn't even that strict.
❯ k -n grafana-agent get ds
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
grafana-agent-integrations-ds   5         0         0       0            0           <none>          16m
grafana-agent-logs              5         0         0       0            0           <none>          2d3h
Looking deeper:
❯ k -n grafana-agent describe ds grafana-agent-logs
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 46m daemonset-controller Error creating: pods "grafana-agent-logs-tzr4t" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
Warning FailedCreate 29m daemonset-controller Error creating: pods "grafana-agent-logs-xxlrg" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
Warning FailedCreate 12m daemonset-controller Error creating: pods "grafana-agent-logs-bmp64" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
❯ k -n grafana-agent describe ds grafana-agent-integrations-ds
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-h8qnr" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-bb2np" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-q9nnz" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-mj95d" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-pc726" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-f6zlb" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-5p6b7" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-nrvqh" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 10m daemonset-controller Error creating: pods "grafana-agent-integrations-ds-cdk7w" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
Warning FailedCreate 1s (x24 over 10m) daemonset-controller (combined from similar events): Error creating: pods "grafana-agent-integrations-ds-nlccd" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
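Both DaemonSets trip the same two Pod Security Standards checks: the baseline level forbids hostPath volumes and privileged containers. The Pod below is a minimal, hypothetical sketch that fails both checks; its name, image, and paths are placeholders, not taken from the operator's output. Submitting it with kubectl apply --dry-run=server in a namespace that enforces baseline should be rejected with a similar violates PodSecurity "baseline:latest" message, without creating anything.

```yaml
# Hypothetical Pod for illustration only: name, image and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: psa-baseline-test
  namespace: grafana-agent
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      privileged: true      # rejected by baseline: privileged containers
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    hostPath:               # rejected by baseline: hostPath volumes
      path: /var/log
```

Note that the integrations DaemonSet is rejected for its hostPath volumes alone, so dropping the privileged flag would not be enough. Some of those volumes come from the node-exporter Integration (rootfs, sysfs, procfs); the others (varlog, dockerlogs, data) are not declared in the manifests below, so they appear to be added by the operator itself.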
Following on from https://github.com/grafana/agent/issues/3363, some feedback would have gone a long way. There were no logs from the operator or the agent, and no events on the custom resources (LogsInstance, Integration). Even a simple log line such as "created daemonset <namespace>/<name>" would have been enough to tell me the operator was actually trying to do something.
For now, the workaround is to grant the namespace elevated privileges by setting its Pod Security Admission enforce level to "privileged":
#NamespaceList: items: [{metadata: labels: "pod-security.kubernetes.io/enforce": "privileged"}]
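The snippet above is CUE and sets a single label on the namespace. As a sketch, the equivalent plain Namespace manifest is below, assuming the namespace is grafana-agent (as in the kubectl output above). The warn and audit labels are an optional addition, not part of the original workaround: they keep evaluating pods against baseline, so violations are still reported (as API warnings to the client that creates the workload, here the operator, and as audit-log annotations) while enforce: privileged lets the pods through.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: grafana-agent   # assumption: namespace taken from the kubectl output
  labels:
    # The workaround: admit privileged pods in this namespace.
    pod-security.kubernetes.io/enforce: privileged
    # Optional (not in the original workaround): keep reporting
    # baseline violations without blocking them.
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/audit: baseline
```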
The manifests generated by the Grafana Agent Operator Manifest Generator:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: ${NAMESPACE}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
---
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: logs-secret
  namespace: ${NAMESPACE}
stringData:
  password: "no"
  username: "no"
type: Opaque
---
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: metrics-secret
  namespace: ${NAMESPACE}
stringData:
  password: eyJrIjoiZTUwZTI3YmViNDg2Zjk1MTUwZDM4ZGMyNWE2MGQ4ODI4ZjkzOGY1MSIsIm4iOiJ1aHRob21hcy1lYXN5c3RhcnQtcHJvbS1wdWJsaXNoZXIiLCJpZCI6NDY5NDIyfQ==
  username: "53013"
type: Opaque
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-eventhandler
  namespace: ${NAMESPACE}
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent-operator
rules:
- apiGroups:
  - monitoring.grafana.com
  resources:
  - grafanaagents
  - metricsinstances
  - logsinstances
  - podlogs
  - integrations
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - monitoring.grafana.com
  resources:
  - grafanaagents/finalizers
  - metricsinstances/finalizers
  - logsinstances/finalizers
  - podlogs/finalizers
  - integrations/finalizers
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - monitoring.coreos.com
  resources:
  - podmonitors
  - probes
  - servicemonitors
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - monitoring.coreos.com
  resources:
  - podmonitors/finalizers
  - probes/finalizers
  - servicemonitors/finalizers
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - namespaces
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  - services
  - configmaps
  - endpoints
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent-operator
subjects:
- kind: ServiceAccount
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: ${NAMESPACE}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: grafana-agent-operator
  template:
    metadata:
      labels:
        name: grafana-agent-operator
    spec:
      containers:
      - args:
        - --kubelet-service=default/kubelet
        image: grafana/agent-operator:v0.26.1
        imagePullPolicy: IfNotPresent
        name: grafana-agent-operator
      serviceAccount: grafana-agent-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.5.0
    spec:
      automountServiceAccountToken: true
      containers:
      - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: ${NAMESPACE}
spec:
  image: grafana/agent:v0.26.1
  integrations:
    selector:
      matchLabels:
        agent: grafana-agent
  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent
  metrics:
    externalLabels:
      cluster: ${CLUSTER}
    instanceSelector:
      matchLabels:
        agent: grafana-agent
    scrapeInterval: 15s
  serviceAccountName: grafana-agent
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: Integration
metadata:
  labels:
    agent: grafana-agent
  name: agent-eventhandler
  namespace: ${NAMESPACE}
spec:
  config:
    cache_path: /etc/eventhandler/eventhandler.cache
    logs_instance: ${NAMESPACE}/grafana-agent-logs
  name: eventhandler
  type:
    unique: true
  volumeMounts:
  - mountPath: /etc/eventhandler
    name: agent-eventhandler
  volumes:
  - name: agent-eventhandler
    persistentVolumeClaim:
      claimName: agent-eventhandler
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: Integration
metadata:
  labels:
    agent: grafana-agent
  name: node-exporter
  namespace: ${NAMESPACE}
spec:
  config:
    autoscrape:
      enable: true
      metrics_instance: ${NAMESPACE}/grafana-agent-metrics
    procfs_path: host/proc
    rootfs_path: /host/root
    sysfs_path: /host/sys
  name: node_exporter
  type:
    allNodes: true
    unique: true
  volumeMounts:
  - mountPath: /host/root
    name: rootfs
  - mountPath: /host/sys
    name: sysfs
  - mountPath: /host/proc
    name: procfs
  volumes:
  - hostPath:
      path: /
    name: rootfs
  - hostPath:
      path: /sys
    name: sysfs
  - hostPath:
      path: /proc
    name: procfs
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  labels:
    agent: grafana-agent
  name: grafana-agent-logs
  namespace: ${NAMESPACE}
spec:
  clients:
  - basicAuth:
      password:
        key: password
        name: logs-secret
      username:
        key: username
        name: logs-secret
    externalLabels:
      cluster: ${CLUSTER}
    url: https://logs-prod-us-central1.grafana.net/loki/api/v1/push
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  labels:
    agent: grafana-agent
  name: grafana-agent-metrics
  namespace: ${NAMESPACE}
spec:
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: primary
  remoteWrite:
  - basicAuth:
      password:
        key: password
        name: metrics-secret
      username:
        key: username
        name: metrics-secret
    url: https://prometheus-us-central1.grafana.net/api/prom/push
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: primary
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-logs
  namespace: ${NAMESPACE}
spec:
  namespaceSelector:
    any: true
  pipelineStages:
  - cri: {}
  relabelings:
  - sourceLabels:
    - __meta_kubernetes_pod_node_name
    targetLabel: __host__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    sourceLabels:
    - __meta_kubernetes_namespace
    targetLabel: namespace
  - action: replace
    sourceLabels:
    - __meta_kubernetes_pod_name
    targetLabel: pod
  - action: replace
    sourceLabels:
    - __meta_kubernetes_container_name
    targetLabel: container
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    sourceLabels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    targetLabel: __path__
  selector:
    matchLabels: {}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: cadvisor-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      replacement: integrations/kubernetes/cadvisor
      targetLabel: job
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: ksm-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics
    port: http-metrics
    relabelings:
    - action: replace
      replacement: integrations/kubernetes/kube-state-metrics
      targetLabel: job
  namespaceSelector:
    matchNames:
    - ${NAMESPACE}
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: kubelet-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      replacement: integrations/kubernetes/kubelet
      targetLabel: job
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet