aws-otel-collector
ADOT EKS add-on documentation is missing important parts
Describe the bug
The EKS add-on documentation on the official AWS page links to this Getting Started guide: https://aws-otel.github.io/docs/getting-started/adot-eks-add-on
When following this guide, no metrics are sent to CloudWatch and the adot-collector shows warnings.
Steps to reproduce
I followed the aforementioned guide:
- Created the EKS add-on with aws eks create-addon.
- Deployed the OpenTelemetryCollector custom resource.
What did you expect to see?
I expected the official EKS add-on to configure all the components needed to send metrics and logs to CloudWatch.
What did you see instead?
No metrics were sent to CloudWatch, and the adot-collector showed warnings.
Additional context
After several hours of online research, I analysed the Kubernetes resources created by the adot-operator and found differences from the maintained Helm charts.
I noticed that the following were missing:
- Service Accounts
- Cluster Role
- Cluster Role Binding
- environment variables
- volumes
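Because RBAC rules are additive, one way to see whether a ClusterRole such as adot-collector-cluster-role in the manifest below grants a particular permission is to match the request against each rule in turn. The following is a simplified Python sketch of that matching, not the Kubernetes authorizer itself: PolicyRule, allows() and ADOT_RULES are hypothetical names, and the sketch ignores nonResourceURLs, aggregated roles and wildcard subresource patterns. It treats "*" as a wildcard and applies a rule that has resourceNames only to requests for those named objects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Simplified model of one entry in a ClusterRole's `rules` list (hypothetical type)."""
    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]
    resource_names: tuple[str, ...] = ()  # empty means "any object name"


def _matches(values: tuple[str, ...], wanted: str) -> bool:
    # "*" is the RBAC wildcard; otherwise require an exact string match
    # (subresources such as "nodes/proxy" are compared as whole strings).
    return "*" in values or wanted in values


def allows(rules, api_group: str, resource: str, verb: str, name: str | None = None) -> bool:
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    for rule in rules:
        if not (_matches(rule.api_groups, api_group)
                and _matches(rule.resources, resource)
                and _matches(rule.verbs, verb)):
            continue
        # A rule restricted by resourceNames only covers requests for those names.
        if rule.resource_names and name not in rule.resource_names:
            continue
        return True
    return False


# Rules copied from adot-collector-cluster-role in the manifest.
LEADER = ("otel-container-insight-clusterleader",)
ADOT_RULES = (
    PolicyRule(("",), ("pods", "nodes", "endpoints"), ("list", "watch", "get")),
    PolicyRule(("apps",), ("replicasets",), ("list", "watch", "get")),
    PolicyRule(("batch",), ("jobs",), ("list", "watch")),
    PolicyRule(("",), ("nodes/proxy",), ("get",)),
    PolicyRule(("",), ("nodes/stats", "configmaps", "events"), ("create", "get")),
    PolicyRule(("",), ("configmaps",), ("update",)),
    PolicyRule(("",), ("configmaps",), ("get", "update", "create"), LEADER),
    PolicyRule(("coordination.k8s.io",), ("leases",), ("create", "get", "update")),
    PolicyRule(("coordination.k8s.io",), ("leases",), ("get", "update", "create"), LEADER),
)

if __name__ == "__main__":
    print(allows(ADOT_RULES, "", "nodes/proxy", "get"))  # True
    print(allows(ADOT_RULES, "batch", "jobs", "get"))    # False: only list/watch are granted
```

The same helper can be pointed at any other rule set, for example the ClusterRole shipped with the Helm chart, to compare the two permission by permission.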
Moreover, I found out that I needed to use eksctl to create a Kubernetes Service Account bound to an IAM Role, and I attached the following policy to that role: arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy.
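The issue does not record the exact eksctl command that was used. Assuming the standard eksctl create iamserviceaccount subcommand, which creates the IAM role and an annotated Kubernetes ServiceAccount together, a Python sketch that assembles that invocation for the names used in the manifest might look like the following. The cluster name my-cluster and the helper iamserviceaccount_cmd are placeholders, not part of the original report.

```python
from __future__ import annotations

import shlex


def iamserviceaccount_cmd(cluster: str, namespace: str, name: str, policy_arn: str) -> list[str]:
    """Build (but do not run) an `eksctl create iamserviceaccount` argv (hypothetical helper)."""
    return [
        "eksctl", "create", "iamserviceaccount",
        "--cluster", cluster,
        "--namespace", namespace,  # must match the collector's namespace
        "--name", name,            # must match spec.serviceAccount in the collector CR
        "--attach-policy-arn", policy_arn,
        "--approve",               # apply the changes rather than only planning them
    ]


if __name__ == "__main__":
    cmd = iamserviceaccount_cmd(
        cluster="my-cluster",  # placeholder: replace with your EKS cluster name
        namespace="opentelemetry-operator-system",
        name="adot-collector",
        policy_arn="arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
    )
    print(shlex.join(cmd))
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True); keeping the command as an argv list rather than one string avoids shell-quoting problems.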
In the end, I used the following manifest:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: adot-collector-cluster-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch", "get"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch", "get"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["otel-container-insight-clusterleader"]
    verbs: ["get", "update", "create"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create", "get", "update"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    resourceNames: ["otel-container-insight-clusterleader"]
    verbs: ["get", "update", "create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: adot-collector-cluster-role-binding
subjects:
  - kind: ServiceAccount
    name: adot-collector
    namespace: opentelemetry-operator-system
roleRef:
  kind: ClusterRole
  name: adot-collector-cluster-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: adot-collector
  namespace: opentelemetry-operator-system
spec:
  mode: daemonset
  serviceAccount: adot-collector
  securityContext:
    runAsUser: 0
    runAsGroup: 0
  env:
    - name: K8S_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: HOST_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
    - name: HOST_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: K8S_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
  volumes:
    - name: rootfs
      hostPath:
        path: /
    - name: dockersock
      hostPath:
        path: /var/run/docker.sock
    - name: varlibdocker
      hostPath:
        path: /var/lib/docker
    - name: containerdsock
      hostPath:
        path: /run/containerd/containerd.sock
    - name: sys
      hostPath:
        path: /sys
    - name: devdisk
      hostPath:
        path: /dev/disk/
  volumeMounts:
    - name: rootfs
      mountPath: /rootfs
      readOnly: true
    - name: dockersock
      mountPath: /var/run/docker.sock
      readOnly: true
    - name: containerdsock
      mountPath: /run/containerd/containerd.sock
    - name: varlibdocker
      mountPath: /var/lib/docker
      readOnly: true
    - name: sys
      mountPath: /sys
      readOnly: true
    - name: devdisk
      mountPath: /dev/disk
      readOnly: true
  config: |
    extensions:
      health_check:
    receivers:
      awscontainerinsightreceiver:
    processors:
      batch/metrics:
        timeout: 60s
    exporters:
      awsemf:
        namespace: ContainerInsights
        log_group_name: '/aws/containerinsights/{ClusterName}/performance'
        log_stream_name: '{NodeName}'
        log_retention: 30
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          # node metrics
          - dimensions: [[NodeName, InstanceId, ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
              - node_cpu_usage_total
              - node_cpu_limit
              - node_memory_limit
          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_status
              - pod_cpu_utilization
              - pod_memory_utilization
              - pod_network_rx_bytes
              - pod_network_tx_bytes
              - pod_cpu_reserved_capacity
              - pod_memory_reserved_capacity
              - pod_number_of_container_restarts
              - pod_cpu_utilization_over_pod_limit
              - pod_memory_utilization_over_pod_limit
          # cluster metrics
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - cluster_node_count
              - cluster_failed_node_count
          # node fs metrics
          - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
            metric_name_selectors:
              - node_filesystem_utilization
    service:
      pipelines:
        metrics:
          receivers: [awscontainerinsightreceiver]
          processors: [batch/metrics]
          exporters: [awsemf]
      extensions: [health_check]
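Since missing volumes were one of the gaps listed earlier, it can help to check that every volumeMount in the collector spec names a declared volume, that no declared volume goes unused, and which mounts are left writable. The Python sketch below copies the volumes and volumeMounts from the manifest as plain dicts instead of parsing the YAML (parsing would need a third-party library such as PyYAML); check_mounts and writable_mounts are hypothetical helper names.

```python
# Volumes and volumeMounts copied from the OpenTelemetryCollector spec.
VOLUMES = [
    {"name": "rootfs", "hostPath": {"path": "/"}},
    {"name": "dockersock", "hostPath": {"path": "/var/run/docker.sock"}},
    {"name": "varlibdocker", "hostPath": {"path": "/var/lib/docker"}},
    {"name": "containerdsock", "hostPath": {"path": "/run/containerd/containerd.sock"}},
    {"name": "sys", "hostPath": {"path": "/sys"}},
    {"name": "devdisk", "hostPath": {"path": "/dev/disk/"}},
]
VOLUME_MOUNTS = [
    {"name": "rootfs", "mountPath": "/rootfs", "readOnly": True},
    {"name": "dockersock", "mountPath": "/var/run/docker.sock", "readOnly": True},
    {"name": "containerdsock", "mountPath": "/run/containerd/containerd.sock"},
    {"name": "varlibdocker", "mountPath": "/var/lib/docker", "readOnly": True},
    {"name": "sys", "mountPath": "/sys", "readOnly": True},
    {"name": "devdisk", "mountPath": "/dev/disk", "readOnly": True},
]


def check_mounts(volumes, volume_mounts):
    """Return human-readable problems found when pairing mounts with volumes."""
    declared = {v["name"] for v in volumes}
    problems = []
    # Every mount must refer to a volume declared in the same spec.
    for mount in volume_mounts:
        if mount["name"] not in declared:
            problems.append(f"volumeMount {mount['name']!r} has no matching volume")
    # A declared volume that is never mounted is probably a copy-paste leftover.
    mounted = {m["name"] for m in volume_mounts}
    for name in sorted(declared - mounted):
        problems.append(f"volume {name!r} is declared but never mounted")
    return problems


def writable_mounts(volume_mounts):
    """List mounts without readOnly: true (Kubernetes defaults readOnly to false)."""
    return [m["name"] for m in volume_mounts if not m.get("readOnly", False)]


if __name__ == "__main__":
    for problem in check_mounts(VOLUMES, VOLUME_MOUNTS):
        print("problem:", problem)
    print("writable mounts:", writable_mounts(VOLUME_MOUNTS))
```

For this manifest, check_mounts reports nothing and writable_mounts returns ["containerdsock"], because the containerd socket is the only mount without readOnly: true.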