opentelemetry-helm-charts
opentelemetry helm charts is not straightforward for Airflow[otel] integration
Hello, I'm trying to set up the OpenTelemetry Collector with apache-airflow[otel] metrics integration. I have tried a lot of configurations without success.
Airflow 2.8.2 with pip install apache-airflow[otel]
Helm Chart version 0.84.0
app version 0.96.0
My Airflow metrics configuration looks like:
[metrics]
otel_on = True
otel_host = kube-opentelemetry-collector.open-telemetry.svc.cluster.local
otel_port = 4318
otel_interval_milliseconds = 30000
otel_ssl_active = False
When following the Airflow Breeze configuration, http://localhost:8889/metrics is up and running but does not display anything.
My Helm chart values.yaml looks like:
mode: deployment
resources:
  limits:
    cpu: 250m
    memory: 512Mi
config:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
  processors:
    batch: {}
  exporters:
    debug:
      verbosity: detailed
    prometheus:
      endpoint: 0.0.0.0:8889
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [debug]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [debug, prometheus]
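One thing to double-check with a values file like this: the chart only creates Service ports for entries in its ports: map, so the prometheus exporter on 8889 is only reachable from outside the pod if you add it there yourself. A rough sketch, reusing the same ports: structure that appears later in this thread (the key name prometheus-exporter is just an illustrative label, not something the chart defines):

# Sketch (unverified): publish the prometheus exporter port 8889 on the
# collector Service in addition to the container.
ports:
  prometheus-exporter:    # illustrative key name
    enabled: true
    containerPort: 8889
    servicePort: 8889
    protocol: TCP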
When using the default Helm chart values.yaml, I get only the collector's own default metrics on http://localhost:8888/metrics, something like:
# HELP otelcol_exporter_send_failed_metric_points Number of metric points in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_metric_points counter
otelcol_exporter_send_failed_metric_points{exporter="debug",service_instance_id="668b72b5-0850-4043-b9b3-53e266900ac4",service_name="otelcol-contrib",service_version="0.96.0"} 0
# HELP otelcol_exporter_sent_metric_points Number of metric points successfully sent to destination.
# TYPE otelcol_exporter_sent_metric_points counter
My Helm chart values look like:
.
.
mode: deployment
config:
  exporters:
    debug: {}
    logging: {}
  extensions:
    health_check:
      endpoint: '0.0.0.0:13133'
    memory_ballast: {}
  processors:
    batch: {}
    memory_limiter: null
  receivers:
    jaeger:
      protocols:
        grpc:
          endpoint: '0.0.0.0:14250'
        thrift_http:
          endpoint: '0.0.0.0:14268'
        thrift_compact:
          endpoint: '0.0.0.0:6831'
    otlp:
      protocols:
        grpc:
          endpoint: '0.0.0.0:4317'
        http:
          endpoint: '0.0.0.0:4318'
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
              - targets:
                  - '0.0.0.0:8888'
    zipkin:
      endpoint: '0.0.0.0:9411'
  service:
    telemetry:
      metrics:
        address: '0.0.0.0:8888'
    extensions:
      - health_check
      - memory_ballast
    pipelines:
      logs:
        exporters:
          - debug
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
      metrics:
        exporters:
          - debug
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - prometheus
      traces:
        exporters:
          - debug
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - jaeger
          - zipkin
.
.
ports:
  otlp:
  metrics:
    enabled: true
    containerPort: 8888
    servicePort: 8888
    protocol: TCP
Network connectivity between the Airflow containers and the OpenTelemetry deployment is also OK.
I could use your help.
Any suggestions?
Any thoughts?
I have the same problem
I am not familiar with Airflow; is it the source of the metrics, and is it sending them over OTLP?
If you don't want the collector's own default metrics, remove the prometheus receiver.
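For reference, dropping it from the metrics pipeline in the values above would look roughly like this (sketch only; you can also delete the prometheus receiver definition itself once nothing references it):

# Sketch: metrics pipeline fed only by OTLP, without the prometheus self-scrape receiver.
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]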
Still relevant.
Has anyone got OTel working with Airflow? If so, can you advise what the magic is? I have wasted too much time on this already.
@jiribroulik I managed to get it working by disabling StatsD.
metrics:
  otel_on: True
  otel_host: localhost
  otel_interval_milliseconds: 30000
  otel_port: 4318
  otel_prefix: myprefix
  otel_ssl_active: False
statsd:
  enabled: False
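For what it's worth, if you drive this through environment variables instead of a values file, the same settings should map via Airflow's AIRFLOW__&lt;SECTION&gt;__&lt;KEY&gt; convention. A sketch (assuming the block above is Airflow chart values, where statsd.enabled controls the StatsD sidecar; the airflow.cfg flag for StatsD emission itself is statsd_on under [metrics]):

# Sketch: the same settings expressed as Airflow environment variables.
AIRFLOW__METRICS__OTEL_ON: 'True'
AIRFLOW__METRICS__OTEL_HOST: 'localhost'
AIRFLOW__METRICS__OTEL_PORT: '4318'
AIRFLOW__METRICS__OTEL_INTERVAL_MILLISECONDS: '30000'
AIRFLOW__METRICS__OTEL_PREFIX: 'myprefix'
AIRFLOW__METRICS__OTEL_SSL_ACTIVE: 'False'
AIRFLOW__METRICS__STATSD_ON: 'False'   # airflow.cfg equivalent of switching off StatsD emission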
@dthauvin - Can you share your airflow.cfg [metrics], Airflow Helm values, and OTel Collector Helm values YAML? I have tried the OTel Collector contrib and k8s images 0.108.0 and 0.111.0, enabling the connection on '0.0.0.0' as well as '${env:MY_POD_IP}:port' in the Helm values YAML. With both approaches I'm getting connection refused errors. The OTel container does not allow the connection to be established, so even though the Airflow webserver tries several times to send metrics, it is unable to reach the collector.
In my experience, the majority of connection-related errors were fixed by changing the endpoint from '${env:MY_POD_IP}:port' to 0.0.0.0:4318 (the OTLP HTTP receiver port, usually), and then making sure the port is exposed via the Service, so that things running outside the pod can send to it.
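To make that concrete, here is a minimal sketch of the relevant collector chart values (hedged: newer chart versions default the receiver endpoints to '${env:MY_POD_IP}', and the port key name below follows the chart's otlp-http convention, so check the default values of your chart version):

# Sketch: OTLP HTTP receiver listening on all interfaces, with the port
# published on the collector Service.
config:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
ports:
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    protocol: TCP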
@jiribroulik I managed to get it working by disabling StatsD.
metrics:
  otel_on: True
  otel_host: localhost
  otel_interval_milliseconds: 30000
  otel_port: 4318
  otel_prefix: myprefix
  otel_ssl_active: False
statsd:
  enabled: False
So, @dthauvin, are you saying that when you use OTel metrics, you also have to explicitly disable the statsd part?
OTEL config:
receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:4318"
        # tls:
        #   cert_file: "/opt/airflow/ssl/certificate.crt"
        #   key_file: "/opt/airflow/ssl/private.key"
processors:
  batch:
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: airflow
    const_labels:
      instance: airflow
    send_timestamps: true
    metric_expiration: 180m
    resource_to_telemetry_conversion:
      enabled: true
  logging:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      exporters: [logging]
docker-compose file:
# OpenTelemetry Collector configuration
airflow-otel-collector:
  container_name: airflow-otel-collector
  image: otel/opentelemetry-collector-contrib
  command: ["--config=/etc/otelcol-contrib/airflow-otel-config.yaml"]
  volumes:
    - ./airflow-otel-config.yaml:/etc/otelcol-contrib/airflow-otel-config.yaml
    - ./ssl:/opt/airflow/ssl/
  ports:
    - 8888:8888   # Prometheus metrics exposed by the Collector
    - 8889:8889   # Prometheus exporter metrics
    - 13133:13133 # health_check extension
    - 4317:4317   # OTLP gRPC receiver
    - 4318:4318   # OTLP http receiver
  healthcheck:
    test: ["CMD", "curl", "--fail", "http://localhost:13133/health"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 30s
  restart: always
Airflow Environment Variables:
AIRFLOW__METRICS__OTEL_ON: 'true'
AIRFLOW__METRICS__OTEL_HOST: 'airflow-otel-collector'
AIRFLOW__METRICS__OTEL_PORT: '4318'
AIRFLOW__METRICS__OTEL_INTERVAL_MILLISECONDS: '30000'
Works like a charm.
Make sure you explicitly give the container a name and use it as the otel_host, since that was what gave me the same issue.
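If it helps anyone verify the pipeline end to end: with the prometheus exporter above listening on 8889, a scrape job in your own Prometheus could look roughly like this (sketch only; the job name is illustrative and the target assumes the compose service name from the example above):

# Sketch: Prometheus scrape job for the collector's prometheus exporter port.
scrape_configs:
  - job_name: airflow-otel           # illustrative
    scrape_interval: 30s
    static_configs:
      - targets:
          - airflow-otel-collector:8889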