
opentelemetry helm charts is not straightforward for Airflow[otel] integration

Open dthauvin opened this issue 1 year ago • 9 comments

Hello, I am trying to set up the OpenTelemetry Collector with apache-airflow[otel] metrics integration. I have tried many configurations without success.

Airflow 2.8.2 with pip install apache-airflow[otel]; Helm chart version 0.84.0, app version 0.96.0.

My Airflow metrics configuration looks like:

[metrics]
otel_on = True
otel_host = kube-opentelemetry-collector.open-telemetry.svc.cluster.local
otel_port = 4318
otel_interval_milliseconds = 30000
otel_ssl_active = False

When following the Airflow Breeze configuration, http://localhost:8889/metrics is up and running but does not display anything.

My Helm chart values.yaml looks like:

mode: deployment
resources:
  limits:
    cpu: 250m
    memory: 512Mi
config:
  receivers:
    otlp:
      protocols:
        http: 
          endpoint: 0.0.0.0:4318
  processors:
    batch: {}
  exporters:
    debug:
      verbosity: detailed
    prometheus:
      endpoint: 0.0.0.0:8889
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [debug]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [debug, prometheus]
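One likely gap in the values above: the prometheus exporter listens on 8889 inside the pod, but the collector chart does not expose that port by default, so the Service never forwards it. A sketch of the extra entry, following the chart's "ports" convention (the key name "prom-exporter" is arbitrary, my assumption):

```yaml
ports:
  # Expose the prometheus exporter endpoint defined in config.exporters.prometheus
  prom-exporter:
    enabled: true
    containerPort: 8889
    servicePort: 8889
    protocol: TCP
```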

When using the default Helm chart values.yaml, I only get the Collector's own internal metrics on http://localhost:8888/metrics, something like:

# HELP otelcol_exporter_send_failed_metric_points Number of metric points in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_metric_points counter
otelcol_exporter_send_failed_metric_points{exporter="debug",service_instance_id="668b72b5-0850-4043-b9b3-53e266900ac4",service_name="otelcol-contrib",service_version="0.96.0"} 0
# HELP otelcol_exporter_sent_metric_points Number of metric points successfully sent to destination.
# TYPE otelcol_exporter_sent_metric_points counter

My Helm chart values look like:

.
.
mode: deployment
config:
  exporters:
    debug: {}
    logging: {}
  extensions:
    health_check:
      endpoint: '0.0.0.0:13133'
    memory_ballast: {}
  processors:
    batch: {}
    memory_limiter: null
  receivers:
    jaeger:
      protocols:
        grpc:
          endpoint: '0.0.0.0:14250'
        thrift_http:
          endpoint: '0.0.0.0:14268'
        thrift_compact:
          endpoint: '0.0.0.0:6831'
    otlp:
      protocols:
        grpc:
          endpoint: '0.0.0.0:4317'
        http:
          endpoint: '0.0.0.0:4318'
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
              - targets:
                  - '0.0.0.0:8888'
    zipkin:
      endpoint: '0.0.0.0:9411'
  service:
    telemetry:
      metrics:
        address: '0.0.0.0:8888'
    extensions:
      - health_check
      - memory_ballast
    pipelines:
      logs:
        exporters:
          - debug
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
      metrics:
        exporters:
          - debug
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - prometheus
      traces:
        exporters:
          - debug
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - jaeger
          - zipkin
.
.
ports:
  otlp:
  metrics:
    enabled: true
    containerPort: 8888
    servicePort: 8888
    protocol: TCP

Network connectivity between the Airflow containers and the OpenTelemetry deployment is also OK.

I could use your help. Any suggestions?

dthauvin avatar Mar 18 '24 20:03 dthauvin

I have the same problem

astanishevskyi-gl avatar Mar 22 '24 11:03 astanishevskyi-gl

I am not familiar with Airflow; is it the source of the metrics, and is it sending them over OTLP?

If you don't want the Collector's own default metrics, remove the prometheus receiver from the metrics pipeline.
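For the default values shown above, that would mean trimming the metrics pipeline to something like this (a sketch based on the pipeline earlier in this thread):

```yaml
config:
  service:
    pipelines:
      metrics:
        receivers: [otlp]   # drop `prometheus` so the Collector stops scraping itself
        processors: [memory_limiter, batch]
        exporters: [debug]
```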

TylerHelmuth avatar Mar 22 '24 16:03 TylerHelmuth

Still relevant.

seifrajhi avatar Sep 27 '24 13:09 seifrajhi

Has anyone got OTel working with Airflow? If so, can you advise what the magic is? I have wasted too much time on this already.

jiribroulik avatar Oct 10 '24 17:10 jiribroulik

@jiribroulik I managed to get something working by disabling StatsD:

  metrics:
    otel_on: True
    otel_host: localhost
    otel_interval_milliseconds: 30000
    otel_port: 4318
    otel_prefix: myprefix
    otel_ssl_active: False
statsd: 
  enabled: False
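For reference, the same idea in airflow.cfg form (a sketch; statsd_on = False is the airflow.cfg-level counterpart of disabling the chart's StatsD deployment, and "myprefix" is the author's placeholder):

```ini
[metrics]
otel_on = True
otel_host = localhost
otel_port = 4318
otel_interval_milliseconds = 30000
otel_prefix = myprefix
otel_ssl_active = False
statsd_on = False
```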

dthauvin avatar Oct 14 '24 15:10 dthauvin

@dthauvin - Can you share your airflow.cfg [metrics] section, your Airflow Helm values, and your OTel Collector Helm values yaml? I have tried the OTel Collector contrib and k8s images, versions 0.108.0 and 0.111.0, binding on '0.0.0.0' as well as '${env:MY_POD_IP}:port' in the Helm values. With both approaches I get connection refused errors. The OTel container is not allowing the connection to be established, so even though the Airflow web server retries sending metrics several times, it cannot reach the collector.

neelshah1617 avatar Oct 20 '24 00:10 neelshah1617


In my experience, the majority of connection-related errors were fixed by changing the endpoint from '${env:MY_POD_IP}:port' to 0.0.0.0:4318 (the usual OTLP HTTP receiver port), and then making sure the port is exposed via the Service, so that things running outside the pod can send to it.
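A connectivity smoke test independent of Airflow can narrow this down (a sketch: the HOST value is an assumption, substitute your collector's Service DNS name). It hand-posts a minimal OTLP/HTTP JSON gauge to the receiver:

```shell
# HOST is an assumption -- replace with your collector Service DNS name.
HOST="kube-opentelemetry-collector.open-telemetry.svc.cluster.local"
# Minimal OTLP/HTTP JSON payload carrying one gauge data point.
PAYLOAD='{"resourceMetrics":[{"scopeMetrics":[{"metrics":[{"name":"smoke.test","gauge":{"dataPoints":[{"asDouble":1,"timeUnixNano":"0"}]}}]}]}]}'

# HTTP 200 means the OTLP HTTP receiver accepted the payload; "000" (curl exit
# code 7, connection refused) points at the bind address or the Service ports.
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST -H 'Content-Type: application/json' \
  -d "$PAYLOAD" "http://$HOST:4318/v1/metrics" || true
```

Run it from a pod in the same namespace as the Airflow components (e.g. via kubectl exec) so it exercises the same network path Airflow uses.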

howardyoo avatar Oct 22 '24 03:10 howardyoo

@jiribroulik i manage and succeed something , with disabling StatsD .

  metrics:
    otel_on: True
    otel_host: localhost
    otel_interval_milliseconds: 30000
    otel_port: 4318
    otel_prefix: myprefix
    otel_ssl_active: False
statsd: 
  enabled: False

So @dthauvin, are you saying that when using OTel metrics, you also have to explicitly disable the StatsD part?

howardyoo avatar Oct 22 '24 04:10 howardyoo

OTEL config:

receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:4318"
        # tls:
        #   cert_file: "/opt/airflow/ssl/certificate.crt"
        #   key_file: "/opt/airflow/ssl/private.key"

processors:
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: airflow
    const_labels:
      instance: airflow
    send_timestamps: true
    metric_expiration: 180m
    resource_to_telemetry_conversion:
      enabled: true
  logging:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      exporters: [logging]

docker-compose file:

# OpenTelemetry Collector service
airflow-otel-collector:
  container_name: airflow-otel-collector
  image: otel/opentelemetry-collector-contrib
  command: ["--config=/etc/otelcol-contrib/airflow-otel-config.yaml"]
  volumes:
    - ./airflow-otel-config.yaml:/etc/otelcol-contrib/airflow-otel-config.yaml
    - ./ssl:/opt/airflow/ssl/
  ports:
    - 8888:8888   # Prometheus metrics exposed by the Collector
    - 8889:8889   # Prometheus exporter metrics
    - 13133:13133 # health_check extension
    - 4317:4317   # OTLP gRPC receiver
    - 4318:4318   # OTLP HTTP receiver
  healthcheck:
    test: ["CMD", "curl", "--fail", "http://localhost:13133/health"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 30s
  restart: always

Airflow environment variables:

AIRFLOW__METRICS__OTEL_ON: 'true'
AIRFLOW__METRICS__OTEL_HOST: 'airflow-otel-collector'
AIRFLOW__METRICS__OTEL_PORT: '4318'
AIRFLOW__METRICS__OTEL_INTERVAL_MILLISECONDS: '30000'

Works like a charm.

Make sure you explicitly give the container a name and use that name as the OTel host; it was giving me the same issue until I did.

amanpreetbatra avatar Nov 04 '24 18:11 amanpreetbatra