Metrics missing while using statsd

Open htpawel opened this issue 1 year ago • 2 comments

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.7.1

What happened?

I have Airflow with statsd enabled (Airflow > statsd-exporter > Prometheus), and I believe Airflow is not sending all of the metrics listed in apache-airflow/2.7.1/metrics. I've added dag.<dag_id>.<task_id>.scheduled_duration and dag.<dag_id>.<task_id>.queued_duration to my statsd mappings file:

  - match: "*.dag.*.*.scheduled_duration"
    match_metric_type: observer
    name: "af_agg_dag_task_scheduled_duration"
    labels:
      airflow_id: "$1"
      dag_id: "$2"
      task_id: "$3"
  - match: "*.dag.*.*.queued_duration"
    match_metric_type: observer
    name: "af_agg_dag_task_queued_duration"
    labels:
      airflow_id: "$1"
      dag_id: "$2"
      task_id: "$3"

I could not find them in Prometheus, so I checked /metrics on statsd-exporter and did not find them there either. I later found that more are missing, e.g. dag.<dag_id>.<task_id>.duration (I don't know if it's a coincidence, but metrics with one or two labels, including airflow_id, work fine, while those with more do not). Even with the mappings removed entirely, these metrics are missing under their default names. There are no logs related to these metrics in statsd-exporter either (with log level = debug).
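One way to tell a mapping problem apart from a delivery problem is to send a hand-built timing packet straight to statsd-exporter and then look for it under /metrics. The following is a minimal sketch, not something taken from Airflow itself: the `airflow` prefix, the `my_dag` / `my_task` names, and the 127.0.0.1:9125 address are all assumptions and need to be replaced with whatever your exporter actually listens on.

```python
import socket


def build_timing_packet(prefix: str, dag_id: str, task_id: str,
                        metric: str, millis: float) -> bytes:
    """Build a statsd timing line shaped like <prefix>.dag.<dag_id>.<task_id>.<metric>."""
    name = f"{prefix}.dag.{dag_id}.{task_id}.{metric}"
    return f"{name}:{millis}|ms".encode("ascii")


def send_packet(packet: bytes, host: str = "127.0.0.1", port: int = 9125) -> None:
    """Send one statsd line over UDP. Host and port are assumed values."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


if __name__ == "__main__":
    # Hypothetical names; the result should match "*.dag.*.*.queued_duration".
    pkt = build_timing_packet("airflow", "my_dag", "my_task", "queued_duration", 12.5)
    send_packet(pkt)
```

If the test metric shows up under /metrics with the expected labels, the mapping is fine and the missing metrics are probably never reaching the exporter from whichever process emits them.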

What you think should happen instead?

Metrics should be available in statsd-exporter like all the others.

How to reproduce

Enable statsd metrics on Airflow 2.7.1, then connect Airflow with statsd-exporter (0.26.0) and check /metrics

Operating System

Ubuntu 22.04.3 LTS

Versions of Apache Airflow Providers

Default installation from pypi - https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html

Deployment

Virtualenv installation

Deployment details

Default installation from pypi - https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html

Anything else?

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

htpawel avatar Feb 14 '24 13:02 htpawel

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot] avatar Feb 14 '24 13:02 boring-cyborg[bot]

I see similar behaviour on some 2.4.3 instances with 20+ days of uptime. After a scheduler restart it seems OK (not sure yet how long that will last). I'm focused on .dagrun.duration.failed, and it is sometimes missing.

Scheduler code seems to be working fine.

First I checked that the scheduler processes survive statsd network outages, and they do: metrics did not disappear after a 5-minute network failure.

pvaling avatar Feb 21 '24 16:02 pvaling

@AutomationDev85 is this the same problem you also reported to me personally?

jscheffl avatar Feb 24 '24 22:02 jscheffl

nvm, it was our fault (facepalm). We had statsd-exporter running as a container in the same pod as the scheduler and were sending metrics to localhost, so we only got metrics from the scheduler. It must be a Service that is also accessible from the executors.
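For anyone hitting the same thing, a quick sanity check is to see whether a given Airflow component is pointed at a loopback address. This is only a rough sketch under assumptions: it reads the `AIRFLOW__METRICS__STATSD_HOST` environment variable and falls back to `localhost`, and it will not see a value that is set only in airflow.cfg. The Service name in the usage comment is hypothetical.

```python
import ipaddress
import os


def is_loopback_target(host: str) -> bool:
    """Return True if host is 'localhost' or a loopback IP address."""
    host = host.strip()
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a DNS or Service name.
        return False


def statsd_host_from_env(env=None) -> str:
    """Read the statsd host from the environment, assuming 'localhost' when unset."""
    env = os.environ if env is None else env
    return env.get("AIRFLOW__METRICS__STATSD_HOST", "localhost")


if __name__ == "__main__":
    host = statsd_host_from_env()
    if is_loopback_target(host):
        print(f"statsd host {host!r} is loopback: only an exporter in this pod/host is reachable")
    else:
        # e.g. a hypothetical Service name such as "statsd-exporter.monitoring.svc"
        print(f"statsd host {host!r} is not loopback")
```

Running it in each component (scheduler, workers) shows which ones would only ever reach an exporter on their own host.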

htpawel avatar Feb 29 '24 16:02 htpawel