Metrics missing while using statsd
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.7.1
What happened?
I run Airflow with statsd enabled (airflow > statsd-exporter > prometheus), and I believe Airflow is not sending all of the metrics listed in apache-airflow/2.7.1/metrics. I added dag.<dag_id>.<task_id>.scheduled_duration and dag.<dag_id>.<task_id>.queued_duration to my statsd-exporter mappings file:
- match: "*.dag.*.*.scheduled_duration"
  match_metric_type: observer
  name: "af_agg_dag_task_scheduled_duration"
  labels:
    airflow_id: "$1"
    dag_id: "$2"
    task_id: "$3"
- match: "*.dag.*.*.queued_duration"
  match_metric_type: observer
  name: "af_agg_dag_task_queued_duration"
  labels:
    airflow_id: "$1"
    dag_id: "$2"
    task_id: "$3"
I could not find these metrics in Prometheus, so I checked /metrics on statsd-exporter and they were not there either. I later found that more metrics are missing, e.g. dag.<dag_id>.<task_id>.duration. (It may be a coincidence, but metrics with one or two labels, including airflow_id, work fine, while metrics with more labels do not.) Even after removing the mapping entirely, these metrics are missing under their default names as well. statsd-exporter logs nothing related to these metrics, even with log level = debug.
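One way to separate a mapping problem from a delivery problem is to send a hand-made timing packet that matches the mapping above and then look for it under /metrics. The following is a minimal sketch, not part of Airflow or statsd-exporter: the helper names, the `airflow` prefix, the dag/task names, and the host are assumptions for illustration, and port 9125 is assumed to be the exporter's UDP listen port (adjust it to your setup).

```python
import socket


def format_timing(prefix, dag_id, task_id, metric, millis):
    """Build a StatsD timing line, e.g. 'airflow.dag.d.t.queued_duration:12.5|ms'."""
    return f"{prefix}.dag.{dag_id}.{task_id}.{metric}:{millis}|ms"


def send_statsd(line, host="127.0.0.1", port=9125):
    """Send one StatsD line over UDP (fire and forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical prefix, dag and task names; replace with real values.
    line = format_timing("airflow", "my_dag", "my_task", "queued_duration", 12.5)
    send_statsd(line)
    print("sent:", line)
```

If the hand-made packet shows up as af_agg_dag_task_queued_duration but the real ones never do, the mapping is probably fine and the packets from the task processes are not reaching the exporter.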
What you think should happen instead?
Metrics should be available in statsd-exporter like all the others.
How to reproduce
Enable statsd metrics on Airflow 2.7.1, then connect Airflow with statsd-exporter (0.26.0) and check /metrics
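The /metrics check can be scripted so that it is repeatable. This is a rough sketch that parses the Prometheus text exposition format and reports which expected metric name prefixes are absent. The function names are made up for illustration, and the URL (localhost, port 9102) is an assumption about where the exporter's web endpoint listens.

```python
import re
from urllib.request import urlopen

# A sample line starts with the metric name, optionally followed by {labels}.
_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition_text):
    """Return the set of metric names found on non-comment sample lines."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME_RE.match(line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(exposition_text, expected_prefixes):
    """Return the expected prefixes that no exposed metric name starts with."""
    names = metric_names(exposition_text)
    return [p for p in expected_prefixes if not any(n.startswith(p) for n in names)]


if __name__ == "__main__":
    # Assumed exporter address; change host and port to match your deployment.
    with urlopen("http://localhost:9102/metrics", timeout=5) as resp:
        text = resp.read().decode("utf-8")
    print(missing_metrics(text, [
        "af_agg_dag_task_scheduled_duration",
        "af_agg_dag_task_queued_duration",
    ]))
```

Prefix matching is used because an observer is typically exposed as several series (for example with `_sum` and `_count` suffixes).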
Operating System
Ubuntu 22.04.3 LTS
Versions of Apache Airflow Providers
Default installation from pypi - https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html
Deployment
Virtualenv installation
Deployment details
Default installation from pypi - https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html
Anything else?
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
I see similar behaviour on some 2.4.3 instances with 20+ days of uptime. After a scheduler restart it seems OK (not sure yet how long that will last). I am focused on .dagrun.duration.failed, and it is sometimes missing.
Scheduler code seems to be working fine.
First I checked whether the scheduler processes survive statsd network outages: they do. Metrics did not disappear after a 5-minute network failure.
@AutomationDev85 is this the same problem you also reported to me personally?
Never mind, it was our fault (facepalm). We ran statsd-exporter as a container in the same pod as the scheduler and sent metrics to localhost, so we only received metrics from the scheduler. The exporter must be exposed as a Service that the executors can reach as well.
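A quick sanity check for this kind of misconfiguration is to see whether the statsd host a component is configured with points at the loopback interface. The sketch below is only an illustration with hypothetical helper names. It assumes the setting is supplied through the AIRFLOW__METRICS__STATSD_HOST environment variable and falls back to localhost when unset; if the value lives in airflow.cfg instead, read it from there.

```python
import ipaddress
import os


def is_loopback_host(host):
    """True if host is 'localhost' or a loopback IP such as 127.0.0.1 or ::1."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a DNS or Service name; treat as non-loopback.
        return False


def statsd_host_from_env(env=None):
    """Read the statsd host from the environment (assumed variable name)."""
    env = os.environ if env is None else env
    return env.get("AIRFLOW__METRICS__STATSD_HOST", "localhost")


if __name__ == "__main__":
    host = statsd_host_from_env()
    if is_loopback_host(host):
        print(f"statsd host {host!r} is loopback: only processes that share "
              "this network namespace can reach the exporter")
    else:
        print(f"statsd host {host!r} is not loopback")
```

Running this in each place where tasks execute, not only next to the scheduler, shows whether task-level metrics such as dag.<dag_id>.<task_id>.duration have anywhere to go.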