airflow
airflow copied to clipboard
Statsd metric name components are unsanitized
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
[2.2.5] I recently enabled statsd metric emission on my team's deploy and began writing views on the result.
I've found that:
- some dag authors had included a period in their task_id name - like
task_id = "load.all"
- this string is not sanitized of statsd-sensitive characters (like '.' and '|') prior to the building of the Metric name
Result: metric names with too many .-delimiters and grafana views which don't render these task metrics.
What you think should happen instead
Any instance of Statsd.foo(f'{dag_id}...')
or Statsd.foo(f'{task_id}...')
should probably first sanitize those strings of statsd-sensitive characters.
So instead of
Current: Statsd.timer("dag.{self.task.dag_id}.{self.task.task_id}.duration")
-> "dag.DAG.WITH.PERIODS.IN_NAME.TASK.WITH.PERIODS.IN.NAME.duration"
Recommended: Stats.timer(f"dag.{statsd_sanitize(self.task.dag_id)}.{statsd_sanitize(self.task.task_id)}.duration")
-> "dag.DAG_WITH_PERIODS_IN_NAME.TASK_WITH_PERIODS_IN_NAME.duration"
How to reproduce
- create a dag
- write a task in that dag with task_id which contains a period. Like: "foo.bar"
- enable statsd metrics collection
- run your dag
- struggle to deal with poorly-formatted metric names like
"dag.DAG.WITH.PERIODS.IN.NAME.TASK.WITH.PERIODS.IN.NAME.duration"
Operating System
debian
Versions of Apache Airflow Providers
apache-airflow-providers-apache-hive==2.3.1
apache-airflow-providers-apache-spark==2.1.2
apache-airflow-providers-celery==2.1.3
apache-airflow-providers-cncf-kubernetes==3.0.0
apache-airflow-providers-docker==2.5.2
apache-airflow-providers-elasticsearch==2.2.0
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-google==6.7.0
apache-airflow-providers-grpc==2.0.4
apache-airflow-providers-hashicorp==2.1.4
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-microsoft-azure==3.7.2
apache-airflow-providers-mysql==2.2.3
apache-airflow-providers-odbc==2.0.4
apache-airflow-providers-postgres==4.1.0
apache-airflow-providers-redis==2.0.4
apache-airflow-providers-sendgrid==2.0.4
apache-airflow-providers-sftp==2.5.2
apache-airflow-providers-slack==4.2.3
apache-airflow-providers-sqlite==2.1.3
apache-airflow-providers-ssh==2.4.3
Deployment
Official Apache Airflow Helm Chart
Deployment details
Deployed with v8.6.1 of https://github.com/airflow-helm/charts/tree/main/charts/airflow
We wired up a pod to convert statsd->prom using https://github.com/prometheus/statsd_exporter, and use Matcher rules by glob/regex to convert statsd-style metrics into prom-style.
Anything else
No response
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template!
This issue has been automatically marked as stale because it has been open for 365 days without any activity. There has been several Airflow releases since last activity on this issue. Kindly asking to recheck the report against latest Airflow version and let us know if the issue is reproducible. The issue will be closed in next 30 days if no further activity occurs from the issue author.
This issue has been closed because it has not received response from the issue author.