DagFileProcessor produces invalid metric tags
Apache Airflow version
2.6.0b1
What happened
The recently added file_path tag on the dag_processing.processes metric always fails to publish, because the file path delimiter '/' is not a valid character according to stat_name_default_handler:
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=finish) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
[2023-04-18T12:21:39.738-0400] {stats.py:245} ERROR - Invalid stat name: dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=start.
Traceback (most recent call last):
File "/mnt/c/Users/user/Documents/GitHub/airflow-dir/venv/lib/python3.9/site-packages/airflow/stats.py", line 242, in wrapper
stat = handler_stat_name_func(stat)
File "/mnt/c/Users/user/Documents/GitHub/airflow-dir/venv/lib/python3.9/site-packages/airflow/stats.py", line 210, in stat_name_default_handler
raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=start) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
[2023-04-18T12:21:51.375-0400] {stats.py:245} ERROR - Invalid stat name: dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=finish.
What you think should happen instead
Although it is not a fatal error, it feels wrong that the default stat name handler cannot support the metric tag out of the box.
We do have the following parameters that allow a user to work around this issue:
- stat_name_handler
- statsd_disabled_tags
But I would like to advocate that we either include '/' as a supported character in stat_name_default_handler, or sanitize the file_path value to use a supported character instead. It would feel more intuitive for a new user of this feature to have metric tags work correctly with the default configuration, rather than needing to implement their own stat_name_handler to work around the issue.
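For illustration, a minimal sketch of the user-side workaround (the module path my_company.airflow_metrics and the choice of '.' as the replacement character are my own assumptions, not anything shipped with Airflow): point [metrics] stat_name_handler at a small function that rewrites the path delimiter before the stat is sent.

# my_company/airflow_metrics.py -- hypothetical module, referenced from airflow.cfg as
#   [metrics]
#   stat_name_handler = my_company.airflow_metrics.sanitize_stat_name
def sanitize_stat_name(stat_name: str) -> str:
    # Replace the '/' path delimiter with '.' so the resulting stat name only
    # contains characters the default allow-list accepts; whatever the configured
    # handler returns is used as the stat name that gets emitted.
    return stat_name.replace("/", ".")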
Example metric call sites: https://github.com/apache/airflow/blob/main/airflow/dag_processing/manager.py#L998 https://github.com/apache/airflow/blob/main/airflow/dag_processing/processor.py#L767
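Roughly, those call sites emit the metric with file_path and action tags (a paraphrased sketch, not the exact source; the example path is made up). With statsd_influxdb_enabled = True the tags are appended to the stat name as ,key=value pairs, which is how the '/' ends up in the string the validator rejects:

from airflow.stats import Stats

file_path = "/opt/airflow/dags/example.py"  # assumed example path
Stats.incr("dag_processing.processes", tags={"file_path": file_path, "action": "start"})
# With statsd_influxdb_enabled = True this is sent as
#   dag_processing.processes,file_path=/opt/airflow/dags/example.py,action=start
# and that full string is what the stat name handler validates.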
How to reproduce
Enable stats with:
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix =
statsd_influxdb_enabled = True
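The validation failure can also be reproduced without a running statsd server by feeding an InfluxDB-style stat name straight to the default handler (a sketch assuming Airflow 2.7+, where the handler lives in airflow.metrics.validators; in 2.6 it was in airflow.stats, and the example path below is made up):

from airflow.exceptions import InvalidStatsNameException
from airflow.metrics.validators import stat_name_default_handler

# The tag-suffixed name that the InfluxDB-style formatting produces for this metric.
stat = "dag_processing.processes,file_path=/opt/airflow/dags/example.py,action=start"
try:
    stat_name_default_handler(stat)
except InvalidStatsNameException as exc:
    # Raised because the name contains characters outside the default allowed set.
    print(exc)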
Operating System
Red Hat Enterprise Linux Server 7.6 (Maipo)
Versions of Apache Airflow Providers
No response
Deployment
Virtualenv installation
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Can I take this one?
Sure
This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. Kindly recheck the report against the latest Airflow version and let us know if the issue is still reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.
@eladkal @potiuk similar error in Airflow 2.8.2 dag-processor pod/container
{dag_processor_job_runner.py:60} INFO - Starting the Dag Processor Job
[2024-06-18T02:50:07.781+0000] {validators.py:101} ERROR - Invalid stat name: dag_processing.last_duration.random error 2-0424133757V
/python3.8/site-packages/airflow/metrics/validators.py", line 185, in stat_name_default_handler
raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.last_run.seconds_ago.random error 2-0424133757V5) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
Same error in Airflow 2.9.1
ERROR - Invalid stat name: dag_processing.processes,file_path=/opt/airflow/dags/live/sha/appconf/contracts/contracts.py,action=start.
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/metrics/validators.py", line 134, in wrapper
stat = handler_stat_name_func(stat)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/metrics/validators.py", line 221, in stat_name_default_handler
raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/opt/airflow/dags/live/sha/appconf/contracts/contracts.py,action=start) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
Yes. It is still waiting for someone to investigate and fix it. It can be anyone, even those who experience it (actually that would be best, as they could easily test whether it's fixed).
I could help investigate it. @Lee-W, could you assign it to me? Thanks!
Sure thing 🙂
Same error in 2.10.0 :(.
Can we just allow the '/' character in the validator? I've applied this change to a local instance, which addresses the issue, and I can see the metrics as expected in Prometheus.
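For illustration, the idea amounts to widening the validator's allowed-character set to include '/'. A standalone sketch of the approach follows; the constant and function names in Airflow's airflow/metrics/validators.py may differ, so this is not the actual patch:

import string

from airflow.exceptions import InvalidStatsNameException

# Hypothetical widened allow-list: ASCII letters, digits, underscore, dot,
# dash, plus the '/' path delimiter.
ALLOWED_CHARACTERS = frozenset(string.ascii_letters + string.digits + "_.-/")


def patched_stat_name_default_handler(stat_name: str, max_length: int = 250) -> str:
    # Sketch of a default handler that also accepts '/' in stat names.
    if not isinstance(stat_name, str):
        raise InvalidStatsNameException("The stat_name has to be a string")
    if len(stat_name) > max_length:
        raise InvalidStatsNameException(
            f"The stat_name ({stat_name}) has to be less than {max_length} characters."
        )
    if not all(c in ALLOWED_CHARACTERS for c in stat_name):
        raise InvalidStatsNameException(
            f"The stat name ({stat_name}) has to be composed of ASCII alphabets, numbers, "
            "or the underscore, dot, dash, or slash characters."
        )
    return stat_name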