
DagFileProcessor produces invalid metric tags

Open sungwy opened this issue 2 years ago • 9 comments

Apache Airflow version

2.6.0b1

What happened

The recently added file_path metric tag on dag_processing.processes always fails to publish, because the file path delimiter '/' is not a valid character according to stat_name_default_handler:

airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=finish) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
[2023-04-18T12:21:39.738-0400] {stats.py:245} ERROR - Invalid stat name: dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=start.
Traceback (most recent call last):
  File "/mnt/c/Users/user/Documents/GitHub/airflow-dir/venv/lib/python3.9/site-packages/airflow/stats.py", line 242, in wrapper
    stat = handler_stat_name_func(stat)
  File "/mnt/c/Users/user/Documents/GitHub/airflow-dir/venv/lib/python3.9/site-packages/airflow/stats.py", line 210, in stat_name_default_handler
    raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=start) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
[2023-04-18T12:21:51.375-0400] {stats.py:245} ERROR - Invalid stat name: dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=finish.
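
To see which characters trip the check, here is a minimal standalone sketch. It does not import Airflow: `ALLOWED_CHARS`, `TAG_CHARS` and `find_invalid_chars` are hypothetical names, the allowed set is taken from the wording of the error message above, and ',' and '=' are assumed to be accepted as separators for InfluxDB-style tags. The path is a shortened stand-in for the one in the log.

```python
import string

# Characters named in the error message: ASCII letters, digits, underscore, dot, dash.
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits + "_.-")
# Assumption: ',' and '=' are tolerated as tag separators for InfluxDB-style tags.
TAG_CHARS = frozenset(",=")


def find_invalid_chars(stat_name: str, allowed: frozenset = ALLOWED_CHARS | TAG_CHARS) -> list:
    """Return the sorted, de-duplicated characters of stat_name that are not in allowed."""
    return sorted({ch for ch in stat_name if ch not in allowed})


# A tagged stat name in the same shape as the one in the log above.
print(find_invalid_chars("dag_processing.processes,file_path=/mnt/c/test_dag.py,action=start"))
```

Under these assumptions the only offending character in the tagged name is the path delimiter, which matches the report.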

What you think should happen instead

Although it is not a fatal error, it feels wrong that the default stat name handler cannot support this metric tag out of the box.

We do have the following parameters that allow a user to work around this issue:

  1. stat_name_handler
  2. statsd_disabled_tags

However, I would like to advocate that we either add '/' to the characters supported by stat_name_default_handler, or sanitize the file_path value so that it uses only supported characters. It would be more intuitive for a new user if metric tags worked correctly with the default configuration, rather than requiring a custom stat_name_handler as a workaround.
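
As a rough illustration of the sanitizing option, here is a minimal sketch of a handler that rewrites unsupported characters instead of rejecting the name. It is not Airflow's implementation: `sanitizing_stat_name_handler` is a hypothetical name, and the set of safe characters (letters, digits, '_', '.', '-', plus ',' and '=' for tags) is an assumption based on the error message. The path in the example is made up.

```python
import re

# Assumption: anything outside letters, digits, '_', '.', '-', ',' and '=' is unsupported.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.\-,=]")


def sanitizing_stat_name_handler(stat_name: str) -> str:
    """Replace every unsupported character in stat_name with an underscore."""
    return _UNSAFE.sub("_", stat_name)


# Each '/' in the file_path tag value becomes '_'.
print(sanitizing_stat_name_handler(
    "dag_processing.processes,file_path=/opt/airflow/dags/example.py,action=start"
))
```

One trade-off of this approach is that distinct paths can collapse to the same tag value (for example `a/b.py` and `a_b.py`), which allowing '/' in the validator would avoid.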

Example metrics: https://github.com/apache/airflow/blob/main/airflow/dag_processing/manager.py#L998 https://github.com/apache/airflow/blob/main/airflow/dag_processing/processor.py#L767

How to reproduce

Enable stats with:

[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = 
statsd_influxdb_enabled = True

Operating System

Red Hat Enterprise Linux Server 7.6 (Maipo)

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

sungwy avatar Apr 18 '23 17:04 sungwy

Can I take this one?

Gowthami03B avatar Apr 19 '23 13:04 Gowthami03B

Sure

potiuk avatar Apr 19 '23 16:04 potiuk

This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. Please recheck the report against the latest Airflow version and let us know if the issue is still reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.

github-actions[bot] avatar Apr 30 '24 07:04 github-actions[bot]

@eladkal @potiuk similar error in Airflow 2.8.2 dag-processor pod/container

 {dag_processor_job_runner.py:60} INFO - Starting the Dag Processor Job
[2024-06-18T02:50:07.781+0000] {validators.py:101} ERROR - Invalid stat name: dag_processing.last_duration.random error 2-0424133757V
/python3.8/site-packages/airflow/metrics/validators.py", line 185, in stat_name_default_handler
    raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.last_run.seconds_ago.random error 2-0424133757V5) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.

shalberd avatar Jun 21 '24 16:06 shalberd

Same error in Airflow 2.9.1

ERROR - Invalid stat name: dag_processing.processes,file_path=/opt/airflow/dags/live/sha/appconf/contracts/contracts.py,action=start.
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/metrics/validators.py", line 134, in wrapper
    stat = handler_stat_name_func(stat)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/metrics/validators.py", line 221, in stat_name_default_handler
    raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/opt/airflow/dags/live/sha/appconf/contracts/contracts.py,action=start) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.

ares-b avatar Jul 11 '24 08:07 ares-b

Yes. It is still waiting for someone to investigate and fix it. That can be anyone, even those who experience it (actually, that would be best, as they could easily test whether it's fixed).

potiuk avatar Jul 11 '24 08:07 potiuk

I could help investigate it. @Lee-W, could you assign it to me? Thanks!

josix avatar Aug 13 '24 07:08 josix

Sure thing 🙂

Lee-W avatar Aug 13 '24 07:08 Lee-W

Same error in 2.10.0 :(.

Jeoffreybauvin avatar Aug 26 '24 14:08 Jeoffreybauvin

Can we not allow the / character in the validator? I've applied this to a local instance, which addresses the issue, and I can see the metrics as expected in Prometheus.

awdavidson avatar Oct 11 '24 09:10 awdavidson