
Metrics - Possible race condition?

Open ferruzzi opened this issue 2 years ago • 2 comments

Apache Airflow version

2.6.2

What happened

There are a handful of "end of action" metrics which are emitted in StatsD but are not emitted in OTel. I believe the solution is to add a flush helper method to the SafeOtelLogger which calls the MetricsMeter's force_flush, and then to add something roughly like

# Only the OTel logger would expose the proposed force_flush helper.
if hasattr(Stats, "force_flush"):
    Stats.force_flush()

wherever the TaskInstance, DagRun, etc. exit, in order to force those metrics to be emitted rather than waiting for the next scheduled OTel collection pass. This theory is not yet tested and may be wrong.
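To make the suspected race concrete, here is a minimal, standard-library-only sketch. It does not use Airflow or the OpenTelemetry SDK. `PeriodicExporter`, `abandon`, and `run_task` are hypothetical names invented for illustration. The sketch assumes that counters sit in a buffer until a periodic export pass picks them up, so anything recorded just before the process exits is lost unless it is flushed first.

```python
import threading


class PeriodicExporter:
    """Toy stand-in (hypothetical) for a metrics reader that exports on a timer."""

    def __init__(self, interval_seconds):
        self.interval_seconds = interval_seconds
        self.pending = {}   # counters recorded but not yet exported
        self.exported = {}  # counters the backend has actually received
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def incr(self, name, count=1):
        """Record a counter increment in the local buffer only."""
        with self._lock:
            self.pending[name] = self.pending.get(name, 0) + count

    def force_flush(self):
        """Export everything buffered so far, without waiting for the timer."""
        with self._lock:
            for name, count in self.pending.items():
                self.exported[name] = self.exported.get(name, 0) + count
            self.pending.clear()

    def _run(self):
        # Event.wait returns False on timeout, which triggers a scheduled export.
        while not self._stop.wait(self.interval_seconds):
            self.force_flush()

    def abandon(self):
        """Simulate the process exiting: stop the timer without exporting."""
        self._stop.set()
        self._thread.join()


def run_task(stats, flush_on_exit):
    """Emit an 'end of action' counter, then exit before the next export pass."""
    stats.incr("ti_successes")
    if flush_on_exit and hasattr(stats, "force_flush"):
        stats.force_flush()
    stats.abandon()
    return dict(stats.exported)


lost = run_task(PeriodicExporter(interval_seconds=60), flush_on_exit=False)
kept = run_task(PeriodicExporter(interval_seconds=60), flush_on_exit=True)
```

If the real OTel path behaves like this toy model, `lost` would be missing the counter while `kept` contains it, which matches the symptom described above. Whether Airflow's OTel logger actually loses metrics this way is still the untested theory.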

I am submitting this as an Issue since I will be a little distracted for the next bit and figured someone may be able to have a look in the meantime. Please do not assign it to me; I'll get to it when I can if nobody else does.

What you think should happen instead

Behavior should be consistent.

How to reproduce

To reproduce, you can run Breeze with the statsd or the otel integration (for example breeze start-airflow --integration otel) and run the following DAG, then open the OTel or StatsD raw data view to verify.

from airflow import DAG
from airflow.decorators import task
from airflow.utils.timezone import datetime


@task
def task1():
    return 'Hello'


@task
def task2():
    return 'World!'

@task
def task3(in1, in2):
    print(f'{in1} {in2}')


with DAG(
    dag_id='taskflow_demo',
    start_date=datetime(2021, 1, 1),
    schedule=None,
    catchup=False
) as dag:

    task3(task1(), task2())

You will find the following counters are visible in the StatsD logs but not in OTel:

  • airflow_<job_name>_end
  • airflow_operator_failures_<operator_name>
  • airflow_operator_successes_<operator_name>
  • airflow_ti_failures
  • airflow_ti_successes

This issue may also be related: https://github.com/apache/airflow/issues/32162

Operating System

ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

ferruzzi, Jun 26 '23 22:06

This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. Please recheck the report against the latest Airflow version and let us know if the issue is still reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.

github-actions[bot], Jun 30 '24 07:06

not stale

potiuk, Jun 30 '24 08:06