Airflow statsd stops sending metrics during maximum dagrun

Open paramjeet01 opened this issue 1 year ago • 1 comments

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.8.3

What happened?

StatsD stopped sending metrics while we were running more than 200 tasks in parallel across multiple DAGs. Restarting the statsd pod resolved the issue, and metrics were exposed again. The statsd pod showed no logs and no spike in CPU or memory usage.

What you think should happen instead?

StatsD should keep sending metrics while more than 200 tasks run in parallel across multiple DAGs.

How to reproduce

Run more than 200 tasks in parallel across multiple DAGs.
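
To check whether the StatsD endpoint itself keeps up with a burst of this size, independently of Airflow, one option is to send StatsD-format counters over UDP from many threads at once. The following is only a rough sketch using the Python standard library: `send_burst`, `count_received`, and the metric name `probe.task_tick` are hypothetical names introduced for illustration, and the host and port must be replaced with the StatsD ingest address of the deployment.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def send_burst(host, port, workers=250, packets_per_worker=20,
               metric="probe.task_tick"):
    """Send StatsD counter packets over UDP from many threads at once.

    Returns the number of packets handed to the socket. UDP gives no
    delivery guarantee, so this is the number sent, not the number received.
    """
    # StatsD line protocol for a counter increment: "<name>:<value>|c"
    payload = f"{metric}:1|c".encode("ascii")

    def worker(_worker_id):
        # One socket per worker, loosely imitating one socket per task process.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            for _ in range(packets_per_worker):
                sock.sendto(payload, (host, port))
        return packets_per_worker

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(worker, range(workers)))


def count_received(sock, idle_timeout=0.5):
    """Count datagrams on a bound UDP socket until it has been idle
    for idle_timeout seconds."""
    sock.settimeout(idle_timeout)
    received = 0
    while True:
        try:
            sock.recvfrom(1024)
        except socket.timeout:
            return received
        received += 1
```

Calling `send_burst(host, port)` against the StatsD address and then comparing the returned count with the value of the counter reported by the StatsD side gives a rough indication of whether packets are being dropped under load. `count_received` is only meant for a local dry run against a socket bound on the same machine.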

Operating System

Amazon Linux 2

Versions of Apache Airflow Providers

pytest>=6.2.5
docker>=5.0.0
crypto>=1.4.1
cryptography>=3.4.7
pyOpenSSL>=20.0.1
ndg-httpsclient>=0.5.1
boto3>=1.34.0
sqlalchemy
redis>=3.5.3
requests>=2.26.0
pysftp>=0.2.9
werkzeug>=1.0.1
apache-airflow-providers-cncf-kubernetes==8.0.0
apache-airflow-providers-amazon>=8.13.0
psycopg2>=2.8.5
grpcio>=1.37.1
grpcio-tools>=1.37.1
protobuf>=3.15.8,<=3.21
python-dateutil>=2.8.2
jira>=3.1.1
confluent_kafka>=1.7.0
pyarrow>=10.0.1,<10.1.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

Official helm chart deployment.

Anything else?

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

paramjeet01, May 11 '24 21:05

Can you confirm whether this happens consistently when you run 200+ tasks in parallel?

rawwar, May 12 '24 00:05

@rawwar, yes, I can confirm that the issue occurs intermittently while we run 250+ tasks in parallel.

paramjeet01, May 15 '24 05:05

> Statsd stopped sending metrics while we ran tasks more than 200 in parallel in multiple dags. Restarting the statsd pod solved the issue and the metrics exposing

> The statsd should not stop sending metrics while we run tasks more than 200 in parallel in multiple dags.

All signs here point to this being an issue with statsd and not with Apache Airflow itself.

Taragolis, May 16 '24 10:05