Airflow statsd stops sending metrics when DAG runs execute many tasks in parallel
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.8.3
What happened?
Statsd stopped sending metrics while we ran more than 200 tasks in parallel across multiple DAGs. Restarting the statsd pod resolved the issue and the metrics were exposed again. No logs were found in the statsd pod, and no spike in CPU or memory usage was observed in the statsd pod.
What you think should happen instead?
Statsd should not stop sending metrics when we run more than 200 tasks in parallel across multiple DAGs.
How to reproduce
Run more than 200 tasks in parallel across multiple DAGs, for example with a fan-out DAG like the sketch below.
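A minimal reproduction sketch, not the reporter's actual DAG: the dag_id, task count, and sleep duration are assumptions, and the scheduler's parallelism and pool limits must also allow 200+ concurrent task instances for the load to be reached.

```python
# Hypothetical fan-out DAG that keeps ~250 tasks running concurrently.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="statsd_load_repro",        # hypothetical dag_id
    start_date=datetime(2024, 1, 1),
    schedule=None,                     # trigger manually
    catchup=False,
    max_active_tasks=300,              # allow all tasks in this DAG to run at once
) as dag:
    for i in range(250):
        BashOperator(
            task_id=f"sleep_{i}",
            bash_command="sleep 60",   # keep tasks alive long enough to overlap
        )
```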
Operating System
Amazon Linux 2
Versions of Apache Airflow Providers
pytest>=6.2.5
docker>=5.0.0
crypto>=1.4.1
cryptography>=3.4.7
pyOpenSSL>=20.0.1
ndg-httpsclient>=0.5.1
boto3>=1.34.0
sqlalchemy
redis>=3.5.3
requests>=2.26.0
pysftp>=0.2.9
werkzeug>=1.0.1
apache-airflow-providers-cncf-kubernetes==8.0.0
apache-airflow-providers-amazon>=8.13.0
psycopg2>=2.8.5
grpcio>=1.37.1
grpcio-tools>=1.37.1
protobuf>=3.15.8,<=3.21
python-dateutil>=2.8.2
jira>=3.1.1
confluent_kafka>=1.7.0
pyarrow>=10.0.1,<10.1.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
Official helm chart deployment.
Anything else?
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Can you confirm whether this happens consistently when you run 200+ tasks in parallel?
@rawwar, yes, I can confirm that the issue occurs intermittently while we run 250+ tasks in parallel.
> Statsd stopped sending metrics while we ran more than 200 tasks in parallel across multiple DAGs. Restarting the statsd pod resolved the issue and the metrics were exposed again.

> Statsd should not stop sending metrics when we run more than 200 tasks in parallel across multiple DAGs.

All signs here point to this being an issue with statsd rather than with Apache Airflow itself.
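One way to check whether Airflow is still emitting metrics when the statsd pod goes quiet is to capture the raw StatsD UDP traffic directly. The sketch below is a diagnostic suggestion, not part of the original report; it assumes the default StatsD UDP port 8125, which should be adjusted to whatever this deployment actually uses.

```python
# Minimal StatsD packet listener: prints every UDP metric packet it receives.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 8125))  # assumes the default StatsD UDP port

print("listening for StatsD packets on udp/8125 ...")
while True:
    data, addr = sock.recvfrom(65535)
    print(addr, data.decode(errors="replace"))
```

If packets keep arriving here while the statsd pod reports nothing, that would support the conclusion that statsd (or its exporter) is the failing component rather than Airflow.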