Some pool and executor slot/task statsd metrics are always 0
Apache Airflow version
2.6.0
What happened
When queuing up a number of DAGs on a basic Airflow installation (SQLite with the SequentialExecutor), several statsd metrics only ever reported 0 values, including:
- pool.queued_slots.<pool_name>
- pool.running_slots.<pool_name>
- executor.running_tasks
What you think should happen instead
I would expect these metrics to show non-zero values at some point. The webserver UI for the default pool shows non-zero running and queued task counts for a few minutes while the queued DAG runs are processed.
How to reproduce
I have a fairly simple test DAG:
```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2023, 5, 2),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
    'sla': timedelta(minutes=1),
}

dag = DAG(
    'longtask_dag',
    default_args=default_args,
    description='Example DAG with a task that takes about 2 minutes to execute',
    schedule_interval='*/5 * * * *',
)

t1 = BashOperator(
    task_id='long_running_task',
    bash_command='sleep 120',
    dag=dag,
)

t2 = BashOperator(
    task_id='short_running_task',
    bash_command='echo "Short running task"',
    dag=dag,
)

t2 >> t1
```
I then queue up about 20 runs of this DAG in the webserver UI and start monitoring the statsd output from Airflow (after configuring statsd).
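For reference, enabling statsd in airflow.cfg looks roughly like this; the host, port and prefix below are illustrative values, not necessarily what the original setup used:

```ini
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
```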
Operating System
macOS Monterey v12.3.1
Versions of Apache Airflow Providers
apache-airflow-providers-common-sql==1.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-sqlite==3.3.2
Deployment
Virtualenv installation
Deployment details
For the most part, I followed this guide to install and run Airflow with statsd/Prometheus monitoring:
https://www.redhat.com/en/blog/monitoring-apache-airflow-using-prometheus
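To rule out the statsd_exporter/Prometheus side of that setup, one quick check is to point Airflow's statsd output at a plain UDP listener and look at the raw gauges. A minimal sketch, assuming statsd_host = localhost and statsd_port = 8125 (adjust to match your config):

```python
# minimal UDP listener that prints the raw statsd packets Airflow sends,
# independent of statsd_exporter / Prometheus
# (assumes statsd_host = localhost and statsd_port = 8125 in airflow.cfg)
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 8125))
print("listening for statsd packets on udp/8125 ...")
while True:
    data, _ = sock.recvfrom(65535)
    for line in data.decode(errors="replace").splitlines():
        # only show the gauges this issue is about
        if "pool." in line or "executor." in line:
            print(line)
```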
Anything else
This problem seems to be consistently reproducible.
There are also some related metrics where I am seeing similar issues, which I have not opened a separate issue for yet:
- pool.open_slots.<pool_name>
- executor.open_slots

These always seem to report the maximum number of slots (for the pool or the executor, respectively), even while tasks are running.
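One way to cross-check the pool gauges against what the scheduler's database actually records is Pool.slots_stats(). A rough sketch, to be run in the same virtualenv/AIRFLOW_HOME as the scheduler (the exact keys returned may differ between Airflow versions):

```python
# print what the metadata DB reports for each pool's slots, to compare with the
# pool.open_slots / pool.queued_slots / pool.running_slots gauges seen over statsd
import time

from airflow.models.pool import Pool

while True:
    # values are roughly {"total", "running", "queued", "open"}; names may vary by version
    for pool_name, stats in Pool.slots_stats().items():
        print(pool_name, stats)
    time.sleep(10)
```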
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
Yes, it would be great for someone to track the exact list and remove those which are 0 - some of them were removed or discussed being removed. Would you want to lead it and implement it @StefanKurek? If not, I will mark it as "good first issue" and hopefully someone will.
Hey there @potiuk, I'd love to contribute as a beginner. Can you please assign this to me and point me to where I should start tackling this issue?
Sorry for taking so long to respond to this. I almost forgot about it because I only noticed this issue when installing in the way that I mentioned. Once I installed using Docker, I no longer saw this issue, FYI.
Ah alright, thank you for letting me know :)
I hope you will be able to reproduce it @ItIsOHM :) assigned you
(or even if not - then we might be able to close it if you confirm it's not really reproducible easily)
Hahaha, I'd like to at least try and fix this if it's good for a beginner like me :D Any help would be appreciated!
This issue still persists. I'm able to get metric data for executor.running_tasks but not executor.queued_tasks; it's always 0.
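If it helps anyone debugging this, one sanity check is to compare that gauge with what the metadata DB reports while the runs are queued. A rough sketch (note that executor.queued_tasks is computed from the executor's in-memory queue, so the two won't always match exactly):

```python
# count queued/running task instances in the metadata DB to compare against the
# executor.queued_tasks / executor.running_tasks gauges arriving over statsd
from sqlalchemy import func

from airflow.models import TaskInstance
from airflow.utils.session import create_session
from airflow.utils.state import TaskInstanceState

with create_session() as session:
    counts = dict(
        session.query(TaskInstance.state, func.count())
        .filter(TaskInstance.state.in_([TaskInstanceState.QUEUED, TaskInstanceState.RUNNING]))
        .group_by(TaskInstance.state)
        .all()
    )
    print(counts)  # e.g. {'queued': 5, 'running': 1}
```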