Some pool and executor slot/task statsd metrics are always 0

Open StefanKurek opened this issue 2 years ago • 9 comments

Apache Airflow version

2.6.0

What happened

When queuing up a number of DAG runs on a basic Airflow installation (SQLite with SequentialExecutor), several statsd metrics only ever reported 0. These included:

pool.queued_slots.<pool_name>
pool.running_slots.<pool_name>
executor.running_tasks

What you think should happen instead

I would expect these metrics to show nonzero values at some point. The webserver UI for the default pool shows nonzero running and queued task counts for a few minutes while the queued DAG runs are processed.

How to reproduce

I have a fairly simple test DAG

from airflow import DAG
from airflow.operators.bash import BashOperator  # bash_operator is the deprecated import path in Airflow 2.x
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2023, 5, 2),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
    'sla': timedelta(minutes=1),
}

dag = DAG(
    'longtask_dag',
    default_args=default_args,
    description='Example DAG with a task that takes about 2 minutes to execute',
    schedule_interval='*/5 * * * *',
)

t1 = BashOperator(
    task_id='long_running_task',
    bash_command='sleep 120',
    dag=dag
)

t2 = BashOperator(
    task_id='short_running_task',
    bash_command='echo "Short running task"',
    dag=dag
)

t2 >> t1

I then queue up about 20 runs of this DAG in the webserver UI and start monitoring the statsd output from Airflow (after configuring it).
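For anyone trying to reproduce this without a full Prometheus setup, here is a rough sketch of one way to watch the raw statsd output and list the gauges that never changed. It is not what was used above; it assumes Airflow sends plain statsd lines over UDP to 127.0.0.1:8125 with the default `airflow` prefix, and the function names (`parse_statsd_packet`, `constant_gauges`, `listen`) are made up for illustration.

```python
import socket
from collections import defaultdict


def parse_statsd_packet(packet):
    """Return (name, value) pairs for the gauge lines in one statsd packet."""
    gauges = []
    for line in packet.splitlines():
        # Plain statsd line format: <name>:<value>|<type>[|@<rate>]
        name, sep, rest = line.strip().partition(":")
        if not sep:
            continue  # not a metric line
        fields = rest.split("|")
        if len(fields) < 2 or fields[1] != "g":
            continue  # skip counters ("c"), timers ("ms"), etc.
        try:
            # Note: signed delta gauges ("+5|g") are treated as plain values here.
            gauges.append((name, float(fields[0])))
        except ValueError:
            continue  # ignore malformed values
    return gauges


def constant_gauges(samples):
    """Map each gauge that only ever reported one value to that value."""
    seen = defaultdict(set)
    for name, value in samples:
        seen[name].add(value)
    return {
        name: next(iter(values))
        for name, values in seen.items()
        if len(values) == 1
    }


def listen(host="127.0.0.1", port=8125, max_packets=500):
    """Collect gauge samples from UDP statsd traffic.

    host, port and max_packets are assumptions; adjust them to match the
    statsd settings in your Airflow configuration.
    """
    samples = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        for _ in range(max_packets):
            data, _addr = sock.recvfrom(65535)
            text = data.decode("utf-8", errors="replace")
            samples.extend(parse_statsd_packet(text))
    return samples
```

Calling `constant_gauges(listen())` while the queued runs are being processed should return any gauge that stayed at a single value, for example one stuck at 0. The port can only be bound by one process, so a statsd exporter already listening on it would need to be stopped first or Airflow pointed at a different port.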

Operating System

macOS Monterey v12.3.1

Versions of Apache Airflow Providers

apache-airflow-providers-common-sql==1.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-sqlite==3.3.2

Deployment

Virtualenv installation

Deployment details

For the most part, I followed the install-and-run section of this guide:

https://www.redhat.com/en/blog/monitoring-apache-airflow-using-prometheus

Anything else

This problem seems to be consistently reproducible.

I am also seeing similar problems with some related metrics, which I have not opened an issue for yet:

pool.open_slots.<pool_name>
executor.open_slots

These always seem to report the max number of slots (for either pool or executor), even when there are tasks that are running.

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

StefanKurek, May 08 '23 14:05

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot], May 08 '23 14:05

Yes, it would be great for someone to track down the exact list and remove those which are always 0. Some of them were removed or discussed for removal. Would you want to lead it and implement it, @StefanKurek? If not, I will mark it as "good first issue" and hopefully someone will.

potiuk, May 13 '23 19:05

Hey there @potiuk, I'd love to contribute as a beginner. Can you please assign this to me and also point me to where I can start tackling this issue?

ItIsOHM, Jun 28 '23 17:06

Sorry for such a late response to this. I almost forgot about it because I only noticed this issue when installing the way I mentioned. Once I installed using Docker, I no longer saw this issue, FYI.

StefanKurek, Jun 28 '23 17:06

Ah alright, thank you for letting me know :)

ItIsOHM, Jun 28 '23 18:06

I hope you will be able to reproduce it, @ItIsOHM :) Assigned you.

potiuk, Jun 28 '23 18:06

(Or even if not, we might be able to close it if you confirm it's not easily reproducible.)

potiuk, Jun 28 '23 18:06

Haha, I'd like to at least try and fix this if it's good for a beginner like me :D Any help would be appreciated!

ItIsOHM, Jun 28 '23 18:06

This issue still persists. I'm able to get metric data for executor.running_tasks but not for executor.queued_tasks, which is always 0.

cjj1120, May 08 '24 09:05