airflow
airflow copied to clipboard
Task stuck in "scheduled" when running in backfill job
Apache Airflow version
2.2.4
What happened
We are running airflow 2.2.4 with KubernetesExecutor. I have created a dag to run airflow backfill command with SubprocessHook. What was observed is that when I started to backfill a few days' dagruns the backfill would get stuck with some dag runs having tasks staying in the "scheduled" state and never getting running.
We are using the default pool and the pool is totoally free when the tasks got stuck.
I could find some logs saying:
TaskInstance: <TaskInstance: test_dag_2.task_1 backfill__2022-03-29T00:00:00+00:00 [queued]> found in queued state but was not launched, rescheduling and nothing else in the log.
What you think should happen instead
The tasks stuck in "scheduled" should start running when there is free slot in the pool.
How to reproduce
Airflow 2.2.4 with python 3.8.13, KubernetesExecutor running in AWS EKS.
One backfill command example is: airflow dags backfill test_dag_2 -s 2022-03-01 -e 2022-03-10 --rerun-failed-tasks
The test_dag_2 dag is like:
import time
from datetime import timedelta
import pendulum
from airflow import DAG
from airflow.decorators import task
from airflow.models.dag import dag
from airflow.operators.bash import BashOperator
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import PythonOperator
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'email': ['[email protected]'],
'email_on_failure': True,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
}
def get_execution_date(**kwargs):
ds = kwargs['ds']
print(ds)
with DAG(
'test_dag_2',
default_args=default_args,
description='Testing dag',
start_date=pendulum.datetime(2022, 4, 2, tz='UTC'),
schedule_interval="@daily", catchup=True, max_active_runs=1,
) as dag:
t1 = BashOperator(
task_id='task_1',
depends_on_past=False,
bash_command='sleep 30'
)
t2 = PythonOperator(
task_id='get_execution_date',
python_callable=get_execution_date
)
t1 >> t2
Operating System
Debian GNU/Linux
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==3.0.0 apache-airflow-providers-celery==2.1.0 apache-airflow-providers-cncf-kubernetes==3.0.2 apache-airflow-providers-docker==2.4.1 apache-airflow-providers-elasticsearch==2.2.0 apache-airflow-providers-ftp==2.0.1 apache-airflow-providers-google==6.4.0 apache-airflow-providers-grpc==2.0.1 apache-airflow-providers-hashicorp==2.1.1 apache-airflow-providers-http==2.0.3 apache-airflow-providers-imap==2.2.0 apache-airflow-providers-microsoft-azure==3.6.0 apache-airflow-providers-microsoft-mssql==2.1.0 apache-airflow-providers-odbc==2.0.1 apache-airflow-providers-postgres==3.0.0 apache-airflow-providers-redis==2.0.1 apache-airflow-providers-sendgrid==2.0.1 apache-airflow-providers-sftp==2.4.1 apache-airflow-providers-slack==4.2.0 apache-airflow-providers-snowflake==2.5.0 apache-airflow-providers-sqlite==2.1.0 apache-airflow-providers-ssh==2.4.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template!
I tried with this workaround https://github.com/apache/airflow/issues/13542#issuecomment-1011598836 and it made the stuck tasks running again.
Is there any update regarding this issue? There are many tasks that fall into the 'scheduled' state when working on the backfill 😂
Apache Airflow 2.3.4 / Kubernetes Executor
I think there were quite q few changes to backfill - so you might try 2.4.0rc1 @wookiist - and see if it solves the problem. It might not be it, but worth trying
Is there any update regarding this issue? My tasks that fall into the 'scheduled' state when working on the backfill
Apache Airflow 2.4.0 / Kubernetes Executor
@venkateshnyq550
Is there any update regarding this issue? My tasks that fall into the 'scheduled' state when working on the backfill Apache Airflow 2.4.0 / Kubernetes Executor
There will be no update to that issue any more unless the author retests it in latest version of Airlfow and confirms whether it was solved or not and provide some more evidences from the latest version.
This issue is presumed closed. And your problem might or might not be the same even if it looks similar. So if you want to have some help on your problem, you should open a new issue with all the details you can provide - reproducible path, circumstances, logs, screenshots and all evidences you can find. Ideally in the latest released version: 2.5.0 as of now. In any case any fixes will be implemented in 2.5.* so upgrading to latest version is something that you will have to do anyway to fix, so better to do it sooner to report it already based on 2.5.* - and maybe you will find that the issue is already solved there, which will save time both - you and anyone who would look at the issue.
There is no action possible to be taken by somone stating "I have the same issue" without providing any evidences, and since it is really difficult to asses if the issue is the same or not, so the only way (if you want help) is to report a new issue with the details - feel free to refer to that issue by number as likely similar issue, that might help with finding out if they are related.