
DAG disappearing from Airflow in case of standalone DAG processor

Open kosteev opened this issue 1 year ago • 9 comments

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.7.3

What happened?

If the DAG processor is running in standalone mode with the --num-runs flag defined, e.g.

airflow dag-processor --num-runs 100

some DAGs disappear from Airflow until the DAG processor runs again.

What you think should happen instead?

DAGs should never disappear.

How to reproduce

  • run Airflow 2.7.3 with a standalone DAG processor (separate from the scheduler) started with the --num-runs flag, for example:
airflow dag-processor --num-runs 100
  • deploy some DAGs (a minimal example DAG file is sketched after this list)
  • after some time, ~20 minutes (once the DAG processor has parsed all ~50 DAG files), deploy another DAG (a new DAG file)
  • after some more time, observe that DAGs start to disappear
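
The DAG files themselves do not need to do anything special; a trivial, fully deterministic file like the sketch below (the file name and dag_id are placeholders, not part of the original report) is enough when duplicated ~50 times with different dag_ids:

# dags/example_dag_1.py -- placeholder content for the reproduction steps above
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="example_dag_1",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    EmptyOperator(task_id="noop")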

The issue, basically, is the following (a simplified sketch of this run-count bookkeeping follows the screenshot below):

  • the DAG processor runs until it has parsed each file exactly num_runs times https://github.com/apache/airflow/blob/2872d370178ffed402ac15d2983dbeda8ed7902b/airflow/dag_processing/manager.py#L1245
  • if a new DAG file is added to the DAG bag while the DAG processor is running, this new file will have a different number of parsing runs than the files that were already there
  • at some point the other DAGs (those that were already in the DAG bag) stop being parsed because they hit the run limit and are excluded from the parsing loop, see the "Last Run" column in the screenshot below https://github.com/apache/airflow/blob/2872d370178ffed402ac15d2983dbeda8ed7902b/airflow/dag_processing/manager.py#L1164
  • the DAG processor keeps running until every file has been parsed exactly num_runs times
  • because those DAGs are not parsed for some time, they are removed from the Airflow DAG bag

[screenshot: DAG file list with the "Last Run" column]
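
To make the failure mode concrete, below is a deliberately simplified, self-contained model of the bookkeeping described above. It is not the real DagFileProcessorManager API; names such as run_count, files_to_parse and max_runs_reached are paraphrased for illustration only.

NUM_RUNS = 3  # stands in for --num-runs

# per-file parse counters for the files present at startup
run_count = {"a.py": 0, "b.py": 0}

def max_runs_reached() -> bool:
    # the processor only exits once *every* known file has hit the limit
    return all(count >= NUM_RUNS for count in run_count.values())

def files_to_parse() -> list:
    # files that already hit the limit are excluded from the parsing queue
    return [f for f, count in run_count.items() if count < NUM_RUNS]

iteration = 0
while not max_runs_reached():
    iteration += 1
    if iteration == 3:
        run_count["late.py"] = 0  # a new DAG file shows up mid-run
    for f in files_to_parse():
        run_count[f] += 1
    print(iteration, run_count)

# 1 {'a.py': 1, 'b.py': 1}
# 2 {'a.py': 2, 'b.py': 2}
# 3 {'a.py': 3, 'b.py': 3, 'late.py': 1}
# 4 {'a.py': 3, 'b.py': 3, 'late.py': 2}
# 5 {'a.py': 3, 'b.py': 3, 'late.py': 3}

From iteration 4 onwards only late.py is parsed, yet the processor cannot exit until late.py also reaches the limit. In a real deployment each iteration takes non-trivial time, so a.py and b.py can go unparsed long enough to be treated as stale and deactivated, which is exactly the disappearance described above.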

Operating System

Ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Other

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

kosteev avatar Apr 30 '24 15:04 kosteev

If "Other Airflow 2 version" selected, which one? 2.7.3

Is it reproducible in the latest Airflow (2.9.0 at the moment)?

Taragolis avatar Apr 30 '24 16:04 Taragolis

Is it reproducible in the latest Airflow (2.9.0 at the moment)?

I do not think there was a change in this area since. But it looks plausible and likely not very difficult to fix - so I will mark it as good-first-issue to investigate and fix.

potiuk avatar May 07 '24 11:05 potiuk

This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

github-actions[bot] avatar May 22 '24 00:05 github-actions[bot]

I didn't test it in Airflow 2.9, but the implementation of the DAG processor in this area of the code should be the same, so the issue should be present there as well.

kosteev avatar May 22 '24 07:05 kosteev

@kosteev -> maybe a chance to contribute a fix yourself?

potiuk avatar May 28 '24 09:05 potiuk

Hey, is it okay if I get assigned to this issue? Thanks!

andyjianzhou avatar Jun 05 '24 17:06 andyjianzhou

I found the same issue in 2.9.1... I had to revert back to the traditional scheduler setup :(

bmoon4 avatar Jun 06 '24 15:06 bmoon4

Hey all! I have been trying to reproduce the error and haven't been successful.

I ran the DAG processor job standalone and created 40+ test DAG files, some of them with timeout functionality in them.

From my understanding, this is my workflow: I run the command -> I can view all my DAGs in the Airflow UI -> I add a new DAG file -> wait 5 minutes -> refresh my page and all DAG files still exist. If there's a timeout error, there is an explanation in the terminal and/or in the UI about an invalid DAG file or a runtime error.

Can you please provide more information regarding the steps to reproduce the issue? Thank you!

andyjianzhou avatar Jun 12 '24 21:06 andyjianzhou

This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

github-actions[bot] avatar Jun 27 '24 00:06 github-actions[bot]

Hi, I'm facing the same issue in both Airflow 2.7.3 and 2.9.1. These logs are from 2.7.3; we are getting this in the dag-processor pod logs:

DAG airflow_dag1 is missing and will be deactivated.
[2024-06-28T09:03:57.806+0000] {manager.py:543} INFO - DAG airflow_dag2 is missing and will be deactivated.
[2024-06-28T09:03:57.809+0000] {manager.py:553} INFO - Deactivated 2 DAGs which are no longer present in file.
[2024-06-28T09:03:57.816+0000] {manager.py:557} INFO - Deleted DAG airflow_dag1 in serialized_dag table
[2024-06-28T09:03:57.823+0000] {manager.py:557} INFO - Deleted DAG airflow_dag2 in serialized_dag table

Sometimes we are getting this in the scheduler logs:

[2024-07-02T08:02:45.975+0000] {scheduler_job_runner.py:1786} INFO - Found (3) stales dags not parsed after 2024-07-02 07:52:45.971946+00:00.

The DAGs are disappearing from the Airflow UI and from the serialized_dag table, and after some time they appear again.

Could anyone help me with this issue?

vghar-bh avatar Jul 02 '24 08:07 vghar-bh

As explained in Slack - likely your DAG code is producing different outputs (different DAGs) at different times it is parsed. Look for a bug in your DAG code.

potiuk avatar Jul 03 '24 08:07 potiuk

This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

github-actions[bot] avatar Aug 22 '24 00:08 github-actions[bot]

This issue has been closed because it has not received response from the issue author.

github-actions[bot] avatar Aug 29 '24 00:08 github-actions[bot]

I have the problem

"DAG disappearing from Airflow in case of standalone DAG processor", but without configuring a custom num-runs,

with 2.10.1 and only deterministic DAGs (not using any dynamic config for the DAG authoring),

and

airflow_dag_processing_total_parse_time is between 0.1 and 1.2 s

airflow_dag_processing_file_path_queue_size is between 0 and 5

I had zero problems without the standalone DAG processor and with 2 schedulers.

raphaelauv avatar Sep 09 '24 15:09 raphaelauv