airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Automatically register DAGs that are used in a context manager

Open ashb opened this issue 3 years ago • 6 comments

This has caused a bit of confusion to committers (missing off the as dag) and is just more user friendly

This is on by default but can be disabled by passing auto_register=False do a DAG

ashb avatar May 09 '22 16:05 ashb

This works for the current example sub-dags we have in the tests -- is there any other case I should test?

ashb avatar May 09 '22 16:05 ashb

@ashb there are at least two things to consider here which I stumbled upon playing around with this:

  1. There's nothing special about with except that tasks are automatically assigned to this dag (i.e., dag=dag). Tying registration of DAGs to with is perhaps a bit confusing in this sense – also, you can use with on a particular dag instance multiple times!

  2. What about DAGs declared in functions?

    def test():
        with DAG():
            ...
    test()
    

    I don't mind but it is a little magic perhaps, that this automatically declares the DAG.

I ended up instead with this pattern:

dag = DAG(...)

with dag:
    ...

– as a recommended pattern, simply because then you can't forget to give the DAG instance a top-level variable.

And then as I mention in https://lists.apache.org/thread/xndt3zklxbzo5nlmjz10dtfm2pp4t4wq, we could instead warn the user if they'd created a DAG which isn't exposed/registered.

(The gist of that linked reply is that there are two different ways to do that.)

malthe avatar May 09 '22 16:05 malthe

Tying registration of DAGs to with is perhaps a bit confusing in this sense – also, you can use with on a particular dag instance multiple times!

@malthe Is it the param name that is confusing? I do not understand I follow what you are saying. The use-case for this PR is that we don't need to needlessly do a with DAG(..) as dag as there is no need to have the as dag in it unless someone wants to actually use the dag variable itself

What about DAGs declared in functions?

Isn't it the same behaviour as the one with with DAG(..) as dag ?

kaxil avatar May 09 '22 17:05 kaxil

@kaxil I understand the use-case – what I am somewhat hesitant to vote for is the conflation of the two orthogonal concerns which are both suggested for with here:

  1. As a scope for tasks created which automatically get the dag parameter set;
  2. As a mechanism to automatically register a DAG with the loading mechanism.

I think you could argue then that any DAG instance created during the loading of a module should be automatically registered – after all, why would a user create a DAG instance only to throw it away?

The problem with that is that we do have a test case that prevents this:

def test():
    dag = DAG(...)
    # Add tasks to dag and forget returning the DAG instance

test()

# or dag = test(), same result because `test` has no return value

The point is, should we require the use of with or should auto_register work regardless?

malthe avatar May 09 '22 19:05 malthe

We could extend the auto-register -- but that makes it slightly more complex for SubDag (we'd have to "unregister" a dag when it gets turned in to a sub dag) but we could do it just as easily if we want to.

Edit: actually I guess that with subdag: is just as likely to be used as with dag

ashb avatar May 17 '22 16:05 ashb

I still need to update the docs/tutorial etc to reflect this.

ashb avatar Aug 05 '22 08:08 ashb

Reminder to self: finish this before 2.4

ashb avatar Aug 12 '22 09:08 ashb