
Create a more efficient airflow dag test command that also has better local logging

Open dimberman opened this issue 3 years ago • 28 comments

The current airflow dags test command is insufficient for an iterative local workflow when developing DAGs. The current system relies on running a backfill job, which is a) extremely slow and b) only shows scheduler-level logs, so users would need to hunt down log files to see how their tasks actually ran!

This PR replaces that system. The improvements fall into two areas: speed and logging.

Speed

With this new system, we see a notable speedup in between-task time. The new airflow dags test command can load tasks immediately rather than waiting for a scheduler heartbeat or other irrelevant Airflow processes.

Before:

example bash operator: 13 seconds
example python operator: 26 seconds

After:

example bash operator: 6 seconds
example python operator: 16 seconds

Logging

Previously, the airflow dags test command would only surface scheduler-level logs. These give essentially no useful information to a DAG writer and would force them to seek out task log files on their filesystem every time they ran a test DAG. To improve this, we now surface task-level logs on the command line. We also no longer have any scheduler-level logs, as we are no longer using a scheduler!

Before

[2022-09-14 14:37:48,310] {dagbag.py:522} INFO - Filling up the DagBag from /files/dags
[2022-09-14 14:37:48,967] {base_executor.py:93} INFO - Adding to queue: ['<TaskInstance: example_bash_operator.runme_0 backfill__2022-01-02T00:00:00+00:00 [queued]>']
[2022-09-14 14:37:48,973] {base_executor.py:93} INFO - Adding to queue: ['<TaskInstance: example_bash_operator.runme_1 backfill__2022-01-02T00:00:00+00:00 [queued]>']
[2022-09-14 14:37:48,979] {base_executor.py:93} INFO - Adding to queue: ['<TaskInstance: example_bash_operator.runme_2 backfill__2022-01-02T00:00:00+00:00 [queued]>']
[2022-09-14 14:37:48,985] {base_executor.py:93} INFO - Adding to queue: ['<TaskInstance: example_bash_operator.also_run_this backfill__2022-01-02T00:00:00+00:00 [queued]>']
[2022-09-14 14:37:48,991] {base_executor.py:93} INFO - Adding to queue: ['<TaskInstance: example_bash_operator.this_will_skip backfill__2022-01-02T00:00:00+00:00 [queued]>']
[2022-09-14 14:37:54,045] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:37:54,045] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "example_bash_operator__runme_0__20220102" && sleep 1']
[2022-09-14 14:37:54,053] {subprocess.py:85} INFO - Output:
[2022-09-14 14:37:54,055] {subprocess.py:92} INFO - example_bash_operator__runme_0__20220102
[2022-09-14 14:37:55,061] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:37:55,117] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:37:55,117] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "example_bash_operator__runme_1__20220102" && sleep 1']
[2022-09-14 14:37:55,124] {subprocess.py:85} INFO - Output:
[2022-09-14 14:37:55,125] {subprocess.py:92} INFO - example_bash_operator__runme_1__20220102
[2022-09-14 14:37:56,131] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:37:56,186] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:37:56,186] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "example_bash_operator__runme_2__20220102" && sleep 1']
[2022-09-14 14:37:56,193] {subprocess.py:85} INFO - Output:
[2022-09-14 14:37:56,194] {subprocess.py:92} INFO - example_bash_operator__runme_2__20220102
[2022-09-14 14:37:57,199] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:37:57,257] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:37:57,257] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "ti_key=example_bash_operator__also_run_this__20220102"']
[2022-09-14 14:37:57,265] {subprocess.py:85} INFO - Output:
[2022-09-14 14:37:57,266] {subprocess.py:92} INFO - ti_key=example_bash_operator__also_run_this__20220102
[2022-09-14 14:37:57,266] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:37:57,307] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:37:57,308] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "hello world"; exit 99;']
[2022-09-14 14:37:57,316] {subprocess.py:85} INFO - Output:
[2022-09-14 14:37:57,317] {subprocess.py:92} INFO - hello world
[2022-09-14 14:37:57,317] {subprocess.py:96} INFO - Command exited with return code 99
[2022-09-14 14:37:57,340] {backfill_job.py:381} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 2 | succeeded: 4 | running: 0 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 2
[2022-09-14 14:37:57,351] {base_executor.py:93} INFO - Adding to queue: ['<TaskInstance: example_bash_operator.run_after_loop backfill__2022-01-02T00:00:00+00:00 [queued]>']
[2022-09-14 14:37:58,918] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:37:58,918] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo 1']
[2022-09-14 14:37:58,925] {subprocess.py:85} INFO - Output:
[2022-09-14 14:37:58,926] {subprocess.py:92} INFO - 1
[2022-09-14 14:37:58,927] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:37:58,945] {dagrun.py:595} INFO - Marking run <DagRun example_bash_operator @ 2022-01-02T00:00:00+00:00: backfill__2022-01-02T00:00:00+00:00, state:running, queued_at: None. externally triggered: False> successful
[2022-09-14 14:37:58,946] {dagrun.py:657} INFO - DagRun Finished: dag_id=example_bash_operator, execution_date=2022-01-02T00:00:00+00:00, run_id=backfill__2022-01-02T00:00:00+00:00, run_start_date=2022-09-14 14:37:48.865989+00:00, run_end_date=2022-09-14 14:37:58.946180+00:00, run_duration=10.080191, state=success, external_trigger=False, run_type=backfill, data_interval_start=2022-01-02T00:00:00+00:00, data_interval_end=2022-01-03T00:00:00+00:00, dag_hash=None
[2022-09-14 14:37:58,946] {backfill_job.py:381} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 5 | running: 0 | failed: 0 | skipped: 2 | deadlocked: 0 | not ready: 0
[2022-09-14 14:37:58,949] {backfill_job.py:875} INFO - Backfill done for DAG <DAG: example_bash_operator>. Exiting.

After:

root@613d334d9c9d:/opt/airflow# airflow dags test example_bash_operator  2022-01-03
[2022-09-14 14:43:45,232] {dagbag.py:522} INFO - Filling up the DagBag from /files/dags
[2022-09-14 14:43:45,997] {dag_command.py:556} INFO - dagrun id:example_bash_operator
/opt/airflow/airflow/cli/commands/dag_command.py:573 RemovedInAirflow3Warning: Calling `DAG.create_dagrun()` without an explicit data interval is deprecated
created dagrun <DagRun example_bash_operator @ 2022-01-03T00:00:00+00:00: manual__2022-01-03T00:00:00+00:00, state:running, queued_at: None. externally triggered: False>
[2022-09-14 14:43:46,013] {dag_command.py:496} INFO - starting dagrun
[2022-09-14 14:43:46,138] {dag_command.py:542} INFO - *****************************************************
[2022-09-14 14:43:46,138] {dag_command.py:543} INFO - Running task runme_0
[2022-09-14 14:43:46,144] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:43:46,145] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "example_bash_operator__runme_0__20220103" && sleep 1']
[2022-09-14 14:43:46,152] {subprocess.py:85} INFO - Output:
[2022-09-14 14:43:46,154] {subprocess.py:92} INFO - example_bash_operator__runme_0__20220103
[2022-09-14 14:43:47,158] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:43:47,179] {dag_command.py:547} INFO - runme_0 ran successfully!
[2022-09-14 14:43:47,180] {dag_command.py:550} INFO - *****************************************************
[2022-09-14 14:43:47,197] {dag_command.py:542} INFO - *****************************************************
[2022-09-14 14:43:47,197] {dag_command.py:543} INFO - Running task runme_1
[2022-09-14 14:43:47,203] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:43:47,204] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "example_bash_operator__runme_1__20220103" && sleep 1']
[2022-09-14 14:43:47,211] {subprocess.py:85} INFO - Output:
[2022-09-14 14:43:47,213] {subprocess.py:92} INFO - example_bash_operator__runme_1__20220103
[2022-09-14 14:43:48,219] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:43:48,239] {dag_command.py:547} INFO - runme_1 ran successfully!
[2022-09-14 14:43:48,239] {dag_command.py:550} INFO - *****************************************************
[2022-09-14 14:43:48,253] {dag_command.py:542} INFO - *****************************************************
[2022-09-14 14:43:48,253] {dag_command.py:543} INFO - Running task runme_2
[2022-09-14 14:43:48,259] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:43:48,259] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "example_bash_operator__runme_2__20220103" && sleep 1']
[2022-09-14 14:43:48,266] {subprocess.py:85} INFO - Output:
[2022-09-14 14:43:48,267] {subprocess.py:92} INFO - example_bash_operator__runme_2__20220103
[2022-09-14 14:43:49,273] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:43:49,294] {dag_command.py:547} INFO - runme_2 ran successfully!
[2022-09-14 14:43:49,295] {dag_command.py:550} INFO - *****************************************************
[2022-09-14 14:43:49,374] {dag_command.py:542} INFO - *****************************************************
[2022-09-14 14:43:49,374] {dag_command.py:543} INFO - Running task also_run_this
[2022-09-14 14:43:49,380] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:43:49,380] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "ti_key=example_bash_operator__also_run_this__20220103"']
[2022-09-14 14:43:49,389] {subprocess.py:85} INFO - Output:
[2022-09-14 14:43:49,390] {subprocess.py:92} INFO - ti_key=example_bash_operator__also_run_this__20220103
[2022-09-14 14:43:49,390] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:43:49,395] {dag_command.py:547} INFO - also_run_this ran successfully!
[2022-09-14 14:43:49,395] {dag_command.py:550} INFO - *****************************************************
[2022-09-14 14:43:49,406] {dag_command.py:542} INFO - *****************************************************
[2022-09-14 14:43:49,407] {dag_command.py:543} INFO - Running task this_will_skip
[2022-09-14 14:43:49,412] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:43:49,413] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo "hello world"; exit 99;']
[2022-09-14 14:43:49,420] {subprocess.py:85} INFO - Output:
[2022-09-14 14:43:49,421] {subprocess.py:92} INFO - hello world
[2022-09-14 14:43:49,422] {subprocess.py:96} INFO - Command exited with return code 99
[2022-09-14 14:43:49,422] {dag_command.py:549} INFO - Task Skipped, continuing
[2022-09-14 14:43:49,422] {dag_command.py:550} INFO - *****************************************************
[2022-09-14 14:43:49,440] {dag_command.py:542} INFO - *****************************************************
[2022-09-14 14:43:49,440] {dag_command.py:543} INFO - Running task run_after_loop
[2022-09-14 14:43:49,445] {subprocess.py:62} INFO - Tmp dir root location:
 /tmp
[2022-09-14 14:43:49,446] {subprocess.py:74} INFO - Running command: ['/bin/bash', '-c', 'echo 1']
[2022-09-14 14:43:49,453] {subprocess.py:85} INFO - Output:
[2022-09-14 14:43:49,454] {subprocess.py:92} INFO - 1
[2022-09-14 14:43:49,454] {subprocess.py:96} INFO - Command exited with return code 0
[2022-09-14 14:43:49,458] {dag_command.py:547} INFO - run_after_loop ran successfully!
[2022-09-14 14:43:49,458] {dag_command.py:550} INFO - *****************************************************
[2022-09-14 14:43:49,471] {dag_command.py:542} INFO - *****************************************************
[2022-09-14 14:43:49,471] {dag_command.py:543} INFO - Running task run_this_last
[2022-09-14 14:43:49,475] {dag_command.py:547} INFO - run_this_last ran successfully!
[2022-09-14 14:43:49,475] {dag_command.py:550} INFO - *****************************************************
[2022-09-14 14:43:49,480] {dagrun.py:595} INFO - Marking run <DagRun example_bash_operator @ 2022-01-03T00:00:00+00:00: manual__2022-01-03T00:00:00+00:00, state:running, queued_at: None. externally triggered: False> successful
[2022-09-14 14:43:49,480] {dagrun.py:657} INFO - DagRun Finished: dag_id=example_bash_operator, execution_date=2022-01-03T00:00:00+00:00, run_id=manual__2022-01-03T00:00:00+00:00, run_start_date=2022-01-03T00:00:00+00:00, run_end_date=2022-09-14 14:43:49.480895+00:00, run_duration=21998629.480895, state=success, external_trigger=False, run_type=manual, data_interval_start=2022-01-03T00:00:00+00:00, data_interval_end=2022-01-04T00:00:00+00:00, dag_hash=None



dimberman avatar Sep 14 '22 17:09 dimberman

This looks sooooper cool :). I will do some quick testing to see the experience.

One suggestion (asking more what you think about it; maybe it could be a separate PR, or maybe we could incorporate it in this one).

How about adding integration with pudb (https://github.com/inducer/pudb)? It is a very nice console debugger with capabilities equal to those of modern IDE debuggers (colored output, syntax highlighting, breakpoints, watches, the ability to open an iPython console at any moment and use the REPL to try/test anything). I've heard very good things about pudb and thought we should integrate it better into the Breeze experience, but since our DAGs are Python, I think it would be perfect to add it here.

I imagine just adding a --debug flag and automatically stopping at the first task of a DAG or something.

WDYT @dimberman ?

potiuk avatar Sep 14 '22 17:09 potiuk

Thank you @potiuk !

Do you think it would be too opinionated to choose a specific debugger? One really awesome thing we could potentially do with this is write a guide for using it in PyCharm and VS Code (since there's no parallelism, it shouldn't be hard).

dimberman avatar Sep 14 '22 18:09 dimberman

I don't think we should integrate any specific debugger into this CLI function. Instead, instruct people to use breakpoint(): then either it "just works" with their IDE's integrated debugger, or they can set PYTHONBREAKPOINT="pudb.set_trace" (for example) to change which debugger is used.
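For illustration, here is a minimal sketch of the breakpoint()/PYTHONBREAKPOINT mechanism (PEP 553) this suggestion relies on. The task function below is a made-up stand-in, not Airflow code:

```python
import os

# Sketch of PEP 553 behaviour: the built-in breakpoint() consults the
# PYTHONBREAKPOINT environment variable on every call:
#   unset                            -> pdb.set_trace()
#   PYTHONBREAKPOINT=pudb.set_trace  -> pudb instead of pdb
#   PYTHONBREAKPOINT=0               -> breakpoint() becomes a no-op
# my_task_callable is an illustrative stand-in for a task's python_callable.

def my_task_callable():
    value = 21 * 2  # stand-in for real task work
    breakpoint()    # pauses here under whatever debugger is configured
    return value

if __name__ == "__main__":
    os.environ["PYTHONBREAKPOINT"] = "0"  # disable for this non-interactive run
    print(my_task_callable())
```

The environment variable is read at each breakpoint() call, so no code changes are needed to switch debuggers between runs.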

ashb avatar Sep 14 '22 19:09 ashb

Thank you @potiuk !

Do you think it would be too opinionated to choose a specific debugger? One really awesome thing we could potentially do with this is write a guide for using it in PyCharm and VS Code (since there's no parallelism, it shouldn't be hard).

I think not - especially since it is a text/console-only UI. I honestly don't care what the debugger is; I want this to run effortlessly. If we can make this work out of the box without any manual setup, I would be all for it.

The problem with VS Code/PyCharm is that it is not at all straightforward to integrate them with the "airflow dags test" command. Have you tried to add debug mode for any "airflow" command in either of them? It is a lot of hassle. First of all, it will not work easily when you only have remote/shell access. Secondly, you cannot easily turn the "airflow" command line into a debuggable run: you need to create a custom configuration first, then have a local venv configured with the right version of Airflow installed and configured in the IDE, then do "run -> create debug configuration", choose "module", choose the airflow module, and type the extra parameters to pass. That's a lot of keystrokes and required knowledge. AND you have to have your IDE installed in the place where Airflow is installed. This might be acceptable for Breeze/Airflow development, but not at all acceptable for DAG development.

Using "opinionated debugger" with console interface is so much less friction.

Just "airflow dags test" plus --debug. That's it. You can still do the above with any IDE you have; this will not block you, but it will enable visual debugging even if you just logged in remotely over SSH to an Airflow installation.

Let me reverse the question:

What is the problem you foresee by having an opinionated predefined flag when it does not close any of the other options?

potiuk avatar Sep 14 '22 19:09 potiuk

We don't even have to make it a requirement; we could install it dynamically when someone uses the --debug option, BTW.

potiuk avatar Sep 14 '22 19:09 potiuk

Or name it --debug-pudb as a flag.

potiuk avatar Sep 14 '22 19:09 potiuk

It needs a bit of work/polish, but using this:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Airflow: Dag in current file",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "args": ["dags", "test"],
            "console": "integratedTerminal",
            "justMyCode": true
        }
    ]
}

That gives me something I can run against a DAG file. To make that work I added this to the DAG file (plus a couple of other local changes to make this work for the dags test command):

if __name__ == "__main__":
    dag.cli()

Edit: This works from VS Code running on Windows against an Airflow in WSL2.

ashb avatar Sep 14 '22 19:09 ashb

Sneak peek

2022-09-14 20-34-13.webm

ashb avatar Sep 14 '22 19:09 ashb

Sneak peek

2022-09-14.20-34-13.webm

We already have DebugExecutor that does exactly that in the same way and a bit simpler (and is portable across IDEs) I think.. https://airflow.apache.org/docs/apache-airflow/stable/executor/debug.html?highlight=debug+executor

I was telling about debugging without having any IDE setup.

potiuk avatar Sep 14 '22 19:09 potiuk

We already have DebugExecutor that does exactly that in the same way and a bit simpler (and is portable across IDEs) I think.. https://airflow.apache.org/docs/apache-airflow/stable/executor/debug.html?highlight=debug+executor

I was telling about debugging without having any IDE setup.

Oh yes. Still, see my previous point about breakpoint() then

ashb avatar Sep 14 '22 19:09 ashb

Oh yes. Still, see my previous point about breakpoint() then

breakpoint() is fine, but it will not give you a fully-fledged, intuitive debugger in the console with visual debugging capabilities, syntax coloring, variables, watches, etc. when you are running airflow dags test on a system you logged in to remotely and you have no IDE or environment set up on your laptop.

I think (unlike Breeze/development) THIS is our audience: Python developers who have Airflow set up by their company on some remote system they can log in to, and who would like to debug their DAGs. No IDE, no local virtualenv, maybe a Windows workstation, only a terminal to connect to a development server with Airflow installed.

I am not saying we NEED to do it; I am just saying that those people are very unlikely to spend their time setting up a fully-fledged IDE on their local workstation. They know Python and will be able to debug stuff if they are given a visual tool to help them. They are not professional Python programmers who know how to set up their IDE. They just develop their DAG in Notepad, drop it into the shared folder, and would like to debug it on the dev Airflow instance.

This is my "persona" when I think about it. And "airflow dags test" is a fantastic entry point for those people, and adding a simple --debug flag to have visual debugging ready is absolutely amazing if you forget the years of engineering experience you have (and they don't).

potiuk avatar Sep 14 '22 19:09 potiuk

@potiuk could we make the debugger a separate discussion? I don't personally have a strong opinion one way or another, but it sounds like that might be something that could slow down the PR if there is disagreement over it.

I think it will also make sense for us to write up development guides for IDEs/CLI dev and maybe installing this tool could be a part of that?

dimberman avatar Sep 14 '22 20:09 dimberman

@potiuk could we make the debugger a separate discussion? I don't personally have a strong opinion one way or another, but it sounds like that might be something that could slow down the PR if there is disagreement over it.

Yes. I just wanted to know how you feel about it - I mentioned it could be a separate PR, and the more I think about it, the more I like it.

I think it will also make sense for us to write up development guides for IDEs/CLI dev and maybe installing this tool could be a part of that?

The whole idea here is NOT to write a development guide for it. The --debug switch of the "dags test" command should just be explained as "enables visual debugging, here is the link to the docs". That is all the documentation that should be written for it.

potiuk avatar Sep 14 '22 21:09 potiuk

@dimberman got this deprecation warning, which you can probably avoid somehow:

local ❯ airflow dags test example_time_delta_sensor_async -S /Users/dstandish/code/airflow/airflow/example_dags/example_time_delta_sensor_async.py
/Users/dstandish/code/airflow/airflow/models/dag.py:3601 RemovedInAirflow3Warning: Calling `DAG.create_dagrun()` without an explicit data interval is deprecated

dstandish avatar Sep 14 '22 21:09 dstandish

Prior art for debugger: https://docs.pytest.org/en/7.1.x/how-to/failures.html?highlight=pdbcls

--pdb to drop in to a debugger shell on error, --trace to drop in to a shell at the start of a test (not sure what the exact equivalent for us would be) and --pdbcls to change which debugger is used.
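A rough sketch of what mimicking those pytest flags could look like. This is hypothetical flag wiring, not Airflow's actual CLI; the "module:class" spec format mirrors pytest's --pdbcls:

```python
import argparse
import importlib

# Hypothetical sketch of pytest-style debugger flags for a test command.
# None of these option names exist in Airflow; they mirror pytest's.

def resolve_debugger(spec: str):
    """Turn a 'module:class' spec (pytest --pdbcls style) into a class."""
    modname, clsname = spec.split(":")
    return getattr(importlib.import_module(modname), clsname)

def parse_debug_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--pdb", action="store_true",
                        help="drop into the debugger on task error")
    parser.add_argument("--trace", action="store_true",
                        help="drop into the debugger before the first task")
    parser.add_argument("--pdbcls", default="pdb:Pdb",
                        help="module:class of the debugger to use")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_debug_args(["--pdb", "--pdbcls", "pdb:Pdb"])
    print(args.pdb, resolve_debugger(args.pdbcls).__name__)
```

Swapping the default spec for something like "pudb:Pdb-compatible class" would give the opinionated-but-overridable behaviour discussed above.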

ashb avatar Sep 15 '22 08:09 ashb

Prior art for debugger: https://docs.pytest.org/en/7.1.x/how-to/failures.html?highlight=pdbcls

--pdb to drop in to a debugger shell on error, --trace to drop in to a shell at the start of a test (not sure what the exact equivalent for us would be) and --pdbcls to change which debugger is used.

And the same source (just few lines earlier):

pytest allows one to drop into the [pdb](https://docs.python.org/3/library/pdb.html) prompt immediately at the start of each test via a command line option:

pytest --trace
This will invoke the Python debugger at the start of every test.

Which does not use breakpoint(); it just invokes the debugger at the beginning of the test. I think adding breakpoint() - or in general modifying the code under test in order to start debugging it - is a huge barrier to people debugging intuitively. Simply using a command-line switch is better, and using a visual debugger out of the box is even better (literally no one except a few gdb veterans would even get close to pdb).

potiuk avatar Sep 15 '22 13:09 potiuk

Ah, I see you mentioned it too - yeah, I think mimicking pytest behaviour is a good way of approaching it :). Just with the default being different from pdb, because I find pdb mostly intimidating rather than helpful. And I am happy to draft a PR on that once this one is merged :).

potiuk avatar Sep 15 '22 13:09 potiuk

BTW, the fix for provider build testing on constraints is already merged, and those errors are irrelevant.

potiuk avatar Sep 15 '22 15:09 potiuk

Nobody seems to have mentioned this yet, but there's #18758. I have not read the implementation in this PR, but the description complained about backfill, so I'm hopeful it addresses the problems!

uranusjr avatar Sep 15 '22 17:09 uranusjr

@uranusjr this addresses your issue by making airflow dags test actually useful as a DAG testing tool. I have no opinion on making an airflow dags backfill command.

dimberman avatar Sep 15 '22 18:09 dimberman

making airflow dags test actually useful as a DAG testing tool.

The thing I'm worried about is that people might actually be relying on this command's current implementation (doing a backfill, and its implications). I have no evidence either way, but the safest approach would be to introduce the new tool in the same release we "fix" airflow dags test, so that if people come complaining we at least have a good alternative ready for them, minimising the effect of potential backward incompatibility.

It’s probably also a good idea to categorise this as a “significant” change and add a release note.

uranusjr avatar Sep 15 '22 19:09 uranusjr

@uranusjr in that case should we make a separate ticket to create an airflow dags backfill command? I wouldn't want to leave this function broken for some small edge-case of people who are using it for something it shouldn't even be used for.

dimberman avatar Sep 15 '22 20:09 dimberman

@uranusjr in that case should we make a separate ticket to create an airflow dags backfill command? I wouldn't want to leave this function broken for some small edge-case of people who are using it for something it shouldn't even be used for.

there already is a dags backfill command :)

related...

I finally read @uranusjr's issue, https://github.com/apache/airflow/issues/18758. I agree with that approach. What he's suggesting is a new option for airflow dags trigger which makes it run the tasks in-process, in sequence. Then the backfill part... I'm not sure exactly what he was saying, but I guess it is about running for a specific (optional) execution date (rather than a range, which I think is required now).

Anyway, basically, the test command has nothing to do with "testing"; it's just a manual run. So really, it's probably best not to use the word test in the method/command names. Even if it's common and recommended to use it for debugging/developing/"testing" your DAGs, the name does not need to assume that for the user. It's just an in-process, manual, local, ad hoc run. And we should probably make it configurable whether it saves the metadata or purges it on completion (or doesn't store it in the first place).

So what am I actually suggesting? Well... maybe we name methods in a more usage-agnostic way, such as run_local or something. And maybe, instead of replacing airflow dags test, we put this under an option of dags trigger and deprecate airflow dags test, and then allow backfill to fill the void left by dags test.

@uranusjr also suggested that this live under its own job class, like ManualJobRun. Seems reasonable, though that could also come as a later refactor, I reckon.

dstandish avatar Sep 15 '22 21:09 dstandish

I have plans/ideas in my head on how to turn this in to a true test only command -- one that leaves no trace of the run in the Database at the end.

ashb avatar Sep 16 '22 11:09 ashb

I have plans/ideas in my head on how to turn this in to a true test only command -- one that leaves no trace of the run in the Database at the end.

All for it. And that's also why I think having this nice console visual debugger integrated into the experience would be awesome. It's the very same thing we did in Breeze with the K9s integration in the "helm testing" experience. The fact that you can just launch it and it works out of the box without having to install it, and that it starts already pre-integrated with the right environment variables set and the namespace set to airflow, makes a whole world of difference - especially for users who are not "pro" Python programmers (or K8s-experienced people in the case of k9s). Visual debugging and k9s have an "intuitive" learning quality that spares so many people from learning all the complexity involved in "debugging" or "managing K8s". It's like telling people to use vim versus pre-configuring VS Code with Codespaces for them, where they can jump directly into a great tool and start using it without spending hours learning how it works.

potiuk avatar Sep 16 '22 11:09 potiuk

I have plans/ideas in my head on how to turn this in to a true test only command -- one that leaves no trace of the run in the Database at the end.

BTW, not storing tasks in the DB is what DebugExecutor has been doing since the beginning, and once we merge this one, we can iteratively make sure the test command replaces everything DebugExecutor does:

  1. main() and debuggability by just running the DAG as a plain Python script.

  2. Not sure if we still have tasks running as sub-processes (that prevented step-by-step debugging between tasks), but I guess it should be possible to run all tasks in dag test in a single process to allow that. Debugging multiprocessing ain't straightforward.

  3. Easy integration of visual debugging in the terminal (the pudb I mentioned) seems to be a good candidate.

  4. If we run tasks in-process, there needs to be a way to automatically set sensors to poke-reschedule mode, because otherwise running a sensor might lock the process if we have two related tasks: sense_for_resource + create the resource the sensor waits for (this is quite a common pattern when the remote resource needs time to initialize and its creation is not fully synchronous). This is what DebugExecutor did.
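Point 4 can be illustrated with a toy model (plain Python, not Airflow code; all names here are invented) of why an in-process poke-mode sensor blocks the single process while reschedule mode yields it back:

```python
import time

# Toy model of sensor modes. In "poke" mode the sensor sleeps in place,
# blocking the (single) process; in "reschedule" mode it returns control,
# so another task - e.g. the one that creates the awaited resource - can run.

def run_sensor(check, mode="poke", interval=0.01, pokes=3):
    for _ in range(pokes):
        if check():
            return "success"
        if mode == "reschedule":
            return "up_for_reschedule"  # yield; the runner retries later
        time.sleep(interval)            # poke: block in place and retry
    return "failed"

if __name__ == "__main__":
    resource = {"ready": False}
    # First attempt yields instead of blocking, letting the creator task run.
    print(run_sensor(lambda: resource["ready"], mode="reschedule"))
    resource["ready"] = True  # stand-in for the task that creates the resource
    print(run_sensor(lambda: resource["ready"], mode="reschedule"))
```

In poke mode the first call would sleep through all its pokes and, in a strictly sequential single-process run, the creator task would never get a chance to flip the resource to ready.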

potiuk avatar Sep 19 '22 08:09 potiuk

  2. not sure if we have tasks running as sub-processes (that prevented step-by-step debugging between tasks) - but I guess this should be possible to run all task in dag test in a single process to allow that. Debugging multiprocessing ain't straightforward

This PR has all tasks running in one process so you can set breakpoints and step between them etc.

ashb avatar Sep 19 '22 12:09 ashb

  2. not sure if we have tasks running as sub-processes (that prevented step-by-step debugging between tasks) - but I guess this should be possible to run all task in dag test in a single process to allow that. Debugging multiprocessing ain't straightforward

This PR has all tasks running in one process so you can set breakpoints and step between them etc.

Cool!

potiuk avatar Sep 19 '22 12:09 potiuk