airflow icon indicating copy to clipboard operation
airflow copied to clipboard

post_execute() method does not work correctly when used along with a trigger

Open venkatajagannath opened this issue 1 year ago • 7 comments
trafficstars

Apache Airflow version

2.10.0

If "Other Airflow 2 version" selected, which one?

No response

What happened?

I have a new provider to send jobs to a Ray cluster. One of the operators has pre_execute and post_execute operations. Here is the complete flow of control

  1. pre_execute() --> Sets up a Ray cluster
  2. execute() --> Triggers a new job on the cluster and defers tracking to a trigger
  3. async trigger --> tracks the job execution and prints logs
  4. execute_complete() --> completes main task execution after the trigger reaches terminal state
  5. post_execute() --> Deletes the Ray cluster

When post_execute() starting executing it seems to be running pre_execute() code again. This is a bug. We should only see post_execute code. Logs attached.

post_execute_bug.txt

What you think should happen instead?

Only code relevant to the post_execute method must execute.

How to reproduce

  1. Install the provider using the instructions mentioned on this github page
  2. Run one of the example DAGs that only uses the SubmitRayJob operator in the DAG

You will need access to a k8 cluster and will need to provide connection details as shown in step 1

Operating System

linux

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

venkatajagannath avatar Aug 23 '24 22:08 venkatajagannath

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot] avatar Aug 23 '24 22:08 boring-cyborg[bot]

(Copying my message from elsewhere with minor edits for context)

It’s better to use setup and teardown because they handle errors better.

For the reported issue specifically, pre_execute is actually executed every time a worker is started for the task, not only when execute is called, despite the name. (This is also true for on_execution_callback, by the way.) Resuming a deferred task starts a worker and causes pre_execute to be called.

This is kind of by design, but I guess there’s a case to be made it’s not a good design. If we are to change this, I would recommend:

  1. Rename pre_execute and on_execution_callback to remove execute from the name to avoid misunderstandings. Document them well to describe the actual behaviour.
  2. (Maybe?) Add a hook that’s actually only run when the first worker is started for a ti.
  3. (Maybe??) Add hooks that are run when a task is deferred and resumed?

uranusjr avatar Aug 24 '24 08:08 uranusjr

@uranusjr Does it make sense to rename this issue to "Rename pre_execute and on_execution_callback to remove execute from the name to avoid misunderstandings"?

RNHTTR avatar Aug 24 '24 13:08 RNHTTR

@uranusjr

pre_execute() and post_execute() naming convention is very intuitive. Would it be possible to change the backend design instead?

I've used setup and teardown. Example here. But, my current usecase is slightly different.

Its a much better UX to use just one decorator to spin up/down a cluster and also execute the main job. See example here. We also hear the same feedback from customers.

One idea:

(Maybe?) we can extend setup/teardown to configure a method within a custom operator as a setup and another as a teardown. These methods will execute similar to setup/teardown in a DAG.

venkatajagannath avatar Aug 24 '24 17:08 venkatajagannath

@uranusjr Does it make sense to rename this issue to "Rename pre_execute and on_execution_callback to remove execute from the name to avoid misunderstandings"?

I'm not sure whether we already decide to change it this way. Personally, I prefer 2 and 3. 1 might not help much for one who doesn't understand airflow in depth. Providing 2 and 3 will give the user a sense that these things are different

Lee-W avatar Aug 25 '24 03:08 Lee-W

I guess the question is, is a hook that runs every time a worker starts useful for people? We need something for that if we change the semantic of pre_execute (and on_execution_hook).

uranusjr avatar Aug 26 '24 10:08 uranusjr

I guess the question is, is a hook that runs every time a worker starts useful for people? We need something for that if we change the semantic of pre_execute (and on_execution_hook).

For operators works like S3KeySensor, I think it's possible 🤔

Lee-W avatar Aug 26 '24 11:08 Lee-W