AIP-69: Airflow Core adjustments for introduction of Remote Executor
This PR is a companion to PR #41729 and splits the core adjustments from PoC in PR #40224 and "Mothership" PR #40900.
The PR carries documentation and a small extension to the variables allowing the core to "know" RemoteExecutor. Therefore it is not really a feature but mainly a doc-only change and would propose to add this also to Airflow 2.10 line. Feedback welcome.
Main change - as we had a small discussion about the name clash of "Remote Executor" is to adjust the documentation about executors to call all "Non Local Executors" to be "Distributed Executors" whereas the new RemoteExecutor is one sort of Distributed == Non Local Execution.
Note: Build of docs depends on PR for Provider Package being merged first. (Reference missing)
Note FYI especially as we had the naming discussion in the vote thread: @o-nikolas @shahar1
I am -0 on these changes. I will not block them if the community decides this is best, but I think changing a naming paradigm that has been in the community for years is not the best approach if we proceed with the user in mind. I think it's incumbent on the new feature to settle on a name that does not clash with existing names. This way users don't have the whiplash of a new paradigm to learn at the same time as relearning the old and getting confused between the two. This name change will cause unnecessary confusion, that is simply not needed. I think
distributedis a good name and works perfectly well for the new feature being delivered.
Agrew with @vincbeck. Naming is hard but I'd rather call the new executor "Distributed" rather than the reuse "old" name for it. Both namings are equally good IMHO and it's up to us to define them - so in this case it is the question of either adding new term, or reusing old term for something else and adding the new term for smth else. Sounds it has a lot potential for confusion.;
I am -0 on these changes. I will not block them if the community decides this is best, but I think changing a naming paradigm that has been in the community for years is not the best approach if we proceed with the user in mind. I think it's incumbent on the new feature to settle on a name that does not clash with existing names. This way users don't have the whiplash of a new paradigm to learn at the same time as relearning the old and getting confused between the two. This name change will cause unnecessary confusion, that is simply not needed. I think
distributedis a good name and works perfectly well for the new feature being delivered.Agrew with @vincbeck. Naming is hard but I'd rather call the new executor "Distributed" rather than the reuse "old" name for it. Both namings are equally good IMHO and it's up to us to define them - so in this case it is the question of either adding new term, or reusing old term for something else and adding the new term for smth else. Sounds it has a lot potential for confusion.;
I'm rather for the term remote as it is shorter and makes CLI and docs shorter. But not fixed to the name. I can make a bulk search&replace. Shall I call a vot eon the name on the DevList?
At least discuss - yes. would be good. We did that in recent past.
I'm rather for the term
remoteas it is shorter and makes CLI and docs shorter. But not fixed to the name. I can make a bulk search&replace. Shall I call a vot eon the name on the DevList?
--> Happy discussion! https://lists.apache.org/thread/br1jfoc8p1wjzk74c09srjgr29spytfy
Needs to be cherry-picked to v2-10-test I guess
(and tests are failing too)
I am -0 on these changes. I will not block them if the community decides this is best, but I think changing a naming paradigm that has been in the community for years is not the best approach if we proceed with the user in mind. I think it's incumbent on the new feature to settle on a name that does not clash with existing names. This way users don't have the whiplash of a new paradigm to learn at the same time as relearning the old and getting confused between the two. This name change will cause unnecessary confusion, that is simply not needed. I think
distributedis a good name and works perfectly well for the new feature being delivered.
+1 on this; I think the distributed would be better
Hey @jscheffl -> I wanted to make an attempt to remove AIP-44 completely from main/ Airflow 3 - which means likely that we will have to do something about those changes for Edge Executor in main.
Do you think the provider can be impletemented and built in main even if we remove all AIP-44-related things from main?
Hey @jscheffl -> I wanted to make an attempt to remove AIP-44 completely from
main/ Airflow 3 - which means likely that we will have to do something about those changes for Edge Executor in main.Do you think the provider can be impletemented and built in
maineven if we remove all AIP-44-related things from main?
Hi @potiuk AIP-44 is the base for the Executor. If you want to wipe the codebase and slim it down (freshest legacy) then we would need to filter the provider from testing on main - and just keep testing the provider against 2.10 compat. There might be multiple options like (1) filtering it in the list in breeze, (2) Implement a pytest.skip based on the airflow version string.
I would favor (of course) if we keep AIP-44 at least for a moment until AIP-72 matures a bit to make it usable. I would assume once AIP-72 is available as a first MVP we can change to code in a way that provider loads modules from v2 compat for Airflow 2 and "new AIP-72" compatible bindings if Airflow 3 (Similar like we did it in the past with Pydantic 1/2 switch).
How much pressure do you feel to remove AIP-44?
I assume your question is also for a full cleaning of all AIP-44 traces? Some "pieces" could be removed early like the internal API apiserver - but of course the @internal_api decorator logic must be sitting in the core and on the exposed API functions.
How much pressure do you feel to remove AIP-44?
I personaly don't feel much pressure - other than all the "peer" presure where other deprecations being actively removed as we speak.
Also I am going for almost 3 weeks holidays after the Summit, so it's less for me and more for all the work that is going to happen around AIP-72 especially while I am away.
There is quite some overlap - a lot of code for AIP-72 is going to involve refactorings and rewrites of pretty much the same parts as AIP-44 touched - it will be on a higher level than AIP-44 did. While AIP-44 went very "low level" - and replaced and extracted individual DB transactions from the code, AIP-72 is going to implement a way higher "abstraction" as regular API calls, where the operations - task and daf file parsing, and callbacks - will be wrapped in higher-level API calls.
But I imagine a lot of the code/ implementation will be about pretty much the same code (and your concerns about Edge worker switching to using the APIs of AIP-72 are only confirming that)
I thought that removing AIP-44 code first (and keeping tests running) would remove quite a lot of noise from AIP-72 work, so I think it might be worth doing before.
But ultimately, it's up to @ashb POC (which we have not seen yet) and details how the implementation of AIP-72 will be done (Replace? Copy? Running old DB functionality and new code in parallel? Duplicating the code first in separate projects before proceeding with AIP-72 implementation).
So I think the question "Should we remove AIP-44 first or is it ok to leave it for now" is more a question on how AIP-72 process will look like - and as such only @ashb can answer it properly I think.
Certainly leaving AIP-44 in main for now makes total sense.
Once I'm back from the Summit I plan on turning my isolated POC code into draft PRs etc for proper review, and then we can decide, but I do think AIP-44 etc should be removed before Airflow 3 hits beta.
Certainly leaving AIP-44 in main for now makes total sense.
Once I'm back from the Summit I plan on turning my isolated POC code into draft PRs etc for proper review, and then we can decide, but I do think AIP-44 etc should be removed before Airflow 3 hits beta.
@potiuk @ashb Thanks for the feedback. So if there is no immediate pressure and also some time that would be great. Looking forward for the first AIP-72 impressions and also once migrating over I'd offer to contribute to AIP-44 removal/cleaning. Of course BEFORE any Airflow 3 Beta this should be made.