airflow
Dataproc submit job operator async
Add deferrable capability to existing DataprocJobBaseOperator and DataprocSubmitJobOperator operators.
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). Here are some useful points:
- Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
- In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
- Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
- Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
- Be sure to read the Airflow Coding style.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack
Looks pretty cool - but we also need some examples and likely entries in the documentation describing the usage and mentioning deferrable options. Otherwise it will not be discoverable enough.
Thank you for pointing it out. I've fixed it with #1bf260a
Some tests are failing.
Test cases for triggerer are missing.
Some errors to fix.
@potiuk @bjankie1 Previously we created two different operators/sensors, one synchronous and one asynchronous. For example, DatabricksSubmitRunDeferrableOperator is async and DatabricksSubmitRunOperator is sync. In this PR, however, we have added an async flag to the existing operator. Would this be considered an inconsistency, and do we need to worry about it? https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py#L368
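To make the question concrete, here is a rough sketch of the two designs being compared. The classes below are hypothetical stand-ins written for illustration only; they do not import Airflow, and their names, return strings, and the `execute()` signature are assumptions rather than the actual provider code. Only the idea of a `deferrable` flag versus a separate deferrable class comes from this thread.

```python
from dataclasses import dataclass


@dataclass
class SubmitJobOperator:
    """Hypothetical stand-in: one operator with a `deferrable` flag."""

    task_id: str
    deferrable: bool = False

    def execute(self) -> str:
        # A single class with two code paths, selected by the flag.
        if self.deferrable:
            return "deferred to triggerer"
        return "polled synchronously on worker"


@dataclass
class SubmitRunOperator:
    """Hypothetical stand-in: the synchronous half of a two-class design."""

    task_id: str

    def execute(self) -> str:
        return "polled synchronously on worker"


class SubmitRunDeferrableOperator(SubmitRunOperator):
    """Hypothetical stand-in: a separate class for the asynchronous path."""

    def execute(self) -> str:
        return "deferred to triggerer"


# Flag-based design: same class, behaviour chosen per task.
flag_sync = SubmitJobOperator(task_id="sync_job")
flag_async = SubmitJobOperator(task_id="async_job", deferrable=True)

# Two-class design: behaviour chosen by which class is instantiated.
cls_sync = SubmitRunOperator(task_id="sync_run")
cls_async = SubmitRunDeferrableOperator(task_id="async_run")
```

With the flag, users switch modes by changing one argument; with two classes, they switch by changing the import and class name.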
I am perfectly ok with per-provider consistency, rather than "per-airflow" consistency. There is no reason why we should "force" one way or the other cross-providers. And there might be reasons why it's easier for one provider to do it this way and for another provider a different way.
Traditionally each provider should follow its own "standards" and be consistent, for all things except the Airflow "performance" and best practices. We are gearing up (in our tooling for now, but soon in the code) to split providers into individual repos, and then it will matter even less whether such a level of consistency is required.
I personally think of this the same way the Apache Software Foundation does with its projects. There is a very, very small but super-strict interface that the "distributed" component (project in ASF and provider in Airflow) should follow, and it should be strict and followed - but all the rest should be left to decide internally (by the project in ASF and by the provider in Airflow).
The things that we care about at the Airflow level should be very well described in https://github.com/apache/airflow/blob/main/README.md and strictly regulated by the processes/automation. All the rest should be decided by the people who are most active in the providers.
And static/docs need to be fixed too
Test failing :(
One doc failure left.
Awesome work, congrats on your first merged pull request!