Dataproc submit job operator async

Open bjankie1 opened this issue 2 years ago • 10 comments

Add deferrable capability to existing DataprocJobBaseOperator and DataprocSubmitJobOperator operators.



bjankie1 avatar Jul 26 '22 10:07 bjankie1

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.

Apache Airflow is a community-driven project and together we are making it better 🚀. In case of doubts contact the developers at: Mailing List: [email protected] Slack: https://s.apache.org/airflow-slack

boring-cyborg[bot] avatar Jul 26 '22 10:07 boring-cyborg[bot]

Looks pretty cool - but we also need some examples and likely entries in the documentation describing the usage and mentioning the deferrable options. Otherwise it will not be discoverable enough.

Thank you for pointing it out. I've fixed it in #1bf260a

bjankie1 avatar Jul 29 '22 08:07 bjankie1

Some tests failing

potiuk avatar Jul 29 '22 12:07 potiuk

Test cases for triggerer are missing.

rajaths010494 avatar Aug 01 '22 11:08 rajaths010494

Some errors to fix.

potiuk avatar Aug 02 '22 19:08 potiuk

@potiuk @bjankie1 Previously we have created two different operators/sensors, one synchronous and one asynchronous. For example, DatabricksSubmitRunDeferrableOperator is for async and DatabricksSubmitRunOperator is for sync. In this PR, however, we have added a flag for async to the existing operator. Would that be considered an inconsistency, and do we need to worry about it? https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py#L368

pankajastro avatar Aug 02 '22 19:08 pankajastro
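
To make the two API shapes in the question above concrete, here is a minimal sketch in plain Python. All class names, the `execute` return strings, and the subclass relationship are hypothetical stand-ins chosen for illustration; they are not the real Airflow, Dataproc, or Databricks classes, and no trigger is actually involved.

```python
# Hypothetical stand-ins only; NOT the real Airflow or provider classes.
from dataclasses import dataclass


@dataclass
class SubmitJobOperator:
    """Shape used in this PR: one operator, async behaviour chosen by a flag."""

    job_id: str
    deferrable: bool = False

    def execute(self) -> str:
        if self.deferrable:
            # A real deferrable operator would hand off to a trigger here
            # and free its worker slot while the remote job runs.
            return f"deferred:{self.job_id}"
        # Synchronous path: the worker would block and poll until the job ends.
        return f"polled:{self.job_id}"


@dataclass
class SubmitRunOperator:
    """Shape cited from the Databricks provider: a synchronous operator ..."""

    run_id: str

    def execute(self) -> str:
        return f"polled:{self.run_id}"


class SubmitRunDeferrableOperator(SubmitRunOperator):
    """... plus a separate class for the async variant (modelled as a subclass here)."""

    def execute(self) -> str:
        return f"deferred:{self.run_id}"
```

With the flag, users switch modes by passing `deferrable=True` to the same class; with separate classes, they switch by changing which class the DAG imports.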

I am perfectly ok with per-provider consistency, rather than "per-airflow" consistency. There is no reason why we should "force" one way or the other cross-providers. And there might be reasons why it's easier for one provider to do it this way and for another provider to do it a different way.

Traditionally each provider should follow their own "standards" and be consistent - for all things except the Airflow "performance" and best practices. We are gearing up (in our tooling for now but soon in the code) to splitting providers into individual repos, and then such a level of consistency will be even less important.

I personally think of this in the very same way the Apache Software Foundation does with its projects. There is a very, very small but super-strict interface that the "distributed" component (project in ASF and provider in Airflow) should follow, and it should be strict and followed - but all the rest should be left to decide internally (by project in ASF and by provider in Airflow).

The things that we care about at the Airflow level should be very well described in https://github.com/apache/airflow/blob/main/README.md and strictly regulated by the processes/automation. All the rest - people who are most active in the providers should decide.

potiuk avatar Aug 02 '22 20:08 potiuk

And static/docs need to be fixed too

potiuk avatar Aug 03 '22 11:08 potiuk

Test failing :(

potiuk avatar Aug 05 '22 17:08 potiuk

One doc failure left.

potiuk avatar Aug 08 '22 07:08 potiuk

Awesome work, congrats on your first merged pull request!

boring-cyborg[bot] avatar Aug 22 '22 19:08 boring-cyborg[bot]