airflow icon indicating copy to clipboard operation
airflow copied to clipboard

New generaic SQL transfer operator proposal

Open frodo2000 opened this issue 1 year ago • 10 comments


^ Add meaningful description above Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

frodo2000 avatar Mar 19 '24 09:03 frodo2000

That looks good in general, but before we geto more detailed review we need some unit tests on it (as for any other new code).

potiuk avatar Mar 19 '24 10:03 potiuk

Also - we should make the Generic Transfer from airflow.operatror deprecated as this is a superset of the Generic Transfer (and common.sql is way better place for such a generic transfer BTW).

potiuk avatar Mar 19 '24 10:03 potiuk

@Taragolis @potiuk @eladkal I had changed code for that provider. Transfer logic is moved to:

  • DBApiHook - generic transfer
  • OracleHook - Oracle bulk insert transfer
  • SnowflakeHook - Snowflake write_pandas mode (or if pandas is not installed using generic mode)

Additionally I found that old generic_transfer has preoperator. Name was confiusing that is why new operator has two parameters:

  • destination_before_sql - logic executed before transfer
  • destination_after_sql - logic executed after transfer

Additionally I write quick test

@eladkal stack is connected to DBApiHook support

frodo2000 avatar Mar 19 '24 20:03 frodo2000

Explanation of the mypy problem:

The MyPy Issue is because for DBApiHook we have a .pyi interface as well defined (to flag any case where we change the interface. See airflow/providers/common/sql/hooks/sql.pyi. And this is actually expected behaviour here for mypy to fail, because it drags our attention to this change in the DBApiHook "public API".

So it's great it failed here, otherwise we would not have noticed a dependency issue :). So the failure of MyPy here is an intended harness, not a bug.

We should in this case:

  • add the data_transfer method to the sql.pyi (i.e. change interface of DBApiHook)

  • increase MINOR version of the common.sql -> this is a new feature the snowflake and oracle hooks depend on. Currently it is 1.11.1 so we should bump it to 1.12.0.

  • add common.sql >= _new_common_sql_version in dependencies of the modified providers - they have currently:

    • Snowflake: apache-airflow-providers-common-sql>=1.10.0
    • Oracle: apache-airflow-providers-common-sql>=1.3.1 Both should get >= 1.12.0 there.
  • We should also likely add in the docstring of DBAPIHook something along the lines "If you are extending this method, make sure to use apache-airflow-providers-common-sql>=1.12.0 dependency in your provider.

potiuk avatar Mar 20 '24 05:03 potiuk

@Taragolis @potiuk @eladkal Operator passed the test. Additionally I had check the following hooks and DB:

  • IBM DB2 (ODBC Hook)
  • Oracle (Oracle Hook)
  • MySQL (mySQL Hook)
  • Postgres (postgres Hook)
  • Snowflake (Snowflake Hook)

frodo2000 avatar Mar 20 '24 12:03 frodo2000

@Taragolis @potiuk @eladkal Operator passed the test. Additionally I had check the following hooks and DB:

  • IBM DB2 (ODBC Hook)
  • Oracle (Oracle Hook)
  • MySQL (mySQL Hook)
  • Postgres (postgres Hook)
  • Snowflake (Snowflake Hook)

Please check https://github.com/apache/airflow/pull/38281#discussion_r1530695191

eladkal avatar Mar 20 '24 15:03 eladkal

How about extracting the common code for version check? Sounds reasonable to do it now?

potiuk avatar Mar 22 '24 17:03 potiuk

How about extracting the common code for version check? Sounds reasonable to do it now? Probably yes. How to perform that?

frodo2000 avatar Mar 22 '24 19:03 frodo2000

Probably yes. How to perform that?

Just the usual "Refactor -> Extract" - both are in common.sql provider, so adding common.sql.util package with provider_compatibility.py and moving the common code there seems pretty natural

potiuk avatar Mar 22 '24 19:03 potiuk

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 07 '24 00:05 github-actions[bot]