New generic SQL transfer operator proposal
That looks good in general, but before we get to a more detailed review we need some unit tests for it (as for any other new code).
Also - we should deprecate the GenericTransfer from airflow.operators, as the new operator is a superset of it (and common.sql is a much better place for such a generic transfer, BTW).
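For illustration, a minimal sketch of what such a deprecation shim could look like; the new import path, the alias, and the warnings-based mechanism are assumptions here, not necessarily what this PR does:

```python
# Hypothetical deprecation shim for airflow/operators/generic_transfer.py.
# The new import path and the warnings-based approach are assumptions for
# illustration only.
import warnings

from airflow.providers.common.sql.operators.generic_transfer import (  # assumed new location
    GenericTransfer as SQLGenericTransfer,
)


class GenericTransfer(SQLGenericTransfer):
    """Deprecated: use the generic transfer operator from common.sql instead."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "airflow.operators.generic_transfer.GenericTransfer is deprecated; "
            "use the common.sql provider's generic transfer operator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```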
@Taragolis @potiuk @eladkal I have changed the code for that provider. The transfer logic is moved to:
- DBApiHook - generic transfer
- OracleHook - Oracle bulk insert transfer
- SnowflakeHook - Snowflake write_pandas mode (or, if pandas is not installed, falling back to generic mode)
Additionally, I found that the old generic_transfer has a preoperator parameter. The name was confusing, which is why the new operator has two parameters:
- destination_before_sql - logic executed before the transfer
- destination_after_sql - logic executed after the transfer
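A minimal usage sketch of the two new parameters follows; the operator name, import path, connection ids, tables, and SQL are made-up assumptions for illustration - only destination_before_sql and destination_after_sql come from this change:

```python
# Hypothetical DAG task showing destination_before_sql / destination_after_sql.
# Operator name, import path, connection ids, and table names are assumptions.
from airflow.providers.common.sql.operators.generic_transfer import GenericTransfer  # assumed path

copy_orders = GenericTransfer(
    task_id="copy_orders",
    source_conn_id="mysql_source",
    destination_conn_id="postgres_dwh",
    sql="SELECT * FROM orders",
    destination_table="staging.orders",
    destination_before_sql="TRUNCATE TABLE staging.orders",  # runs before the transfer
    destination_after_sql="ANALYZE staging.orders",          # runs after the transfer
)
```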
Additionally, I wrote a quick test.
@eladkal the stack is connected to DBApiHook support
Explanation of the mypy problem:
The MyPy issue arises because for DBApiHook we also have a .pyi interface defined (to flag any case where we change the interface; see airflow/providers/common/sql/hooks/sql.pyi). It is actually expected behaviour for mypy to fail here, because it draws our attention to this change in the DBApiHook "public API".
So it's great it failed here, otherwise we would not have noticed the dependency issue :). The MyPy failure here is an intended harness, not a bug.
We should in this case:
- add the `data_transfer` method to `sql.pyi` (i.e. change the interface of `DBApiHook`)
- increase the MINOR version of `common.sql` -> this is a new feature that the Snowflake and Oracle hooks depend on. Currently it is `1.11.1`, so we should bump it to `1.12.0`.
- add `common.sql >= _new_common_sql_version` in the dependencies of the modified providers - they currently have:
  - Snowflake: `apache-airflow-providers-common-sql>=1.10.0`
  - Oracle: `apache-airflow-providers-common-sql>=1.3.1`

  Both should get `>=1.12.0` there.
- We should also likely add to the docstring of `DBApiHook` something along the lines of "If you are extending this method, make sure to use the `apache-airflow-providers-common-sql>=1.12.0` dependency in your provider."
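For reference, a rough sketch of the kind of stub entry the first point implies; the parameter names and types below are assumptions for illustration, only the data_transfer method name comes from this discussion:

```python
# Hypothetical addition to airflow/providers/common/sql/hooks/sql.pyi
# (existing stub members omitted; the signature shown is an assumption).
from typing import Any

class DbApiHook:
    def data_transfer(
        self,
        destination_table: str,
        source_hook: Any,
        source_sql: str,
        **kwargs: Any,
    ) -> None: ...
```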
@Taragolis @potiuk @eladkal The operator passed the test. Additionally, I have checked the following hooks and databases:
- IBM DB2 (ODBC Hook)
- Oracle (Oracle Hook)
- MySQL (MySQL Hook)
- Postgres (Postgres Hook)
- Snowflake (Snowflake Hook)
Please check https://github.com/apache/airflow/pull/38281#discussion_r1530695191
How about extracting the common code for the version check? Does it sound reasonable to do it now?
Probably yes. How would we perform that?
Just the usual "Refactor -> Extract" - both are in the common.sql provider, so adding a common.sql.util package with provider_compatibility.py and moving the common code there seems pretty natural.
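A hedged sketch of what such an extracted helper could look like; the function name and the importlib.metadata-based version lookup are assumptions, not the agreed implementation:

```python
# Hypothetical airflow/providers/common/sql/util/provider_compatibility.py.
# Helper name and version-resolution mechanism are assumptions for illustration.
from importlib import metadata

from packaging.version import Version


def common_sql_version_at_least(required: str) -> bool:
    """Return True if the installed common.sql provider is at least ``required``."""
    installed = metadata.version("apache-airflow-providers-common-sql")
    return Version(installed) >= Version(required)
```

Dependent hooks (e.g. Snowflake, Oracle) could then call something like `common_sql_version_at_least("1.12.0")` before relying on the new `data_transfer` method.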
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.