Extra parameters in MSSQL Hook breaks sqlalchemy connections
Apache Airflow Provider(s)
microsoft-mssql
Versions of Apache Airflow Providers
apache-airflow-providers-microsoft-mssql==4.4.2
Apache Airflow version
2.10.5
Operating System
RHEL 8.10
Deployment
Virtualenv installation
Deployment details
No response
What happened
Hi. the newly added feature in #44310 breaks existing connections if driver or encrypt parameters are used in the extras:
Probably more parameters may exhibit similar errors. These extras are needed by the MSSQL hook, but we cannot blindly accept these will work in pymssql connections.
Example connection:
{
"conn_type": "mssql",
"extra": {
"driver": "ODBC Driver 18 for SQL Server",
"encrypt": "yes",
"sqlalchemy_scheme": "mssql+pyodbc"
},
"host": "xxx",
"login": "xxx",
"password": "xxx",
"port": 1433,
"schema": "master"
}
Errors when using an SQLQueryExecutor using the above connection:
File "src/pymssql/_pymssql.pyx", line 586, in pymssql._pymssql.connect
TypeError: connect() got an unexpected keyword argument 'driver'
File "src/pymssql/_pymssql.pyx", line 586, in pymssql._pymssql.connect
TypeError: connect() got an unexpected keyword argument 'encrypt'
What you think should happen instead
These conenctions should continue working as they did in apache-airflow-providers-microsoft-mssql==3.9.2
How to reproduce
Have MSSQL connections defined with extra parameters
Install apache-airflow-providers-microsoft-mssql >= 4.0.0
Try an mssql hook using sqlalchemy
Anything else
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
cc @jx2lee
@eladkal I’ll take a look at it tomorrow!
@emredjan Thanks for the report! Planning to fix it soon — but happy to do Pull request yourself if it’s time-sensitive! I can assist
Thanks @jx2lee , it's not that urgent as we can use 3.9.2 without issues. I may have a look at fixing it, but not really sure what's the best way. Reverting the change, or adding an error checking for extras that are not supported in pymssql, or some other option..
@emredjan
Instead of expanding MsSqlHook to accept ODBC-specific options, why not lean on the hooks that already match each driver?
- pymssql →
MsSqlHook
The hook defaults to themssql+pymssqldialect and is tuned for that stack. - pyodbc →
OdbcHook
OdbcHookusespyodbcunder the hood and already exposes adriverparameter via connection extras/hook params, so no extra work is needed.
Keeping the hooks focused means:
- We avoid adding more branching logic to
MsSqlHook. - Users choose the hook that matches their driver, which is consistent with how other database providers work.
What’s your take? Assuming that's reasonable, would you be open to switching to OdbcHook and resolving this issue?
@jx2lee the issue reports a regression. If the current behavior is OK we should at least add entry in the changelog to explain users how to migrate.
the issue reports a regression. If the current behavior is OK we should at least add entry in the changelog to explain users how to migrate.
@eladkal Thanks for quick response. I believe this behavior is functioning as intended.
I'm not sure how to add the migration guidance to the changelog—Could someone direct me to the relevant docs or examples? If I have the resources, I can handle it.
Thanks!
@jx2lee It seems like using the ODBC connection type is the way to go. There are some implementation differences as they're used in common SQL operators, but it shouldn't take that much to migrate.
Thanks for the suggestions.
I'm not sure how to add the migration guidance to the changelog—Could someone direct me to the relevant docs or examples? If I have the resources, I can handle it.
As simple as add a warnning clause in the changelog.rst for the version that caused the regression and just write a short paragraph about how to mitigate the issue.
@eladkal Thanks for letting me know. I'll make PR, and assign this to me. 🙇🏽