airflow
airflow copied to clipboard
Fix parsing connection uri which contains a composite protocol sparated by :
closes: #33442
It seems like urlsplit fails to correctly parse the url when the protocol contains a colon character. As a workaround, this PR replaces the colon character in the protocol by a special string, and restore it when it needs to use the original value.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.
I think this is just a band-aid and we need a more correct solution.
Thank you @potiuk for these helpful information, indeed the scheme could be delimited by :.
Could you check it after my last commit? I added a new condition to handle this case
I proposed a fixup that I think better reflects the different handling of the "://"-less URI
Also added handling (and testing) of special cases. I think we still have a group of users who might use 'localhost' or 'localhost:port' as their URI - even if it is technically wrong and they are not URIs. While the first one is not problematic, the second is, because if we treat it as a URI, technically it means:
schema = localhost
host=port
But I figured that we can simply check for those two with a simple regexp and add a "special treatment" for them. This is highly unlikely to have schema:9090 type of URI where the path is purely numeric.
I LOVE 🙃 the amount of hacks we have in such small piece of code....
This fix may not be necessary:
see comment https://github.com/apache/airflow/issues/33442#issuecomment-1817228790
Resolves issue #33442 by addressing an inconsistency in urlsplit, which fails to correctly parse URLs with colons in the protocol. This PR introduces a workaround by replacing the colon character in the protocol with a special string and restoring it when needed to use the original value.
Kindly review and provide feedback on this fix. Ensure adherence to the Pull Request Guidelines. Note that, for fundamental code changes, consider submitting an Airflow Improvement Proposal (AIP). Also, check for compliance with the ASF 3rd Party License Policy if there are any new dependencies.
In the case of potential backwards incompatible changes, please leave a note in a newsfragment file, following the naming convention {pr_number}.significant.rst or {issue_number}.significant.rst, within the newsfragments directory. Your attention to these considerations is appreciated.
https://www.rfc-editor.org/rfc/rfc3986#section-3.1
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
Having a colon in the scheme makes the URI invalid. Sure other tools can choose to give the string a meaning, but Airflow is under no obligation to support the semantic, especially since that semantic does not provide value to Airflow, as outlined by Daniel's comment above.
See, for example, ODBC hook, which allows you to specify the sqlalchemy scheme with sqlalchemy_scheme, either as hook init param, or as a connection extra.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.