TestConnectionOperator
Given the default Test Connection method in the Airflow UI being disabled, I find myself frequently writing similar code to this, while implementing a "Canary DAG" patterns
Additionally - as an operator, there is the added benefit of a standard method of testing connections that may be defined via other mechanisms, and having a clear procedure that can be easily recommended to new users to help debug connection woes and test things like networking configurations.
Remaining todo:
- [ ] docs
- [ ] tests
having a clear procedure that can be easily recommended to new users to help debug connection woes and test things like networking configurations.
Why not just use the operator they actually want to use?
I am not convinced about this feature
@eladkal my thoughts:
- establishing/testing connections is something I see Airflow users struggle with constantly, and it can often be pretty frustrating/obscure what is or isn't working, especially for someone new to Airflow. Isolating just the connection can simplify the process and should be encouraged and easy
- giving users the tools to spend less time debugging and make a difficult process easy is useful
- the canary pattern wants to exercise the connection in a meaningful but non-impactful way (e.g.
select 1;notselect * from my_table limit ...). It isn't always immediately obvious how to do that, when establishing a connection to a system that a user may only have partial familiarity with, or users may make a disruptive test by accident- this could also be a great
@setuptask, or something to encourage at the start of a DAG, e.g. to see if the database actually is up and exists
- this could also be a great
- The functionality for
test_connectionis already there for many hooks/connection types, and includes the correct tests, so I'd say "D.R.Y.". Why make users reimplement a user-friendly feature? - not all connections are in the Airflow UI (e.g. a secrets backend), so this has additional benefit for testing those connections, and the ability to retrieve them. The pattern also functions as a way to document what connections exist and are functional for the Airflow instance
- that codepath is often disabled / unusable via the AF Connections UI. Often users don't even know that exists, nor why it's disabled, nor how to enable it or the implications of enabling it instance-wide. I'd say this is good for security while preserving a friendly UX that had been previously exposed
One change in this from the test_connection function is that the original doesn't actually fail, when invoked via a task, which is why this is modified to raise.
If .test_connection had the option to raise, an alternative could be publishing/documenting:
@task
def canary(conn_id):
return BaseHook.get_hook(conn_id=self.conn_id).test_connection(raise=True)
(though my vote would still be "why expect new users to know this random snippet, or find it in a doc or blog post?" Something off the shelf feels more friendly and useful, especially given how connections are one of the most core pieces of Airflow and are a pretty rough-edged UX at the moment)
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
This is essentially
@task
def test_connection(conn_id):
ok, status = BaseHook.get_hook(conn_id=conn_id).test_connection()
if ok:
return status
raise RuntimeError(status)
I feel what we really need is to document this pattern better, maybe provide the snippet as a recipe. It seems like an overkill for me adding a dedicated operator class.