dagster
dagster copied to clipboard
snowflake freshness helper
This PR introduces a standalone method to the dagster-snowflake package that, given a list of tables and a project ID, returns the freshness of each table.
How I tested this Added a test that creates an observable source asset that attaches freshness as metadata to its observe result, and asserts the information is correct on the resulting observation event.
Tl;dr, standalone function seems the most composable to me.
@sryza it's a bit confusing, but the SnowflakeConnection
object we return from calling SnowflakeResource.get_connection
is not the same as the one we define in code. From the documentation of our own SnowflakeConnection class:
Note that the SnowflakeConnection is only used by the snowflake_resource. The Pythonic SnowflakeResource does not use this SnowflakeConnection class.
So putting it on the SnowflakeConnection
object isn't what we want, I think. The pythonic resource SnowflakeResource.get_connection
method returns a snowflake.connector.SnowflakeConnection
object which we of course have no control over (and it doesn't even always return that).
I considered putting it on the SnowflakeResource pythonic resource object as a class method, but that also seems weird, because we don't know the patterns of how users are going to open and close connections. IE, consider a pattern like so:
@observable_source_asset
def my_asset(snowflake: SnowflakeResource):
with snowflake.get_connection(...) as conn:
conn.execute("Some custom thing")
update_timestamps = snowflake.get_latest_update_timestamps(...)
To me that pattern feels weird, while the following feels more natural
@observable_source_asset
def my_asset(snowflake: SnowflakeResource):
with snowflake.get_connection(...) as conn:
conn.execute("Some custom thing")
update_timestamps = get_latest_update_timestamps(conn, ...)
It feels to me most composable to allow users to potentially open one connection, call out to this method to retrieve freshness, and potentially do some other stuff, all without needing to open multiple context managers, whereas a method on the resource would have to open it's own connection directly, irrespective of what connections might already be opened by the user.
is dagster_snowflake.SnowflakeConnection kind of essentially a deprecated API now that we endorse SnowflakeResource?
yeah. We decided that with the pythonic snowflake resource we wanted to just provide snowflake's SnowflakeConnection
(we'd gotten a lot of feature requests about the behavior in our SnowflakeConnection
, and felt that giving direct access to the underlying snowflake.SnowflakeConnection would basically solve all of the issues)
I'm not sure we can formally mark it as deprecated though. I'm not sure if all old-style resources can be marked deprecated yet and we probably can't mark our SnowflakeConnection
class deprecated until then
@sryza I found this surprising, but we don't actually have a TimestampMetadataValue, as of now. I can add one though if we think it's important. To unblock the demo I wonder if that should be a fast follow.
Ah interesting. Yes I think worth adding one. For the demo timeframe, thoughts on having this API return datetimes and the demo code convert it to floats before passing it to the metadata dict? Then we wouldn't need to change this API later.
Good point. Momentarily forgot that this was only a utility function. I'll change to using datetimes.