dagster icon indicating copy to clipboard operation
dagster copied to clipboard

snowflake freshness helper

Open dpeng817 opened this issue 1 year ago • 6 comments

This PR introduces a standalone method to the dagster-snowflake package that, given a list of tables and a project ID, returns the freshness of each table.

How I tested this Added a test that creates an observable source asset that attaches freshness as metadata to its observe result, and asserts the information is correct on the resulting observation event.

dpeng817 avatar Feb 13 '24 01:02 dpeng817

Current dependencies on/for this PR:

  • #19759 Graphite 👈
  • #19698 Graphite: 4 other dependent PRs (#19691 Graphite, #19700 Graphite, #19718 Graphite and 1 other)
  • #19695 Graphite
  • master

This stack of pull requests is managed by Graphite.

dpeng817 avatar Feb 13 '24 01:02 dpeng817

Tl;dr, standalone function seems the most composable to me.

@sryza it's a bit confusing, but the SnowflakeConnection object we return from calling SnowflakeResource.get_connection is not the same as the one we define in code. From the documentation of our own SnowflakeConnection class:

Note that the SnowflakeConnection is only used by the snowflake_resource. The Pythonic SnowflakeResource does not use this SnowflakeConnection class.

So putting it on the SnowflakeConnection object isn't what we want, I think. The pythonic resource SnowflakeResource.get_connection method returns a snowflake.connector.SnowflakeConnection object which we of course have no control over (and it doesn't even always return that).

I considered putting it on the SnowflakeResource pythonic resource object as a class method, but that also seems weird, because we don't know the patterns of how users are going to open and close connections. IE, consider a pattern like so:

@observable_source_asset
def my_asset(snowflake: SnowflakeResource):
    with snowflake.get_connection(...) as conn:
        conn.execute("Some custom thing")
        update_timestamps = snowflake.get_latest_update_timestamps(...)

To me that pattern feels weird, while the following feels more natural

@observable_source_asset
def my_asset(snowflake: SnowflakeResource):
    with snowflake.get_connection(...) as conn:
        conn.execute("Some custom thing")
        update_timestamps = get_latest_update_timestamps(conn, ...)

It feels to me most composable to allow users to potentially open one connection, call out to this method to retrieve freshness, and potentially do some other stuff, all without needing to open multiple context managers, whereas a method on the resource would have to open it's own connection directly, irrespective of what connections might already be opened by the user.

dpeng817 avatar Feb 14 '24 02:02 dpeng817

is dagster_snowflake.SnowflakeConnection kind of essentially a deprecated API now that we endorse SnowflakeResource?

yeah. We decided that with the pythonic snowflake resource we wanted to just provide snowflake's SnowflakeConnection (we'd gotten a lot of feature requests about the behavior in our SnowflakeConnection, and felt that giving direct access to the underlying snowflake.SnowflakeConnection would basically solve all of the issues)

I'm not sure we can formally mark it as deprecated though. I'm not sure if all old-style resources can be marked deprecated yet and we probably can't mark our SnowflakeConnection class deprecated until then

jamiedemaria avatar Feb 14 '24 15:02 jamiedemaria

@sryza I found this surprising, but we don't actually have a TimestampMetadataValue, as of now. I can add one though if we think it's important. To unblock the demo I wonder if that should be a fast follow.

dpeng817 avatar Feb 14 '24 21:02 dpeng817

Ah interesting. Yes I think worth adding one. For the demo timeframe, thoughts on having this API return datetimes and the demo code convert it to floats before passing it to the metadata dict? Then we wouldn't need to change this API later.

sryza avatar Feb 14 '24 22:02 sryza

Good point. Momentarily forgot that this was only a utility function. I'll change to using datetimes.

dpeng817 avatar Feb 14 '24 22:02 dpeng817