ideas icon indicating copy to clipboard operation
ideas copied to clipboard

Add "Stale Query CSV" capabilities

Open jqnatividad opened this issue 7 years ago • 1 comments

With the new Stored Queries functionality, complex SQL queries can be run against the Datastore and stored as snapshots.

However, it seems that there is no mechanism to ensure the snapshot is up-to-date. As I understand it, automatic refresh was not implemented as a conscious design decision so as not to trigger cascading updates that will negatively impact performance.

It would be nice if:

  • there is a "Stale Query CSV" indicator in the UI that will prompt data publishers to refresh a query
  • have the ability in the UI to click through the parent resources referred to in the SQL
  • there is an API for helping manage the Query CSVs - to find out stale CSVs, to find out the query dependencies, to trigger a query CSV refresh.

cc @wardi

jqnatividad avatar Apr 16 '18 20:04 jqnatividad

A "Stale Query CSV" indicator should be easy to implement as part of the dataset search results and dataset view pages. I'm assuming a global configuration option for what age constitutes stale. Only showing the indicator to users with permission to update the query would be a bit more work (and a performance hit if displayed in search results)

UI to click through parent resources is straightforward. We already collect the table names involved in a query. Only tricky bit is building a nice interface for the list of resources.

The APIs exist: package_search on datasets with query resources, and datastore_create to re-run queries. Need to be careful about recording which user to run the queries though. If we run as a sysadmin user this could expose data that wasn't intended when one of the permissions on a parent table changes. I'm not sure how best to record that user information.

wardi avatar Apr 20 '18 13:04 wardi