
Delete unreferenced tables in a schema

Open lewish opened this issue 5 years ago • 2 comments

There should be a way to tell Dataform to remove tables from a schema once they are no longer defined in the project.

This is complicated by the fact that a run may not always contain all of the created tables/views:

  • When using e.g. --tags to run only a subset of tables
  • When compiling with an environment, where two environments write to the same schema

To solve this, we can compute the tables to drop from the compiled graph rather than from a run. For now, we can also assume that any database + schema written to by a Dataform action is considered "Dataform managed".
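
A minimal sketch of that "managed" assumption, in Python for illustration only. The Target tuple and the managed_schemas function are hypothetical names, not part of Dataform's API; the sketch only shows that the managed set is every (database, schema) pair that some compiled action writes to.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Target(NamedTuple):
    """Hypothetical stand-in for the output location of one compiled action."""
    database: str
    schema: str
    name: str


def managed_schemas(compiled_targets: Iterable[Target]) -> Set[Tuple[str, str]]:
    """Return every (database, schema) pair written to by at least one action."""
    return {(t.database, t.schema) for t in compiled_targets}


# Example data (made up for illustration).
compiled = [
    Target("analytics", "reporting", "daily_orders"),
    Target("analytics", "reporting", "weekly_orders"),
    Target("analytics", "staging", "raw_orders"),
]
managed = managed_schemas(compiled)
```

Because the set comes from the whole compiled graph, it does not shrink when a run is limited to a subset of tables.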

In the future, we can allow the project to specify which database schemas are considered managed explicitly.

Proposal

Add a new command, dataform prune, that takes the same args as compile and lists every table and view that exists in a database schema written to by the current compiled graph but that the compiled graph itself will not create.

When called with dataform prune --drop, the command should drop all of the listed tables and views.
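
The proposed behaviour could be sketched roughly as below. This is an illustration in Python, not Dataform's implementation: Relation, prune_candidates and drop_statements are hypothetical names, the list of existing relations is assumed to have been read from the warehouse beforehand, and the DROP syntax and identifier quoting will vary by warehouse.

```python
from typing import Iterable, List, NamedTuple


class Relation(NamedTuple):
    """Hypothetical description of a table or view in the warehouse."""
    database: str
    schema: str
    name: str
    kind: str = "table"  # "table" or "view"


def prune_candidates(
    compiled: Iterable[Relation], existing: Iterable[Relation]
) -> List[Relation]:
    """List existing relations in managed schemas that the graph will not create."""
    compiled = list(compiled)
    # A schema counts as managed if any compiled action writes to it.
    managed = {(r.database, r.schema) for r in compiled}
    wanted = {(r.database, r.schema, r.name) for r in compiled}
    return sorted(
        r
        for r in existing
        if (r.database, r.schema) in managed
        and (r.database, r.schema, r.name) not in wanted
    )


def drop_statements(relations: Iterable[Relation]) -> List[str]:
    """Render one DROP statement per relation (what --drop would execute)."""
    return [
        f"DROP {r.kind.upper()} IF EXISTS {r.database}.{r.schema}.{r.name}"
        for r in relations
    ]


# Example data (made up for illustration).
compiled = [Relation("analytics", "reporting", "daily_orders")]
existing = [
    Relation("analytics", "reporting", "daily_orders"),
    Relation("analytics", "reporting", "old_orders"),
    Relation("analytics", "reporting", "legacy_view", "view"),
    Relation("analytics", "scratch", "tmp"),  # schema not managed, left alone
]
stale = prune_candidates(compiled, existing)
statements = drop_statements(stale)
```

Without --drop, only the list in stale would be printed. Relations in schemas that no compiled action writes to, such as analytics.scratch here, are never candidates.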

lewish, Jan 07 '20 16:01

@lewish did you ever find a temporary workaround for this, by any chance? (e.g. a specific script you wrote to do the same job until it's available natively)

hobailey, Sep 14 '23 08:09