
feat: asset scan & asset report

Open Tazoeur opened this issue 7 months ago • 3 comments

Summary & Motivation

Sometimes, for reasons I don't intend to question, data associated with an asset has been written without a proper dagster materialization.

The purpose of this PR is to provide CLI tools to allow users to:

  • mark some asset/partition as materialized
  • scan for non-materialized assets/partitions and mark them as materialized if their data is present
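
To make the intended scan behavior concrete, here is a minimal sketch in plain Python. It is not the code from this PR and it deliberately uses no dagster APIs: `scan_partitions`, `has_data`, `mark_materialized`, and `file_exists_under` are hypothetical names, with the two callbacks standing in for "ask the IO manager whether data exists" and "record a materialization on the instance".

```python
from pathlib import Path
from typing import Callable, Iterable, List, Set


def scan_partitions(
    partition_keys: Iterable[str],
    materialized: Set[str],
    has_data: Callable[[str], bool],
    mark_materialized: Callable[[str], None],
) -> List[str]:
    """Mark every partition that has data but no recorded materialization.

    All parameters are illustrative stand-ins, not dagster objects.
    Returns the keys that were newly marked, in scan order.
    """
    newly_marked: List[str] = []
    for key in partition_keys:
        if key in materialized:
            # Already recorded as materialized: nothing to do.
            continue
        if has_data(key):
            # Data exists on storage but was never recorded: mark it.
            mark_materialized(key)
            newly_marked.append(key)
    return newly_marked


def file_exists_under(root: Path) -> Callable[[str], bool]:
    """Hypothetical data check: one file per partition key under `root`."""
    return lambda key: (root / key).exists()
```

In this sketch, partitions that already have a recorded materialization are skipped without touching storage, so the potentially slow existence check only runs for the candidates that matter.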

This motivation is backed by issue #17202 and is linked to this discussion

Help Wanted

This PR shouldn't be accepted as is. I am willing to keep working on it until it reaches the quality expected by the dagster team, but I'd like some guidance on (at least) the following topics:

  • I used rich.progress because I find it pretty. Is there a preferred progress bar library that should be used instead?
  • I am using some private methods of the IO manager because I'd like UPathIOManager.has_output to handle both partitioned and non-partitioned assets, as I've written here: https://github.com/dagster-io/dagster/compare/master...Tazoeur:dagster:upath_io_get_path
  • Testing these features is valuable, but I am a bit lost about how to approach it. Are there examples of tests that (a) manipulate storage data, (b) write files without using the IO manager, and (c) then use the IO manager to check that the file is present?
  • Am I using the correct dagster instance and context?
  • Should I split the PR in two, one for each new command?
  • Are the CLI names ok?
  • [new feature] The dagster asset scan command could wipe a partition if the data associated with the materialized partition is missing. Should that be included in this PR?
  • [new feature] Should the scan also perform dagster checks to (in)validate the asset/partition?

Tazoeur, Jul 25 '24 08:07