feat: asset scan & asset report
Summary & Motivation
Sometimes, for reasons I don't intend to question, data associated with an asset is written without a corresponding Dagster materialization.
The purpose of this PR is to provide CLI tools that allow users to:
- mark an asset or asset partition as materialized
- scan for non-materialized assets/partitions and mark them as materialized if their data is present
This motivation is backed by issue #17202 and is linked to this discussion.
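The core of the scan can be sketched as a filter over partition keys: keep the keys that have no recorded materialization but whose data exists in storage. This is a minimal, dependency-free sketch, not the code in this PR. It assumes that the partition keys, the set of already-materialized keys, and a data-presence check are supplied by the caller. In the real command these would come from the asset's partitions definition, the Dagster instance, and the IO manager (for example `UPathIOManager.has_output`). The helper names `find_unmaterialized_with_data` and `path_exists_checker`, and the "one file per partition key under a base directory" layout, are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Set


def path_exists_checker(base_dir: Path) -> Callable[[str], bool]:
    """Hypothetical data-presence check: assumes one file per partition key under base_dir."""

    def has_data(partition_key: str) -> bool:
        return (base_dir / partition_key).exists()

    return has_data


def find_unmaterialized_with_data(
    partition_keys: Iterable[str],
    materialized: Set[str],
    has_data: Callable[[str], bool],
) -> List[str]:
    """Return keys whose data is present but that have no recorded materialization.

    The materialized-set test runs first, so storage is only queried for
    partitions that are not already recorded as materialized.
    """
    return [
        key
        for key in partition_keys
        if key not in materialized and has_data(key)
    ]
```

The returned keys, in the same order as `partition_keys`, are the candidates the scan would then report as materialized. Checking the in-memory set before touching storage keeps the number of (possibly remote) existence checks proportional to the number of missing materializations rather than to the total number of partitions.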
Help Wanted
This PR shouldn't be accepted as it is. I am willing to continue working on it until it reaches the quality expected by the dagster team. But I'd like some guidance concerning (at least) the following topics:
- I used `rich.progress` because I find it pretty. Is there a preferred progress bar library that should be used instead?
- I am using some of the private methods of the IO manager, because I'd like `UPathIOManager.has_output` to handle both partitioned and non-partitioned assets, as I've written here: https://github.com/dagster-io/dagster/compare/master...Tazoeur:dagster:upath_io_get_path
- Testing these features is valuable, but I am a bit lost about which approach to take. Are there examples of tests that (a) manipulate storage data, (b) write files without using the IO manager, and then (c) use the IO manager to check that the file is present?
- Am I using the correct Dagster instance and context?
- Should I split the PR in two, one for each new command?
- Are the CLI names ok?
- [new feature] `dagster asset scan` could wipe a partition if the data associated with the materialized partition is missing. Should that be included in this PR?
- [new feature] Should the scan also run Dagster checks to validate or invalidate the asset/partition?