dagster
Allow marking assets and partitions as materialized
What's the use case?
Sometimes an asset/partition can be materialized without Dagster knowing about it.
For example, an external entity can do it.
Wiping assets removes the materialization status but doesn’t actually remove the data. The requested feature would make it possible to quickly restore accidentally wiped assets.
This might need some discussion. For example: should we allow users to attach output metadata manually? Other questions will probably arise.
Where this has come up
- https://dagster.slack.com/archives/C01U954MEER/p1659912078035859
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
Hey @danielgafni - is undoing wiped assets the main use for this that you have in mind? Or are there other situations where you imagine using this functionality?
Probably that's the main case
Hey! I just thought about a use case where this functionality would be crucial.
Let's say we are deploying a feature stage version of Dagster - e.g. someone opened a merge request into a repository with some dagster code and we want to test the new code on a separate Dagster instance deployed for this merge request.
It should be obvious now where this is going - the freshly deployed Dagster would have an empty event storage and would think the parent assets/partitions are not materialized. We would somehow need to mark them as materialized manually.
Hello dagsters,
I'm currently doing this using a job and sending multiple context.log_event(AssetMaterialization(...)) calls.
My use case is five daily-partitioned assets spanning multiple years (about 10). Launching a backfill was not viable: it takes too long, since each day of each asset runs as a separate process with its own overhead.
I have a job for each asset to do the "manual" backfill in_process (one process for each asset).
But for my use case, it would be better to be able to launch an in_process backfill (all partitions execute inside one process).
Hey @deoqc - I think the issue where we're tracking what you're looking for is here: https://github.com/dagster-io/dagster/issues/8706.
Assuming I have a source which does not store history and I receive a daily CSV with its fill state/data:
- if data was loaded outside dagster I want to be able to tell dagit that the partitions are available. Otherwise, I run into problems with the freshness policy: https://dagster.slack.com/archives/C02LJ7G0LAZ/p1682980570278349?thread_ts=1682765167.292619&cid=C02LJ7G0LAZ
- if the source (for whatever reason) was unable to deliver its state CSV for a given date/partition, I will never be able to backfill this partition. However, I want to be able to somehow mark this partition as unreachable so that the freshness policies and the visualization of the asset no longer show warnings, which are meaningless because the gap cannot be fixed.
As of Dagster 1.5.3, the Materialize button on the asset details page now includes a dropdown option to report materialization events:
If the asset is partitioned, you can then choose which partitions to report materializations for:
@sryza Is this possible in code? I'm trying to test assets with materialize() and I don't want to materialize all upstream assets.
@archy-bold you can use the report_runless_asset_event method on DagsterInstance