Allow marking assets and partitions as materialized

Open danielgafni opened this issue 2 years ago • 3 comments

What's the use case?

Sometimes an asset/partition can be materialized without Dagster knowing about it.

For example, an external entity can do it.

Wiping assets removes the materialization status but doesn’t actually remove the data. The requested feature would make it possible to quickly restore accidentally wiped assets.

This might need some discussion. For example: should we allow the user to attach output metadata manually? Other questions will probably arise.

Where this has come up

  • https://dagster.slack.com/archives/C01U954MEER/p1659912078035859

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

danielgafni avatar Jul 07 '22 23:07 danielgafni

Hey @danielgafni - is undoing wiped assets the main use for this that you have in mind? Or are there other situations where you imagine using this functionality?

sryza avatar Jul 08 '22 00:07 sryza

Probably that's the main case

danielgafni avatar Jul 11 '22 07:07 danielgafni

Hey! I just thought of a use case where this functionality would be crucial.

Let's say we are deploying a feature stage version of Dagster - e.g. someone opened a merge request into a repository with some dagster code and we want to test the new code on a separate Dagster instance deployed for this merge request.

It should be obvious now where this is going: the freshly deployed Dagster would have an empty event storage and would think the parent assets/partitions are not materialized. We would somehow need to mark them as materialized manually.

danielgafni avatar Aug 04 '22 14:08 danielgafni

Hello dagsters,

I'm currently doing this using a job and sending multiple context.log_event(AssetMaterialization(...)).

My use case is 5 daily partitioned assets spanning multiple years (about 10). Launching a backfill was not viable: it takes too long, since each day for each asset is a separate process with its own overhead.

I have a job for each asset to do the "manual" backfill in_process (one process per asset).

But for my use case, it would be better to be able to launch an in_process backfill (all partitions executing inside one process).

deoqc avatar Nov 11 '22 12:11 deoqc

Hey @deoqc - I think the issue where we're tracking what you're looking for is here: https://github.com/dagster-io/dagster/issues/8706.

sryza avatar Nov 14 '22 16:11 sryza

Assuming I have a source which does not store history and I receive a daily CSV with its fill state/data:

  • if data was loaded outside dagster I want to be able to tell dagit that the partitions are available. Otherwise, I run into problems with the freshness policy: https://dagster.slack.com/archives/C02LJ7G0LAZ/p1682980570278349?thread_ts=1682765167.292619&cid=C02LJ7G0LAZ
  • if the source (for whatever reason) was unable to deliver its state CSV for a given date/partition, I will never be able to backfill this partition. However, I want to be able to somehow mark this partition as unreachable, so that the freshness policies and the visualization of the asset no longer show warnings that are meaningless because they cannot be fixed

geoHeil avatar May 02 '23 03:05 geoHeil

As of Dagster 1.5.3, the Materialize button on the asset details page now includes a dropdown option to report materialization events:

image

If the asset is partitioned, you can then choose which partitions to report materializations for:

image

sryza avatar Oct 13 '23 17:10 sryza

@sryza Is this possible in code? I'm trying to test assets with materialize() and I don't want to materialize all upstream assets.

archy-bold avatar Oct 27 '23 12:10 archy-bold

@archy-bold you can use the report_runless_asset_event method on DagsterInstance

sryza avatar Oct 27 '23 14:10 sryza