dagster icon indicating copy to clipboard operation
dagster copied to clipboard

AutoMaterializePolicy Skip If Already Running

Open C0DK opened this issue 1 year ago • 6 comments

What's the use case?

We have non-partitioned assets that currently utilize an incremental practice - this means that sometimes the job runs for quite a while, (especially if historical data is being loaded in).

This means that the auto materialization policy might start multiple jobs (we are currently backfilling huge quantities so it's running for quite a while), and suddenly our 10 job slots are filled by the same slot.

Similarly, a non-partitioned job doesn't make sense to run when it's already running. However, it doesn't seem to be possible to configure the MaterializationPolicy for this specific thing.

Alternatively, I also wanted to use max_concurrent_runs to simply specify that this asset cannot run more than one at a time, however, it's currently not possible to set concurrency limits on an asset specifically (only on jobs or similar.. sadly).

I would argue, however, that it'd probably almost always be the case that one doesn't want to have the asset auto-materialize the same partition (or asset, if not partitioned) if an auto-materialization is already running on that specific asset.

Ideas of implementation

I assume this is non-trivial, as one has to look into the state of runs; nevertheless, there is the ability to skip runs if a backfill is running; I assume there might be some synergy between the two.. maybe?

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

C0DK avatar Feb 05 '24 09:02 C0DK

Would love for this to be prioritized, or a temporary fix proposed. It sorta results in bugs/annoyance with duplicate runs and wasted resources.

C0DK avatar Feb 19 '24 15:02 C0DK

@C0DK looks like the skip_on_run_in_progress rule is available in the latest release

mattfysh avatar Mar 28 '24 12:03 mattfysh

I want to reopne this, as all our assets in question are partitioned.

        if context.partitions_def is not None:
            raise DagsterInvariantViolationError(
                "SkipOnRunInProgressRule is currently only support for non-partitioned assets."
            )

however the feature is nifty never the less

C0DK avatar Apr 22 '24 13:04 C0DK

Just to add; our eager auto materialization started the same partition a lot of times, as we have a partitioned asset that relies on a non-partitioned asset, that updates rather quickly. We can filter that away as 'dependencies' or something, but we want them to update; just not instantly.

C0DK avatar Apr 24 '24 08:04 C0DK

image Just a clear example. The last N materializations are on the same partition, because our queue is full, and therefore the partition isn't updated before the next tick, where the materialization is still missing, however multiple runs have been requested.

C0DK avatar Apr 24 '24 10:04 C0DK

I was thinking whether one could simply remove the raise in the code? wouldn't that basically just mean that one cannot auto materialize ANY partition while ANY partition is in progress. i know it's not ideal, but it's 'better', and will just mean a delay of any subsequent partition updates.

To make it very explicit one could require the usage of an all_partitions=True argument. I've created a (potential) solution here: https://github.com/dagster-io/dagster/pull/21415 - would love feedback.

C0DK avatar Apr 25 '24 05:04 C0DK