dagster
dagster copied to clipboard
[DS][29/n] Create ParentNewerCondition
Summary & Motivation
Originally, this was conceived as a condition that would be used as part of an any_deps_match()
sort of context. There are a few problems with this:
- All other conditions are simply true or false for a given asset partition, regardless of which child_partition we consider to be the "downstream". e.g. an asset partition is unambiguously missing or not missing. However, if you have A -> B with A unpartitioned and B partitioned, then it's possible for A to be newer than some partitions of B but older than others. This means that if we wanted to evaluate something like:
any_deps_match(missing() | newer_than_child())
, then we would have to evaluate the inner expression of missing | newer_than_child on a per-child-partition basis, which is both expensive and extremely challenging to visualize - We want to eventually implement this via the StaleStatusResolver, which fits way better with the mental model of "a child partition can have the status (parent is newer)" rather than "a child partition can have the status (any parent has the status (newer than child partition))"
The downside is that this does not compose as nicely with other conditions, but I think this is actually ok. For example, we don't support "all parents newer" out of the box, but I think this is a good thing, as that sort of behavior is actually a bit of a footgun, which leads to weird synchronization issues if you (e.g.) materialize one of the two parents, then manually materialize the child. Now you're on a weird new cadence in which your child will not materialize after the second parent updates.
For most cases in which you want to wait for more than one parent to update, you're either filling in time partitions as they come in (and so you can just wait for no parent partitions to be missing, rather than all parent partitions to update), or you want to materialize an unpartitioned asset regularly (in which you can wait for all parent partitions to be updated since a given cron tick, rather than waiting for them all to be updated more recently than you have updated).
How I Tested These Changes
-
#21641
: 2 dependent PRs (#21670
, #21671
) 👈
-
#21640
-
#21648
-
#21615
-
#21613
-
#21612
-
#21573
-
#21546
-
#21545
-
#21541
-
#21540
-
#21539
-
#21538
-
#21537
-
#21536
-
#21535
-
#21521
-
#21520
-
#21511
: 1 other dependent PR (#21512
)
-
#21510
-
#21508
-
#21507
-
#21505
-
#21504
-
#21503
-
#21502
-
#21501
-
#21500
-
#21499
-
master
This stack of pull requests is managed by Graphite. Learn more about stacking.
Join @OwenKephart and the rest of your teammates on Graphite
All caps too aggressive aesthetically imo
On Tue, May 14 2024 at 9:43 AM, OwenKephart < @.*** > wrote:
@.**** commented on this pull request.
In python_modules/dagster/dagster/_core/asset_graph_view/asset_graph_view.py ( https://github.com/dagster-io/dagster/pull/21641#discussion_r1600069555 ) :
- def compute_asset_partitions(self) -> AbstractSet[AssetKeyPartitionKey]:
return
self._compatible_subset.asset_partitions +
Ok I updated the name to compute_asset_partitions_EXPENSIVE , which I think is a sufficient deterrent for now, and we can move this discussion to a future diff
— Reply to this email directly, view it on GitHub ( https://github.com/dagster-io/dagster/pull/21641#discussion_r1600069555 ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/AG3IK6KLIK7SC35FNPNQAY3ZCIIH7AVCNFSM6AAAAABHF7TBCOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDANJVGQZDIMJXGY ). You are receiving this because your review was requested. Message ID: <dagster-io/dagster/pull/21641/review/2055424176 @ github. com>
Merge activity
- May 14, 10:38 AM EDT: @OwenKephart started a stack merge that includes this pull request via Graphite.
- May 14, 10:47 AM EDT: Graphite rebased this pull request as part of a merge.
- May 14, 10:49 AM EDT: Graphite rebased this pull request as part of a merge.
- May 14, 10:50 AM EDT: @OwenKephart merged this pull request with Graphite.