dagster
dagster copied to clipboard
[DS][3/n] Update unique_id creation logic
Summary & Motivation
We need a way to uniquely identify a node in a AssetCondition tree. The main use case here is to be able to associate an in-memory AssetCondition with information about it that was serialized on the previous tick. In order to do this, we need to be able to have a shared unique id.
The previous scheme worked bottom-up, meaning that a condition's unique id was based on a hash of its properties and the properties of all of its children. This has an obvious problem, in that if you have two leaf nodes with identical properties, then they will have the same "unique" id.
This wasn't really a big deal in the previous world as it'd be very silly in most cases to have multiple leaf nodes with the same properties, but it's more realistic now, as if you do something like `~materialized & ~any_deps_match(~materialized), then you'll have two leaf nodes which are just "materialized".
So, the simple fix is to just make this a top-down process, where the parent unique id is an input into your unique id.
Note that this breaks a bit of backcompat logic, which could in theory (potentially) be fixed with a fair amount of effort. However, this backcompat logic is only used on the first tick of AMP after upgrading from dagster<1.5 to dagster>=1.5. Considering this transition happened months ago, and the consequences are fairly minimal (asset partitions which were requested, but failed, will be attempted again), I think this is a reasonable sacrifice. In fact, we can probably remove most of this backcompat stuff entirely in the near future.
How I Tested These Changes
-
#21511
: 2 dependent PRs (#21512
, #21520
)
-
#21510
-
#21508
-
#21507
-
#21505
-
#21504
-
#21503
-
#21502
-
#21501
👈
-
#21500
-
#21499
-
master
This stack of pull requests is managed by Graphite. Learn more about stacking.
Join @OwenKephart and the rest of your teammates on Graphite