dagster icon indicating copy to clipboard operation
dagster copied to clipboard

[AMP] Store asset-scoped previous evaluation state in the daemon cursor storage table

Open smackesey opened this issue 1 year ago • 1 comments

Summary & Motivation

For AMP sensors, store the previous evaluation state for each asset separately in the daemon cursor storage table. The previous behavior was to group the evaluation states for all assets in one big list and store it on the sensor cursor, which is stored as part of the "instigator data" in the sensor table. Storing individual asset cursors individually allows each asset's previous evaluation state to be read regardless of which AMP sensor it is part of, which addresses a performance issue where costly "recalculation from scratch" would occur previously if an asset was moved between sensors.

How it works:

  • A serialized instance of AssetDaemonCursor is still in sensor storage, but the previous_evaluation_state field is emptied prior to serialization/storage after a tick.
  • When loading a serialized AssetDaemonCursor, the AssetConditionEvaluationState for each asset is loaded independently and merged into the AssetDaemonCursor that will be used in biz logic. Likewise, when storing the AssetDaemonCursor after a tick, these are broken out and stored separately.

Other comments

  • I'm not thrilled with the current design where we "patch" the loaded AssetDaemonCursor from storage with the individual evaluation states loaded from elsewhere, but this entire area is already a bit of a mess with all the different sensor formats. Since AssetDaemonCursor now represents the merge of data from different tables, we should probably stop storing it directly in the sensor instigator data. Need to think about this more.
  • This is a performance-focused change, but I haven't tested that. Seeking suggestions from reviewers on how to go about testing perf here-- do we have an existing framework for this kind of thing.

How I Tested These Changes

Existing test suite.

smackesey avatar Mar 04 '24 16:03 smackesey

  • #20238 Graphite 👈
  • #12115 Graphite
  • #10983 Graphite: 1 other dependent PR (#12112 Graphite)
  • #12102 Graphite
  • master

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @smackesey and the rest of your teammates on Graphite Graphite

smackesey avatar Mar 04 '24 16:03 smackesey