[backfill daemon run retries 1/n] update how we determine backfill completion to account for retried runs

Open jamiedemaria opened this issue 1 year ago • 6 comments

Summary & Motivation

The backfill daemon doesn't account for run retries. See https://github.com/dagster-io/internal/discussions/12460 for more context. We've decided that the daemon should account for automatic and manual retries of runs that occur while the backfill is still in progress. This requires two changes: (1) ensuring the backfill isn't marked completed if there is an in-progress run or a failed run that will be automatically retried, and (2) updating the daemon to take the results of retried runs into account when deciding which partitions to materialize in the next iteration.

This PR addresses the first point: ensuring the backfill isn't marked completed if there is an in-progress run or a failed run that will be automatically retried.

Currently, a backfill is marked complete when all targeted asset partitions are in a terminal state (successfully materialized, failed, or downstream of a failed partition). Since failed runs may be retried, there is a case where all asset partitions are in a terminal state, but a retry is in progress that could change the state of some asset partitions. This means that if any runs are in progress for those partitions, we need to wait for them to complete before marking the backfill complete.
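As a rough illustration of the completion rule described above, here is a minimal sketch. It is not the daemon's actual code: the `PartitionState` and `RunState` enums and the `can_mark_backfill_complete` function are hypothetical names invented for this example.

```python
from enum import Enum
from typing import Iterable


class PartitionState(Enum):
    # Hypothetical states, named after the description above.
    MATERIALIZED = "materialized"
    FAILED = "failed"
    DOWNSTREAM_OF_FAILED = "downstream_of_failed"
    NOT_STARTED = "not_started"


class RunState(Enum):
    # Hypothetical run states for illustration only.
    QUEUED = "queued"
    STARTED = "started"
    SUCCESS = "success"
    FAILURE = "failure"


TERMINAL_PARTITION_STATES = {
    PartitionState.MATERIALIZED,
    PartitionState.FAILED,
    PartitionState.DOWNSTREAM_OF_FAILED,
}
IN_PROGRESS_RUN_STATES = {RunState.QUEUED, RunState.STARTED}


def can_mark_backfill_complete(
    partition_states: Iterable[PartitionState],
    run_states: Iterable[RunState],
) -> bool:
    """Return True only if every partition is terminal AND no run is in progress."""
    all_terminal = all(s in TERMINAL_PARTITION_STATES for s in partition_states)
    any_in_progress = any(s in IN_PROGRESS_RUN_STATES for s in run_states)
    return all_terminal and not any_in_progress
```

Under the old rule only the first condition was checked, so a failed partition with a retry still running would have let the backfill complete early.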

Additionally, we need to account for a race condition where a failed run may have a retry automatically launched for it, but the daemon marks the backfill complete before the retried run is queued. This PR adds a check to ensure that no failed runs are about to be retried.
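The race-condition check could look something like the sketch below. This is an assumption-laden illustration, not the PR's implementation: the `Run` dataclass, its `will_retry` and `parent_run_id` fields, and `has_pending_retry` are all hypothetical stand-ins for however the daemon actually records that a failed run is going to be retried.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Run:
    # Hypothetical run record; field names are assumptions for this sketch.
    run_id: str
    status: str  # one of "QUEUED", "STARTED", "SUCCESS", "FAILURE"
    will_retry: bool = False  # set when an automatic retry is expected
    parent_run_id: Optional[str] = None  # set on a retry, pointing at the failed run


def has_pending_retry(runs: Sequence[Run]) -> bool:
    """Return True if some failed run is expected to be retried
    but its retry run has not been created yet."""
    # Collect the ids of runs that already have a retry launched for them.
    already_retried = {r.parent_run_id for r in runs if r.parent_run_id is not None}
    return any(
        r.status == "FAILURE" and r.will_retry and r.run_id not in already_retried
        for r in runs
    )
```

In this sketch the daemon would hold off on marking the backfill complete while `has_pending_retry` returns True. Once the retry exists, it shows up as a queued or started run and the in-progress check takes over.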

How I Tested These Changes

New unit tests.

jamiedemaria avatar Nov 06 '24 19:11 jamiedemaria

Deploy preview for dagster-university ready!

✅ Preview https://dagster-university-5hid137gc-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster-university.dagster-docs.io

Built with commit bf580e462f146e09ae9f22ed28b4e1390c048fcf. This pull request is being automatically deployed with vercel-action

github-actions[bot] avatar Nov 11 '24 20:11 github-actions[bot]

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io

Direct link to changed pages:

  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/concepts/assets/asset-checks/define-execute-asset-checks
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/concepts/metadata-tags/kind-tags
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/concepts/testing
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/guides/migrations/from-step-launchers-to-pipes
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/guides/running-dagster-locally
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/integrations
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/integrations/airlift
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/integrations/airlift/reference
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/integrations/airlift/tutorial
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/integrations/looker
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/integrations/spark
  • https://dagster-docs-4a8ut7u30-elementl.vercel.app https://jamie-backfill-daemon-termination-change.dagster.dagster-docs.io/integrations/tableau

github-actions[bot] avatar Nov 11 '24 20:11 github-actions[bot]

Deploy preview for dagit-storybook ready!

✅ Preview https://dagit-storybook-kx2qczuyi-elementl.vercel.app https://jamie-backfill-daemon-termination-change.components-storybook.dagster-docs.io

Built with commit bf580e462f146e09ae9f22ed28b4e1390c048fcf. This pull request is being automatically deployed with vercel-action

github-actions[bot] avatar Nov 11 '24 20:11 github-actions[bot]

Deploy preview for dagit-core-storybook ready!

✅ Preview https://dagit-core-storybook-pw23urgfv-elementl.vercel.app https://jamie-backfill-daemon-termination-change.core-storybook.dagster-docs.io

Built with commit bf580e462f146e09ae9f22ed28b4e1390c048fcf. This pull request is being automatically deployed with vercel-action

github-actions[bot] avatar Nov 11 '24 20:11 github-actions[bot]

@clairelin135 @gibsondan bumping for review (and the stacked pr)!

jamiedemaria avatar Nov 14 '24 14:11 jamiedemaria

@gibsondan this is ready for review now that the retry tag changes have landed!

jamiedemaria avatar Dec 02 '24 18:12 jamiedemaria