Add Deployment Health Alerting Docs

Open niustephanie opened this issue 1 year ago • 0 comments

New Feature Description

Astro is adding four new alert types that correspond to four of our existing deployment health incidents. These deployment health alerts let customers get notified proactively when deployment health issues arise, for example when Airflow DB storage is unusually high. These alerts:

  • leverage Astro architecture to identify infra-level incidents that are otherwise hard to monitor
  • give users granular information (beyond Healthy / Unhealthy) about their deployments
  • incorporate Astro's perspective on deployment health into existing alerting workflows (email, Slack, PagerDuty) to allow faster response and minimize MTTR

Note: This feature will be in Private Preview starting July 31st. It will move to Public Preview after we've had a chance to get validation from Private Preview customers, likely mid-to-late August 2024.

[Image: Four new deployment alerts in the alert CRUD UI.]

Docs this touches:

  • Astro Alerts: add four new trigger types and generalize the language around Astro alerts to include deployment health (and not just DAG and task runs)
  • Deployment incidents: add a note that users can also set up alerts for some of these deployment incidents (and link to the Astro Alerts docs)
  • Best practices for Airflow vs. Astro alerts: include guidance on when to use deployment health alerts in our best practices guide for alerting

Required Reviewers

Paola, Ian Buss (Rob Shea in Ian's absence)

Links to Internal Info or Resources

More about the feature: https://www.notion.so/astronomerio/Supporting-Deployment-Health-Alerts-Feature-Overview-FAQ-1935f462dd75491db7d9f0a5ba77a2bc

Release Date

Private Preview: 7/31/2024. Public Preview: August 2024.

This is a relatively disruptive change (it has the potential to send customers emails and disrupt their existing workflows), so we plan a slow, progressive rollout.

Additional Notes

No response

niustephanie, Jul 24 '24 16:07