docs
Add Deployment Health Alerting Docs
New Feature Description
Astro is adding four new alert types that correspond to four of our existing deployment health incidents. These new deployment health alerts let customers get notified proactively when deployment health issues arise, for example when Airflow DB storage is unusually high. These alerts:
- leverage Astro architecture to identify infra-level incidents that are otherwise hard to monitor
- give users granular information (beyond Healthy / Unhealthy) about their deployments
- incorporate Astro's perspective on deployment health into existing alerting workflows (email, Slack, PagerDuty) to allow faster response and minimize MTTR
Note: this feature will be in Private Preview starting July 31, 2024. It will move to Public Preview after Private Preview customers have validated it, likely in mid-to-late August 2024.
Four new deployment alerts in the alert CRUD UI.
Docs this touches:
- Astro Alerts: add four new trigger types and generalize the language around Astro alerts to include deployment health (and not just DAG and task runs)
- Deployment incidents: add a note that users can also set up alerts for some of these deployment incidents (and link to the Astro Alerts docs)
- Best practices for Airflow vs. Astro alerts: include when to use deployment health alerts in our best practices guide for alerting
Required Reviewers
Paola, Ian Buss (Rob Shea in Ian's absence)
Links to Internal Info or Resources
More about the feature: https://www.notion.so/astronomerio/Supporting-Deployment-Health-Alerts-Feature-Overview-FAQ-1935f462dd75491db7d9f0a5ba77a2bc
Release Date
Private Preview: 7/31/2024. Public Preview: August 2024.
This is a relatively disruptive change (it has the potential to send customers emails and disrupt their existing workflows), so we plan a slow, progressive rollout.
Additional Notes
No response