paasta
Alert authors of downstream jobs when a scheduled job has failed
I'm proposing that when a job fails (either because the run itself fails or because the scheduler does not run it as expected), we also notify the owners of all downstream jobs. I think we'll want to go all the way down the DAG. I see this as an extension of the monitoring we recently added for jobs that didn't run on time; the difference is that the cause is an upstream job failure rather than the schedule.
This is more of an RFC: @oktopuz @keymone @solarkennedy what do you think? If you agree, I'll work on a PR.
What if we traversed the DAG and included the result in the alert output, to remind people which other jobs this blocks? (Instead of alerting on every downstream job and flooding people's inboxes.)
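To make that concrete, here is a rough sketch of the traversal. It is not based on the actual scheduler code: the `dag` mapping (job name to its direct downstream jobs) and the names `downstream_jobs` and `format_alert` are all made up for illustration.

```python
from collections import deque


def downstream_jobs(dag, failed_job):
    """Return every job reachable from failed_job, in breadth-first order.

    `dag` is a hypothetical dict mapping a job name to a list of the jobs
    that depend on it directly.
    """
    seen = set()
    order = []
    queue = deque(dag.get(failed_job, []))
    while queue:
        job = queue.popleft()
        # Skip jobs already visited (diamond dependencies) and the root itself.
        if job in seen or job == failed_job:
            continue
        seen.add(job)
        order.append(job)
        queue.extend(dag.get(job, []))
    return order


def format_alert(dag, failed_job):
    """Build a single alert message that lists the blocked downstream jobs."""
    blocked = downstream_jobs(dag, failed_job)
    if not blocked:
        return f"Job {failed_job} failed. No downstream jobs are blocked."
    return f"Job {failed_job} failed. Blocked downstream jobs: {', '.join(blocked)}"
```

That gives one alert per failed job, with the blocked jobs listed in the body, rather than one alert per downstream node.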
How often does it happen that downstream jobs belong to somebody other than the owners of the root set? Also, I think it's sufficient to notify on a failure of the complete DAG rather than individually per DAG node.
@keymone One of the ads teams mentioned that they often had this issue.
@solarkennedy I'm fine with that as a less noisy solution.