Add job notifier - allow running jobs on failure
This PR implements a powerful new feature that allows running custom jobs when other jobs fail, providing an easy and flexible way to implement custom notifiers and cleanup procedures.
What's Changed
New Job Notifier Type
Jobs can now run other jobs when they fail using a new job notifier type. This approach provides a clean, explicit API that integrates seamlessly with the existing notification system:
notifiers:
cleanup-notifier:
type: job
job: cleanup-job
jobs:
backup-job:
schedule: "0 2 * * *"
tasks: ["backup-database"]
on_error: ["email-admin", "cleanup-notifier"] # Mix regular and job notifiers
cleanup-job:
schedule: "0 0 1 1 1" # Never run on schedule
enabled: false # Only triggered by failures
tasks: ["cleanup-temp-files", "send-slack-alert"]
Key Features
- Powerful Failure Handling: Run complex cleanup tasks, custom notifications, or recovery procedures when jobs fail
- Infinite Loop Prevention: Job notifiers run target jobs with error notifications disabled to prevent cascading failures
- Graceful Degradation: Missing target jobs are logged as warnings but don't crash the system
-
Full Backward Compatibility: Existing
on_errornotifiers continue to work unchanged - Clear API: Job notifiers are explicit and consistent with the existing notification architecture
-
Flexible Use Cases:
- Custom notification systems (Slack, Teams, etc.)
- Cleanup of temporary files or resources
- Recovery procedures and rollback operations
- Alerting systems with rich context
Implementation Details
The implementation adds minimal, surgical changes to the codebase:
-
New Notifier Type: Added
jobnotifier type that wraps and executes other jobs - Schema Extension: Added job notifier schema with validation
- Job Resolution: Job notifiers can reference other jobs through a resolver pattern
-
Safety Measures: Target jobs run with
notifyError=falseto prevent infinite loops -
Clean API: No confusing dual properties - everything goes through the standard
on_errorarray
Example Use Cases
Database Backup with Cleanup:
notifiers:
backup-cleanup-notifier:
type: job
job: backup-cleanup
jobs:
db-backup:
schedule: "0 2 * * *"
tasks: ["create-backup"]
on_error: ["backup-cleanup-notifier"]
backup-cleanup:
enabled: false
tasks: ["remove-partial-backup", "alert-ops-team"]
Application Deployment with Rollback:
notifiers:
rollback-notifier:
type: job
job: rollback-deployment
jobs:
deploy-app:
schedule: "0 18 * * 1" # Monday 6 PM
tasks: ["deploy-to-prod"]
on_error: ["email-ops", "rollback-notifier"]
rollback-deployment:
enabled: false
tasks: ["revert-deployment", "notify-dev-team"]
This feature transforms ckron from a simple task scheduler into a robust workflow orchestration tool, enabling sophisticated failure handling patterns while maintaining the simplicity and consistency of the existing API.
Fixes #102.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.
@copilot on_error and on_failure are synonymous. That makes the API very confusing. Why don't you implement a notifier class that wraps a job instead. I think disabling sub-notifications for failure jobs is smart but I wish it was more explicit. I feel like some users will expect the sub notifications to trigger as well, maybe we should throw an error for now to prevent confusion.
@copilot on_error and on_failure are synonymous. That makes the API very confusing. Why don't you implement a notifier class that wraps a job instead. I think disabling sub-notifications for failure j...
You're absolutely right about the API confusion between on_error and on_failure. I've refactored the implementation to use a job notifier approach instead.
Changes made:
- Removed the
on_failureproperty entirely - Created a new
jobnotifier type that wraps and runs other jobs - Updated the API so failure jobs are configured as notifiers in the
on_errorarray - This makes the behavior explicit and consistent with the existing architecture
New usage:
notifiers:
cleanup-notifier:
type: job
job: cleanup-job
jobs:
main-job:
on_error: ["email-admin", "cleanup-notifier"] # Mix regular and job notifiers
This addresses all your concerns:
- No more confusing dual properties
- Job execution is explicit (it's clearly a notifier)
- Notification behavior is obvious (job notifiers run with
notifyError=false) - More flexible (can mix job notifiers with regular notifiers)
Commit: cac581a