lilypad
[core] Alerting when services go down
Figure out the best way for us to be notified when the solver, job creator, and/or chain services stop working, whatever the cause.
We have started this effort with:
- [x] EC2 status checks
- [x] Cloudflare tunnel alerts when tunnel goes down
Next steps:
- [ ] Implement Cloudflare tunnel alerts in OpenTofu
- [ ] Add a periodic `cowsay` job to check network liveness
- [ ] Alerts from our observability stack once live
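As a starting point for the periodic liveness job, here is a minimal sketch of a canary check. It is an illustration under assumptions, not an agreed design: `check_liveness` and `LivenessResult` are hypothetical names, and the exact command that submits the `cowsay` job is not specified in this issue, so the caller passes it in. The idea is that if the job does not finish successfully within a timeout, at least one of the solver, job creator, or chain services is probably unhealthy.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class LivenessResult:
    ok: bool          # True only if the job finished successfully in time
    detail: str       # Short human-readable reason, useful in an alert body
    elapsed_s: float  # Wall-clock duration of the check


def check_liveness(cmd, timeout_s=300.0, expect=""):
    """Run one canary job and report whether it completed in time.

    cmd is the argument list that submits the job (an assumption: supply
    whatever command actually runs the cowsay job on the network).
    expect is an optional substring that must appear in the job's stdout.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The job never returned: the network may be stalled.
        return LivenessResult(
            False, f"timed out after {timeout_s}s", time.monotonic() - start
        )
    except OSError as exc:
        # The command itself could not be started (missing binary, etc.).
        return LivenessResult(
            False, f"could not start job: {exc}", time.monotonic() - start
        )

    elapsed = time.monotonic() - start
    if proc.returncode != 0:
        reason = proc.stderr.strip()[:200]
        return LivenessResult(
            False, f"exit code {proc.returncode}: {reason}", elapsed
        )
    if expect and expect not in proc.stdout:
        return LivenessResult(False, "expected text missing from output", elapsed)
    return LivenessResult(True, "job completed", elapsed)
```

A scheduler such as cron or a systemd timer could call `check_liveness` with the real job command every few minutes and exit non-zero when `ok` is false, so that whichever alerting channel we settle on can pick up the failure. How the alert is delivered is deliberately left out of this sketch.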