pages-core
Monitor build activity
User Story
As a Federalist operator, I want to be aware of any issues with the builds.
Background (Optional)
In addition, we should add monitoring/alerting for builds stuck in the tasked state.
Acceptance Criteria
- [ ] metrics for non-permanent build states are reported
- [ ] metrics for permanent build states are reported
- [ ] alert for non-permanent build state exists
- [ ] Changes made live by deploying federalist web and metrics.
Level of effort - medium
Implementation outline (if higher than "low" effort):
- [ ] Add metrics for the count of builds in each non-permanent state (created, queued, tasked, processing)
- [ ] Add metrics for the current max time in each non-permanent build state (we don't record the time of each state change, but we can use something like MAX("current time" - updatedAt) for each state)
- [ ] Add a reasonable alert for the max time metrics
- [ ] Add metrics for the total builds in each permanent state (error, success) within the collection window
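A minimal TypeScript sketch of the calculations in the outline, assuming build rows with `state` and `updatedAt` fields are already available in memory. The `BuildRow` type, the function names, and the threshold values are hypothetical. `updatedAt` stands in for "time the build entered its current state" (per the caveat that per-state timestamps aren't recorded), and is also assumed to approximate completion time when counting permanent states inside the collection window.

```typescript
// Hypothetical row shape: only the state names and `updatedAt` come from the ticket.
type BuildState =
  | "created" | "queued" | "tasked" | "processing" // non-permanent
  | "error" | "success";                           // permanent

interface BuildRow {
  state: BuildState;
  updatedAt: Date;
}

const NON_PERMANENT: BuildState[] = ["created", "queued", "tasked", "processing"];
const PERMANENT: BuildState[] = ["error", "success"];

interface BuildMetrics {
  counts: Record<string, number>;       // builds currently in each non-permanent state
  maxAgeMs: Record<string, number>;     // MAX(now - updatedAt) per non-permanent state
  windowTotals: Record<string, number>; // permanent-state builds updated within the window
}

function computeBuildMetrics(builds: BuildRow[], now: Date, windowMs: number): BuildMetrics {
  const counts: Record<string, number> = {};
  const maxAgeMs: Record<string, number> = {};
  const windowTotals: Record<string, number> = {};
  // Start every state at zero so "no builds in this state" is reported explicitly.
  for (const s of NON_PERMANENT) {
    counts[s] = 0;
    maxAgeMs[s] = 0;
  }
  for (const s of PERMANENT) windowTotals[s] = 0;

  for (const b of builds) {
    const ageMs = now.getTime() - b.updatedAt.getTime();
    if (NON_PERMANENT.includes(b.state)) {
      counts[b.state] += 1;
      maxAgeMs[b.state] = Math.max(maxAgeMs[b.state], ageMs);
    } else if (ageMs <= windowMs) {
      // Assumption: updatedAt approximates when the build finished.
      windowTotals[b.state] += 1;
    }
  }
  return { counts, maxAgeMs, windowTotals };
}

// Returns the states whose oldest build exceeds its threshold.
// The thresholds are hypothetical and would need tuning (e.g. for "tasked").
function statesOverThreshold(
  maxAgeMs: Record<string, number>,
  thresholdsMs: Record<string, number>,
): string[] {
  return Object.keys(thresholdsMs).filter(
    (state) => (maxAgeMs[state] ?? 0) > thresholdsMs[state],
  );
}
```

Because `updatedAt` changes on any write to the row, not only on a state change, these ages are a lower bound on the real time spent in a state, so an alert on them errs toward firing late rather than early.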
I have some concerns about the best way to approach gathering build metrics. The metrics app is totally separate from the API, and bringing in the Sequelize models would drag along a whole bunch of dependencies that I'm not sure we want. We could try to duplicate or separate out some of the logic, but I'm not sure how well that would work. Other approaches include querying the data without Sequelize, or fetching it through the (admin) API, which would require a bot user or some other authentication mechanism.
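For the "query the data without Sequelize" option, one possible shape is a single aggregate query run with a plain Postgres client. This sketch assumes a Postgres table named `build` with a `state` column and a Sequelize-style `"updatedAt"` column (all assumptions, not checked against the real schema). It takes the query function as a parameter so the metrics app needs no Sequelize models; node-postgres's `client.query(text, values)` resolves to a result with a `rows` array, so `client.query.bind(client)` has a compatible shape.

```typescript
// Minimal query-function shape, compatible with node-postgres's client.query.
type QueryFn = (
  text: string,
  values: unknown[],
) => Promise<{ rows: Record<string, unknown>[] }>;

const NON_PERMANENT_STATES = ["created", "queued", "tasked", "processing"];

// Assumed table/column names; "updatedAt" is quoted because Postgres
// folds unquoted identifiers to lower case.
const NON_PERMANENT_SQL = `
  SELECT state,
         COUNT(*) AS count,
         MAX(EXTRACT(EPOCH FROM (NOW() - "updatedAt"))) AS max_age_seconds
  FROM build
  WHERE state = ANY($1)
  GROUP BY state`;

interface StateStat {
  state: string;
  count: number;
  maxAgeSeconds: number;
}

async function fetchNonPermanentStats(query: QueryFn): Promise<StateStat[]> {
  const { rows } = await query(NON_PERMANENT_SQL, [NON_PERMANENT_STATES]);
  // node-postgres returns bigint (COUNT) and numeric values as strings
  // by default, so coerce them to numbers here.
  return rows.map((r) => ({
    state: String(r.state),
    count: Number(r.count),
    maxAgeSeconds: Number(r.max_age_seconds),
  }));
}
```

A state with no matching builds produces no row at all, so the caller would need to treat a missing state as a count of zero before reporting it.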
Thoughts, @18F/federalist-admins?
Maybe we pare this ticket down to focus only on the timeout logic. We can rethink, strategize, and break out a more succinct task for the build metrics.
Created https://github.com/18F/federalist/issues/3887; paring this one down to match.