ol-infrastructure
Infrastructure automation code for use by MIT Open Learning
# User Story - As an Open edX operator I would like to reduce the total amount of state that we need to keep around for compliance, cost, and security...
# User Story Currently, Micromasters uses hand-constructed CloudFront distributions. Building these is error-prone and high-risk. # Description/Context We should move Micromasters to use Pulumi-managed Fastly as...
# User Story Periodically, we get errors like this from OVS on production: https://sentry.io/organizations/mit-office-of-digital-learning/issues/1606816357/?environment=production-apps&project=194353&query=is%3Aunresolved When these errors happen, video transcode jobs fail, like these: [DEBUG:Error posting your video "L10v2.mp4"](https://odl.zendesk.com/agent/tickets/128851) [DEBUG:Error...
# User Story - Right now we get quite a few Spike Protection emails from Sentry. A quick scan suggests roughly 20 a week. - This is too many. We...
# User Story - I want to be alerted when OpenSearch clusters or nodes are getting low on disk space. # Description/Context Should alert when EITHER the...
We don't always know if a Docker container that isn't in the 'critical path' has stopped running. For instance, if a Traefik or application container crashes, the AWS load balancer will...
# User Story - As an engineer I would like to be aware of operational issues in the applications/services that I build - As a product owner I would like...
@blarghmatey recommended opsdroid: https://opsdroid.dev/ We need this to be able to: - Act as an interface between Slack and the Concourse pipelines for release management - Perform the tasks that...
The goal for this epic is to implement a revised release management process per the updated handbook: http://mitodl.github.io/handbook/delivering/release-process/ To start, we'll do the following for one of our applications and...