awesome-k8s-lessons-learned

Curated list of awesome lessons learned running Kubernetes

Lessons learned/problems/incidents when running Kubernetes

Write-up

  • 2019 - Tinder - Tinder’s move to Kubernetes - https://medium.com/tinder-engineering/tinders-move-to-kubernetes-cda2a6372f44

  • 2018 - ReactiveOps - The ReactiveOps “Bestest Kubernetes Cluster Upgrade” - https://medium.com/@reactiveops/the-reactiveops-bestest-kubernetes-cluster-upgrade-f7a7589b21fb

  • 2018 - Gravitational - The Horrors of Upgrading Etcd Beneath Kubernetes - https://gravitational.com/blog/kubernetes-and-offline-etcd-upgrades/

  • 2018 - MEE6 - Scaling Kubernetes for 25M users - https://medium.com/@brendanrius/scaling-kubernetes-for-25m-users-a7937e3536a0

  • 2017 - GitHub - Kubernetes at GitHub - https://githubengineering.com/kubernetes-at-github/

  • 2017 - Saltside - Our Failure Migrating to Kubernetes - https://engineering.saltside.se/our-failure-migrating-to-kubernetes-25c28e6dd604

  • 2017 - Saltside - Migrating to Kubernetes: Day 20 Problems - https://engineering.saltside.se/migrating-to-kubernetes-day-20-problems-fbbda4905c23

  • 2017 - Stripe - Learning to operate Kubernetes reliably - https://stripe.com/blog/operating-kubernetes

  • 2017 - Tailor Brands - Production grade Kubernetes on AWS (series) - https://medium.com/tailor-tech/production-grade-kubernetes-on-aws-primer-5b83e71c024

  • 2017 - Applatix - Making Kubernetes Production Ready (series) - https://applatix.com/making-kubernetes-production-ready/

  • 2017 - Daniel Martins - Pain(less) NGINX Ingress - https://danielfm.me/posts/painless-nginx-ingress.html

Talk

  • 2019 - Spotify - Keynote: How Spotify Accidentally Deleted All its Kube Clusters with No User Impact - https://www.youtube.com/watch?v=ix0Tw8uinWs

  • 2018 - Financial Times - Keynote: The Challenges of Migrating 150+ Microservices to Kubernetes - https://www.youtube.com/watch?v=H06qrNmGqyE

  • 2018 - Monzo Bank - Anatomy of a Production Kubernetes Outage (the incident report is listed under Incident/Outage) - https://www.youtube.com/watch?v=OUYTNywPk-s

Incident/Outage

  • 2017 - Monzo Bank - Kubernetes bug ate my banking app! - https://community.monzo.com/t/resolved-current-account-payments-may-fail-major-outage-27-10-2017/26296/95