YOW2016
References for YOW2016
Architectural patterns of resilient distributed systems
Accompanying repository for the "Architectural patterns of resilient distributed systems" talk given at YOW 2016. Feel free to open any issues for questions and/or to say hi :)
Talk Outline
See the image credits and the link to slides; the video is coming soon.
- Why Resilience
- Motivation & Definitions
- Resilience Literature
- Harvest/Yield thinking
- Cook's Model
- Borrill's Model
- Resilience in industry
- Netflix
- Fastly
- Conclusions
For a longer version of this talk, see the Strange Loop 2015 version.
References
Resilience literature
- Checklist to remember
- Difference Between Harvest and Yield
- Harvest, Yield, and Scalable Tolerant Systems
- Computer Immunology - Burgess
- Building Robust Systems an essay - Sussman
- How Complex Systems Fail - Cook
- Optimal Design, Robustness, and Risk Aversion
- Part Count and Design of Robust Systems
- Highly Optimized Tolerance: A Mechanism for Power Laws in Designed Systems
- Fault Tolerance and the Five-Second Rule
- Scale free Networks - computerworld
- The Scale-free property - Barabási
- Scale-free network
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- Failure Sketches: A Better Way to Debug
- Virtual Network Diagnosis as a Service
- ‘Going solid’: a model of system dynamics and consequences for patient safety
- Building on Quicksand
- Immutability Changes Everything
- You can't sacrifice partition tolerance
- Complex adaptive system
- Robustness principle
- Small-world experiment
Resilience in industry
- Principles of Chaos Engineering
- Fault tolerance in a high-volume distributed system
- From Chaos to Control - Testing the resiliency of Netflix’s Content Discovery Platform
- Making the Netflix API More Resilient
- Flux: A New Approach to System Intuition
- Chaos Engineering Upgraded
- Google Finds: Centralized Control, Distributed Data Architectures Work Better Than Fully Decentralized Architectures
- Caitie's Runbook template
- Clients are Jerks: aka How Halo 4 DoSed the Services at Launch & How We Survived
- Game Day Exercises at Stripe: Learning from kill -9
- How we ended up with microservices
- Postmortem for July 27 outage of the Manta service
- Hashicorp Yamux
- The Chubby lock service for loosely-coupled distributed systems
- Summary of the Amazon DynamoDB Service Disruption and Related Impacts in the US-East Region
- Notes on Distributed Systems for Young Bloods
Media
- Velocity NY 2013: Richard Cook, "Resilience In Complex Adaptive Systems"
- Developing a Globally Distributed Purging System and slides
- Joao Taveira's SRECon talk - Scaling Networks through Software
- Complex Adaptive Systems: 13 Robustness & Resilience
- Network Theory: 16 Robustness & Resilience
- Design of Resilient Systems - Innovations in Thinking Differently
- Camille Fournier's Papers We Love Talk on The Chubby lock service for loosely-coupled distributed systems and slides
- Distributed Chaos Operations: Casey Rosenthal, Netflix
- "Building Scalable Stateful Services" by Caitie McCaffrey
About Reading Papers
- Papers we love
- Starting a PWL chapter
- The Morning Paper
- ACM Queue Research for Practice
- The Morning Paper Quarterly Review Issue 1
Thank you!
Thank you to everyone who helped with feedback, resources, and advice for this talk. Special thanks to: Paul Borrill, Jordan West, Caitie McCaffrey, Camille Fournier, Mike O'Neill, Neha Narula, Matt Whiteley, Joao Taveira, Tyler McMullen, Zac Duncan, Nathan Taylor, Ian Fung, Armon Dadgar, Peter Alvaro, Peter Bailis, Alex Rasmussen, Bruce Spang, Aysylu Greenberg, Elaine Greenberg, and Greg Bako.