system-design-and-architecture
Availability and Resilience
Syed X, [Nov 18, 2019 at 3:10:51 AM]:
Hello All,
Has anyone worked on data center consolidation or upgrade projects with an emphasis on availability and resilience requirements?
I have a potential interview coming up and need material on the items below to prepare.
- Resilience strategies
- HA strategies
- DR strategies
- Most important - challenges faced and how they were addressed.
Thanks in advance
Fault tolerance, high availability, and disaster recovery all improve availability, but they are slightly different concepts: http://www.pbenson.net/2014/02/the-difference-between-fault-tolerance-high-availability-disaster-recovery/
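Failover: https://tianpan.co/notes/85-improving-availability-with-failover
Resilience strategies: I am not an expert on this, but I think the book "Antifragile" covers the underlying principles. Netflix's Chaos Monkey, which randomly terminates production instances to check that services survive failures, is a practical example.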
HA: https://en.wikipedia.org/wiki/High-availability_cluster
DR: Facebook's TAO handles replication impressively: https://tianpan.co/notes/49-facebook-tao. I assume some Google papers describe even better solutions.
Challenges: failure is always an option. I think the hardest part is building a system (people + machines) that handles failures automatically and escalates properly, because that is a management problem as well as an engineering one.