CowboyRobot writes "ACM has an article about how Netflix conducts its resilience testing. Instead of the GameDay exercises used by companies such as Amazon and Google, Netflix uses what it calls the Simian Army, based on the philosophy that 'Resilience can be improved by increasing the frequency and variety of failure and evolving the system to deal better with each new-found failure, thereby increasing anti-fragility.' While GameDay exercises are like fire drills, scheduled events in which failure is manually introduced or simulated, the Simian Army relies on failure induced in the live environment by autonomous agents known as 'monkeys.' Chaos Monkey randomly terminates virtual instances in a production environment that are serving live customer traffic. Chaos Gorilla causes an entire Amazon Availability Zone to fail. And Chaos Kong takes down an entire region, with all of its zones. 'What doesn't kill you makes you stronger,' and Netflix hopes that by constantly defending itself against this internal onslaught, it will become increasingly 'anti-fragile — growing stronger from each successive stressor, disturbance, and failure.'"
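
For readers who want a concrete picture of the failure injection described above, here is a minimal toy sketch in Python. It is not Netflix's code and makes no cloud API calls: the fleet is an in-memory list, and every name in it (Instance, Fleet, build_fleet, chaos_monkey, chaos_gorilla, the zone labels) is hypothetical, chosen only for illustration. It assumes "terminating" an instance just means flipping a flag.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    """A simulated virtual instance (hypothetical model, not a real cloud object)."""
    instance_id: str
    zone: str
    running: bool = True


@dataclass
class Fleet:
    """An in-memory stand-in for a production fleet."""
    instances: List[Instance] = field(default_factory=list)

    def running(self) -> List[Instance]:
        # Only instances that have not been terminated yet.
        return [i for i in self.instances if i.running]


def build_fleet(zones: List[str], per_zone: int) -> Fleet:
    """Create per_zone running instances in each of the given zones."""
    instances = [
        Instance(instance_id=f"{zone}-{n}", zone=zone)
        for zone in zones
        for n in range(per_zone)
    ]
    return Fleet(instances)


def chaos_monkey(fleet: Fleet, rng: random.Random) -> Optional[Instance]:
    """Terminate one randomly chosen running instance; return it, or None if none are left."""
    candidates = fleet.running()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.running = False
    return victim


def chaos_gorilla(fleet: Fleet, zone: str) -> List[Instance]:
    """Terminate every running instance in one zone; return the instances taken down."""
    victims = [i for i in fleet.running() if i.zone == zone]
    for victim in victims:
        victim.running = False
    return victims
```

In this sketch a "Chaos Kong" would simply be chaos_gorilla applied to every zone in a region. The interesting part in a real system is everything the sketch leaves out: whether traffic keeps flowing after the instances disappear.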