Chaos Engineering is grounded in empiricism, experimentation over testing, and verification over validation. But not all experimentation is equally valuable. The principles of Chaos Engineering extend to a “gold standard” captured in a set of advanced principles.
The Birth of Chaos
The beginning of Chaos Engineering goes back to 2008, when Netflix moved from the datacenter to the cloud. The move didn’t go as planned. The thinking at the time was that the datacenter locked them into an architecture of single points of failure, like large databases and vertically scaled components. Moving […]
I was hired at Netflix to lead the Traffic Team in early 2015. A few weeks later I was also asked to charter a Chaos Engineering Team. At the time, Chaos Engineering was essentially a program called Chaos Monkey with a few supporting blog posts. I wanted to get a feel for what our engineers […]
Many people use the words Verification and Validation interchangeably, which makes it harder to focus on system-level behaviors that correspond to business value. We prefer to use definitions inspired by the field of Operations Research Management. Verification is finding congruence between what you expect from a system and the actual output. Validation is finding congruence between an […]
One of my favorite topics of discussion within the domain of Availability is mythology. Not dragons and unicorns, which would be undeniably cool, but myths in the sense of made-up stories we tell ourselves to explain things that we don’t understand. There are many things that we as an industry tell ourselves about the nature […]
Root Cause Analysis (RCA), a common practice throughout the software industry, does not provide any value in preventing future incidents in complex software systems. Instead, it reinforces hierarchical structures that confer blame, and this inhibits learning, creativity, and psychological safety. In short, RCA is an inhumane practice. Fortunately, there are healthy alternatives to RCA. I’ll […]
Three years from now, if you are pushing code into a serious production environment, it will go through a Continuous Verification (CV) pipeline. The software industry’s transition into complex systems is accelerating. The humans designing, building, and operating these complex systems are no longer capable of understanding how all of the pieces fit together. It’s […]