Safety and Security at Speed

By Tiffany Knudtson | September 24, 2019

12 minute read

At the 2019 OWASP Global AppSec DC event, the largest application security conference on the planet, I presented the MEASURE framework as a talk. One goal of the talk was to bring together the worlds of Safety and Security. I believe that Safety fits within the MEASURE framework for DevSecOps and is part of adopting DevSecOps. The MEASURE framework guides teams in choosing where to put resources on their DevSecOps journey to modernize security and deliver value.

In other words, MEASURE is the roadmap for rolling out “The Digital Transformation” across the information security team. This makes security part of the value chain rather than a casual bystander to the company’s objectives and goals. This new direction for security will bring the one thing that every business craves: safety at speed.

A DevSecOps Tale of Business, Engineering, and People from James Wickett

The talk is split into two parts. The first half recounts the Crash at Crush, a famous train crash that happened in 1896 just outside of West, Texas. It was a staged crash, initiated by William Crush as a publicity stunt for the railroad. Over 40,000 people came to witness the “Clash of the Iron Monsters.” It was a well-planned event, with safety precautions taken for the well-being of the crowd and engineers inspecting the boilers for stability.

It would have gone down in the books as a successful event, except that the crash led to cascading failures. Both boilers exploded. Several people died and many were injured.

While researching that incident, I came across three findings that are particularly relevant to resilience engineering and DevSecOps.

Root Cause is a Myth
Breaches or Failures Won’t Stop Business
Experimentation and Learning are Critical

Root Cause is a Myth

Readers of the blog will be familiar with our view that root cause is inhumane and, moreover, that it can’t be determined in complex socio-technical systems. As is often said, root cause is a myth. That was certainly the case with the incident in Crush, Texas. Immediately after the incident, William Crush was fired, but only a few days later he was rehired. The railroad realized that every precaution that could be taken had been taken, and that the organization as a whole was responsible, not just one individual. It is a good example of understanding the social and organizational factors that go into failures.

Instead of resorting to blame and finger-pointing when breaches happen, DevSecOps practitioners should seek shared understanding and follow blameless retrospective procedures to get a wider picture of how the event actually unfolded. We shouldn’t fire the engineer who didn’t apply the patches, nor the CISO who hired the engineer. Instead, we should look at which organizational decisions contributed to the breach.

Breaches or Failures Won’t Stop Business

One of the most curious aspects of the Crash at Crush is that, instead of leading to a ban on train demolitions and the end of the whole enterprise, it actually sparked a period of excitement about holding more events. Even though people died and others were injured in the crash, entrepreneurs saw new revenue opportunities. A crowd of 40,000 is big today, but think of what that meant over a hundred years ago. In the following years, dozens of train demolitions were held across the country and, ironically, none of them experienced boiler explosions.

The DevSecOps takeaway is that we often fool ourselves into believing that a breach or incident will force behavior change. We tell ourselves that once the breach happens, then certainly we will change: we will make new security policies and actually follow them this time, or we will stop blindly ‘accepting the risk’ when making business decisions. This type of thinking is categorically false. We need to make changes now and not wait for a forcing event like a breach to happen.

Experimentation and Learning are Critical

Being able to stress-test the component parts would have helped, but we have evidence that the engineers inspected the boilers as thoroughly as they could from the exterior. As developers, we could say that the unit tests passed but the system failed under real-world stress. As we often say in IT, there is no replacement for production. However, if they had had a spare set of identical trains and run a simulation, they might have discovered the likelihood that the boilers could fail.

In a DevSecOps model, one of the critical things to do is to set up and foster an environment based on experimentation and learning. This helps security engineers defend against the attacks that the organization actually receives rather than relying on a generic top ten list. Experimentation and learning also bring feedback loops of security information to the developers building the applications. These feedback loops form a bridge that brings dev, sec, and ops together to treat software as a team sport. Together they are able to build in safety and security at the speed of development.
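
As a rough illustration of prioritizing the attacks an organization actually receives, here is a minimal Python sketch. It is not from the talk. The event records, the category names, and the function `rank_observed_attacks` are all hypothetical; in practice the events would come from whatever application or network telemetry a team already collects.

```python
from collections import Counter

# Hypothetical sample of attack events seen against an application.
# The field names and categories are assumptions made for this sketch.
OBSERVED_EVENTS = [
    {"path": "/login", "category": "credential_stuffing"},
    {"path": "/login", "category": "credential_stuffing"},
    {"path": "/search", "category": "sql_injection"},
    {"path": "/login", "category": "credential_stuffing"},
    {"path": "/files", "category": "path_traversal"},
    {"path": "/search", "category": "sql_injection"},
]


def rank_observed_attacks(events):
    """Return (category, count) pairs, most frequently observed first."""
    counts = Counter(event["category"] for event in events)
    return counts.most_common()


if __name__ == "__main__":
    # Print the categories in the order the team might review them.
    for category, count in rank_observed_attacks(OBSERVED_EVENTS):
        print(f"{category}: {count}")
```

A ranking like this could be shared with developers on a regular basis, which is one simple way to close the feedback loop described above.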