Top Ideas in This Book
Restore service first. Then worry about deep diving into the problem.
However, there’s a catch. Restoring service often comes at the cost of not understanding the problem.
Your systems and machines are in a bad state. When restoring them to a good state, you often inhibit your ability to debug the issue. But that’s a trade-off you sometimes need to make.
Cascading failures occur when one system experiences an issue, subsequently causing issues in another system, potentially causing issues in yet another system. On and on we go.
In other words, cascading failures exist because of relationships between systems.
Nygard provides a lot of commentary on this topic throughout the book. Some of my favorite thoughts are:
Chaos Engineering is a good way to understand how resilient your systems are against cascading failures. Netflix’s Chaos Monkey famously shuts down random services and servers in production to test how dependent systems respond.
You can load balance, govern requests, shed load, fail fast, and do plenty more to mitigate risk. But fundamentally, you can’t control the volume of requests your system receives or the nature of those requests.
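To make "shed load" and "fail fast" concrete, here is a minimal sketch in Python. It is my own illustration, not code from the book: the `LoadShedder` class, its `try_handle` method, and the `max_in_flight` limit are all hypothetical names. The idea is to cap the number of requests in flight and reject anything beyond that cap immediately instead of letting it queue.

```python
import threading


class LoadShedder:
    """Hypothetical helper: cap in-flight requests and reject the overflow."""

    def __init__(self, max_in_flight: int) -> None:
        # One semaphore slot per request we are willing to run concurrently.
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_handle(self, handler, *args):
        # Fail fast: never block or queue when every slot is taken.
        if not self._slots.acquire(blocking=False):
            return ("rejected", None)
        try:
            return ("ok", handler(*args))
        finally:
            # Free the slot whether the handler succeeded or raised.
            self._slots.release()
```

You still can't control how many requests arrive, but a guard like this lets the system choose which ones it spends resources on. A real service would likely return an HTTP 503 or similar rather than a tuple.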
Temporary fixes often arise in two situations:
I give my engineers the benefit of any doubt. In my experience, engineers usually just forget about that temporary fix. They’re constantly bombarded with new problems consuming their attention.
Your engineers usually just need a reminder. As a manager, when my team is done firefighting or prototyping, I like to ask what needs to be done to make the code production-worthy.
Then comes the hard part. You personally need to value the process of transforming a temporary fix into a production-worthy fix, and support that value in the face of mounting pressure from your product roadmap.
Rather than focusing on how efficient your developers are, focus on how efficiently work moves through your process.
Focusing on process and not people can feel counter-intuitive for managers. Aren’t we managers of people? Yes, but your job is to make work possible, and that often means addressing process failures.
Diagramming your value delivery chain is a good place to start. The DevOps Handbook provides insight on how exactly to map your value delivery chain, but the basic idea is to list every step in your process and how long that step takes – both in execution time and total process time. For instance, code review may only require 5 minutes of execution, but can often take hours or days to complete once you include the time spent waiting for a reviewer.
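The arithmetic behind such a diagram is simple enough to sketch. The Python below is a rough illustration under my own assumptions: the `Step` and `summarize` names are hypothetical, "flow efficiency" is just the label I'm using for execution time divided by total time, and every number is invented apart from the 5-minute code review mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    execution_minutes: float  # hands-on time spent doing the work
    total_minutes: float      # elapsed time, including waiting


def summarize(steps):
    execution = sum(s.execution_minutes for s in steps)
    total = sum(s.total_minutes for s in steps)
    # The step with the most idle time is the first place to look.
    longest_wait = max(steps, key=lambda s: s.total_minutes - s.execution_minutes)
    return {
        "lead_time_minutes": total,
        "execution_minutes": execution,
        "flow_efficiency": execution / total,
        "longest_wait": longest_wait.name,
    }


# Illustrative numbers only; measure your own process.
steps = [
    Step("write code", 120, 180),
    Step("code review", 5, 480),
    Step("deploy", 15, 60),
]
summary = summarize(steps)
```

With these made-up numbers, most of the lead time is spent waiting, and code review is the step that waits longest. That is the kind of signal the diagram is meant to surface.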
Value delivery chain diagrams usually help identify a few limiting factors that slow your release frequency and increase lead times – two of the key engineering performance metrics tracked in Accelerate. Some common areas I’ve seen slowing teams from releasing are:
How strongly do I recommend Release It!?
7 / 10
Based on the title, I expected this book to be about increasing release frequency, a primary interest of mine in building high-performance engineering teams. It was not, but that turned out to be a pleasant surprise.
Release It! is a good read for both engineers and engineering managers, particularly product engineers who also take on systems and DevOps responsibilities. The author smartly ties technical recommendations and details to broader concepts and themes.