How strongly do I recommend Designing Data-Intensive Applications?
7 / 10
I was hoping this book would be a collection of design patterns for engineering data systems. However, Designing Data-Intensive Applications works at a much lower level. For instance, in this book you can learn the technical details and differences of conflict resolution across multiple database systems.
At Amazon, the customers with the slowest page request times are the customers with the most data. In other words, the customers with the most purchases and the highest LTV (lifetime value).
We experience this uneven distribution challenge every day at TrainHeroic. Coaches and athletes with the most training data also experience the slowest API response times. An athlete with 6 years of training data, hundreds of sessions performed, and millions of pounds lifted is going to require more data processing and transfer than a new user.
But the question isn’t about relative speed; it’s about requirements and acceptable speed. High-value users (especially!) need fast, reliable responses. Those are the users you need to test with and benchmark against.
That’s why Amazon looks at the top 0.01% of users and what speeds those high LTV users experience – not the average user.
Of course, decreasing your average response time is a good sign. Recently we performed a PHP upgrade and experienced a 30% improvement in average API response time. But you can’t take averages as gospel.
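Here’s a minimal sketch of why the average hides what your best users feel. The numbers are made up for illustration: a mostly fast system where a handful of data-heavy, high-LTV users get slow responses. The mean barely moves, while the p99.99 tells the real story.

```python
import statistics

# Hypothetical response times (ms): 9,990 fast requests, plus 10 slow
# requests from data-heavy, high-LTV users.
response_times = [120] * 9990 + [4500] * 10

avg = statistics.mean(response_times)
# p99.99: the latency that 99.99% of requests beat.
p9999 = sorted(response_times)[int(len(response_times) * 0.9999) - 1]

print(f"average: {avg:.2f} ms")  # 124.38 ms -- looks perfectly healthy
print(f"p99.99:  {p9999} ms")    # 4500 ms -- what your top users actually experience
```

An average of ~124 ms would pass most dashboards, which is exactly why Amazon benchmarks against the tail instead.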
In the early days of the NoSQL movement, much hype centered around the idea that NoSQL databases were the secret scaling sauce.
Turns out there is no magic. No one size fits all. No recipe. No panacea.
In most cases what helps you scale is highly specific to the application you’re building.
Scaling is one area where senior and highly technical engineers help, because senior engineers excel at pattern matching and applying concepts from one problem area to another.
This book is particularly good for senior engineers looking to extend their pattern matching skills in data storage and processing.
Code is easy to refactor or rewrite. Changing code is computationally inexpensive. Not true with data.
Data outliving code is a good problem to have, usually reserved for successful companies.
In development and testing, use the old stuff. Legacy data. Data that precedes your current conceptual model. Data with weird cases, different encodings, and unexpectedly null values.
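As a sketch of what that legacy data looks like in practice, here's a hypothetical test fixture (the field names and records are invented for illustration) with the kinds of values that break code written against a clean conceptual model: non-ASCII text, stray whitespace characters, and unexpectedly null fields.

```python
# A hypothetical fixture of "legacy" user records: data that predates
# the current model, with odd encodings and unexpectedly null values.
legacy_users = [
    {"id": 1, "name": "José Söder", "sessions": 3120, "signup_year": 2014},    # non-ASCII name
    {"id": 2, "name": None, "sessions": None, "signup_year": None},            # nulls where code assumes values
    {"id": 3, "name": "smith, j.\u00a0", "sessions": 1, "signup_year": 2009},  # trailing non-breaking space
]

def total_sessions(users):
    # Defensive aggregation: treat missing counts as zero instead of crashing
    # on None, which a sum over clean test data would never reveal.
    return sum(u.get("sessions") or 0 for u in users)

print(total_sessions(legacy_users))  # 3121
```

A test suite seeded only with freshly generated records would never exercise the `None` branch; the old data does it for free.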