In 2025, around the time I transitioned from L5 SDE at Amazon to L6 (Senior SDE), I was looking online for resources to help me level up as a software engineer. I was hoping to find a website where someone had read all the major books and could make recommendations based on experience, so that I wouldn't have to navigate the sea of SEO-driven websites that make up the Google search results. I never found such a page, but after looking at enough different sites and videos I did piece together an idea of which books were recommended most often. This page is my attempt to create the page my younger self was looking for. I plan to keep it updated as I read more. Maybe you'll find it useful.
My target audience for this page is an application developer who has two goals:
Prepare for system design interviews
Get better at designing software systems (for their job, personal projects, or otherwise)
I am not trying to give a holistic, unbiased, general review of any books on this page. Rather I am trying to give a biased, focused review from the perspective of a practitioner who's trying to get value out of these resources towards the two goals above.
I've looked at a lot of sources for learning system design now, and DDIA (Designing Data-Intensive Applications) almost universally comes up as a recommended resource.
It has some chapters (particularly the early ones) that are high-level enough to be suitable for learning system design. Below is my summary of the most valuable chapters and content in those chapters. Beside each chapter is a star rating for how useful it is.
★★★ - Tons of great information. Must read
★★ - Very useful, read these next
★ - One or two good things, but getting into optional territory
No stars - Situationally useful. Come back to this when you know you need it.
Here they are:
★★★ Chapter 1: Trade-Offs in Data Systems Architecture
Sets the scene by describing the different kinds of systems that we're designing
Difference between OLTP systems (operational) and OLAP systems (analytical)
Difference between a system of record and a derived data system
Data warehouse
Data lake
Self-hosted vs cloud services
Distributed vs single machine
Separation of storage and compute
Microservices and Serverless
ETL
★ Chapter 2: Defining Non-Functional Requirements
This one is good but a bit abstract. Its best feature is that it gives you the right language for describing the things we care about in systems: reliability, scalability, fault-tolerance, and maintainability.
Shared-nothing architecture
★★★ Chapter 3: Data Models and Query Languages. Tons of great stuff:
Relational vs Document Models
What each is good for and when to use it
Normalization and Denormalization
Relationship Cardinality
Star and Snowflake Schemas for Analytics
Fact vs dimension tables
Graph query languages and data representation
Goes into a bit too much depth on graph query languages; you can skim that part. The main point is that certain queries are much more natural to express in a graph query language than in SQL.
Event sourcing and CQRS
★★ Chapter 4: Storage and Retrieval
A very good look under the hood at how different databases store data. You're unlikely to implement any of this unless you work on a database, but it's still valuable because it explains why some databases are better suited to certain read and write patterns than others, and that's vital for system design. You start with your read and write patterns and work backwards from them to pick a DB. If you don't know which access patterns different DBs are built to support, you'll have a hard time justifying a decision to use one over another. Very important for understanding efficiency.
LSM-Trees vs B-Trees
Indexes
Column-Oriented Storage
Full-text Search
Vector Databases
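To make the point about read and write patterns concrete, here is a minimal Python sketch of the same data laid out by row and by column. The orders table, its field names, and the `to_columns` helper are all made up for illustration, and real column stores add compression and on-disk layouts that this ignores.

```python
# Hypothetical orders table, stored row by row (typical for OLTP).
rows = [
    {"order_id": 1, "customer": "ann", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 45},
    {"order_id": 3, "customer": "ann", "amount": 25},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Point lookup: the row layout hands back the whole record in one access.
order_2 = rows[1]

# Aggregate: the column layout only has to scan the "amount" values.
total_amount = sum(columns["amount"])
```

The row layout suits "fetch this one order" and the column layout suits "sum one field over every order", which is roughly the OLTP versus OLAP split from Chapter 1.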
★ Chapter 5: Encoding and Evolution
This one is useful for real-world system design but I don't see it talked about in system design interviews as much.
Pros and cons of different data encodings: JSON, XML, Protocol Buffers, and Avro
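As a rough illustration of the text-versus-binary trade-off, the sketch below encodes one record as JSON and as a fixed binary layout. Note that it uses Python's `struct` module as a stand-in for a schema-based binary format; it is not Protocol Buffers or Avro, and the record and its field layout are assumptions made up for this example.

```python
import json
import struct

# Hypothetical record used only for this comparison.
record = {"user_id": 1337, "score": 98.5, "active": True}

# JSON is self-describing: field names travel with every record.
as_json = json.dumps(record).encode("utf-8")

# A binary layout leaves the field names out, so reader and writer must
# agree on a schema: little-endian uint32, float64, bool.
LAYOUT = "<Id?"
as_binary = struct.pack(LAYOUT, record["user_id"], record["score"], record["active"])

# Decoding the binary form only works if you already know the layout.
user_id, score, active = struct.unpack(LAYOUT, as_binary)
```

The binary form is smaller, but it is unreadable without the schema, and that is where the schema evolution questions in this chapter come from.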
Chapter 6: Replication
Replication is your main tool to achieve fault-tolerance.
I have read this chapter but I don't feel like I've understood it well enough to comment on it yet.
★ Chapter 7: Sharding
When your dataset no longer fits on a single machine, you need sharding: splitting your dataset across multiple nodes
I gave this only one star because, in my experience, a system design interview basically always requires scaling beyond a single machine, so you never truly get to decide whether to shard; the scale of the dataset in the problem decides for you. It's still valuable to know that sharding is necessary and how it works, but it's not really a trade-off: you either need it or you don't.
In terms of practical use, you need to know how to design your key for good distribution among shards, but you don't need to learn how to implement sharding unless you're building a new database service.
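Here is a small sketch of why key choice matters for distribution. It assumes simple hash-based sharding with a fixed shard count; the `shard_for` and `shard_counts` helpers and the example keys are hypothetical, and real databases use their own schemes.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_counts(keys, num_shards):
    """Count how many keys land on each shard."""
    counts = Counter(shard_for(key, num_shards) for key in keys)
    return [counts.get(shard, 0) for shard in range(num_shards)]


NUM_SHARDS = 8

# High-cardinality key: one distinct value per user.
by_user = shard_counts([f"user-{i}" for i in range(10_000)], NUM_SHARDS)

# Low-cardinality key: every record shares the same country code.
by_country = shard_counts(["US"] * 10_000, NUM_SHARDS)
```

With the user ID as the key, the records spread roughly evenly across the shards. With the country code as the key, all 10,000 records land on a single shard, which is the hot spot you want to avoid. One caveat with plain modulo: changing the number of shards moves most keys, so treat this as a way to reason about key choice rather than a rebalancing strategy.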
Chapter 8: Transactions
I haven't read this chapter yet, only the summary.
★ Chapter 9: The Trouble with Distributed Systems
It boils down to three main things:
Networks are unreliable
Clocks are not perfectly synchronized across machines
Processes may pause for unspecified amounts of time
As far as practicality for interviewing goes, it is useful to explain how you're going to deal with unreliable networks in your design, since that's an eternal problem. Clock skew is also worth discussing: your data models might want to make use of logical clocks rather than wall clock timestamps.
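For a feel of what a logical clock looks like, here is a minimal sketch of a Lamport clock, which is one kind of logical clock. The class and method names are my own, not taken from the book or any library.

```python
class LamportClock:
    """Counter that orders events without relying on wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, remote_time):
        """Merge the sender's timestamp, then advance past it."""
        self.time = max(self.time, remote_time) + 1
        return self.time


# Node A sends a message to node B.
a = LamportClock()
b = LamportClock()
sent_at = a.send()
received_at = b.receive(sent_at)
```

The receive always gets a larger timestamp than the send that caused it, no matter how far apart the two machines' wall clocks are.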
★★ Chapter 10: Consistency and Consensus
Strong Consistency (Linearizability) vs Eventual Consistency
Trade-off:
Eventual consistency
+ Higher availability (serve stale reads)
+ Performance (reduced latency)
− Stale reads
− More work at application level to deal with conflicting writes
Strong consistency
+ No stale reads
+ No write conflicts
− Performance (higher latency due to network delays for replication)
− Lower availability (return errors instead of stale reads)
Rule of thumb: if you are latency-sensitive, require high availability, and can tolerate stale reads, use an eventually consistent database. If you are not latency-sensitive and would rather receive an error than a stale read, then use strong consistency.
It's important to know the consistency model your database supports when selecting one.
Logical clocks
Linearizable ID generators
Consensus can be achieved in many different ways. Shared logs are most common in practice.
Chapter 11: Batch Processing
Talks about how the problem of processing huge datasets across multiple machines was solved. Practically speaking, though, you just need to know to use Spark or something similar, and it does all of the fancy fault-tolerance and distributed processing for you. Again, this is not really a decision you make so much as a reality you have to face if you're processing datasets too large to fit on a single machine.
Chapter 12: Stream Processing
There's some good stuff in here about log-based message brokers
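The core idea of a log-based broker can be sketched in a few lines: messages are appended to a log and never removed on read, and each consumer tracks its own offset. This is a toy in-memory version with names I made up (`MessageLog`, `poll`, `commit`); it is not the API of Kafka or any other real broker.

```python
class MessageLog:
    """Toy in-memory, append-only log with per-consumer offsets."""

    def __init__(self):
        self._entries = []
        self._committed = {}  # consumer name -> offset of next unread message

    def append(self, message):
        """Add a message to the end of the log and return its offset."""
        self._entries.append(message)
        return len(self._entries) - 1

    def poll(self, consumer, max_messages=10):
        """Return unread messages for a consumer without deleting them."""
        start = self._committed.get(consumer, 0)
        return self._entries[start:start + max_messages]

    def commit(self, consumer, next_offset):
        """Record where this consumer should resume reading."""
        self._committed[consumer] = next_offset


log = MessageLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Two independent consumers read the same messages at their own pace.
billing_batch = log.poll("billing")
log.commit("billing", len(billing_batch))
analytics_batch = log.poll("analytics", max_messages=2)
```

Because reading does not delete anything, a consumer can replay history by committing an earlier offset, which a traditional queue that removes messages on delivery does not allow.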
Chapter 13: A Philosophy of Streaming Systems
Chapter 14: Doing the Right Thing
It's high-level and short. Worth reading but you won't get anything too concrete out of this. I was hoping there would be more specific references to laws like GDPR.
As a whole, DDIA covers every aspect of handling data, which is a large and critically important part of system design, but it doesn't cover everything to do with system design. For example, it does not cover API design.
Incredible source for references
Some chapters are more low-level than most application developers will ever need