Recommended Software Engineering Resources

My target audience for this page is a software engineer or manager who has two goals:

Prepare for system design interviews
Get better at designing software systems (for their job, personal projects, or otherwise)

I am not trying to give a holistic, general review of the books on this page. Rather, I am trying to give a focused review from the perspective of a practitioner who wants to get value out of these resources toward the two goals above.

Books
- Designing Data-Intensive Applications (2nd edition) by Martin Kleppmann
- System Design Interview (vol. 1) by Alex Xu
Websites
- System Design Primer
- roadmap.sh
Tools
- Anki

Books

Designing Data-Intensive Applications (2nd ed.)

I've looked at a lot of sources for learning system design now, and Designing Data-Intensive Applications (DDIA) comes up almost universally as a recommended resource.

It has some chapters (particularly the early ones) that are high-level enough to be suitable for learning system design. Below is my summary of the most valuable chapters and content in those chapters. Beside each chapter is a star rating for how useful it is.

★★★ - Tons of great information. Must read
★★ - Very useful, read these next
★ - One or two good things, but getting into optional territory
No stars - Situationally useful. Come back to this when you know you need it.

Here they are:

★★★ Chapter 1: Trade-Offs in Data Systems Architecture

Sets the scene by describing the different kinds of systems we're designing.

Difference between OLTP systems (operational) and OLAP systems (analytical)
Difference between a system of record and a derived data system
Data warehouse
Data lake

Self-hosted vs cloud services
Distributed vs single machine
Separation of storage and compute
Microservices and Serverless
ETL

★ Chapter 2: Defining Non-Functional Requirements

This one is good but a bit abstract. Its best feature is that it gives you the right language for describing the things we care about in systems: reliability, scalability, fault tolerance, and maintainability.
Shared-nothing architecture

★★★ Chapter 3: Data Models and Query Languages

Tons of great stuff:

Relational vs Document Models

What each one is good for and when to use it

Normalization and Denormalization
Relationship Cardinality
Star and Snowflake Schemas for Analytics

Fact vs dimension tables

Graph query languages and data representation

It goes into a bit too much depth on graph query languages, so you can skim that part. The main point is that certain queries are much more natural to express in a graph query language than in SQL.

Event sourcing and CQRS

★★ Chapter 4: Storage and Retrieval

A very good look under the hood at the high-level approaches different databases take to storing data. You likely won't implement any of this unless you work on a database, but it's valuable to understand nonetheless: it explains why some databases are better suited to certain read-write patterns than others, and that is vitally important information when doing system design. You start with your read and write patterns and work backwards from them to pick a database. If you don't know which access patterns different databases are built to support, you'll have a hard time justifying a decision to use one over another. It's also very important for understanding efficiency.
LSM-Trees vs B-Trees
Indexes
Column-Oriented Storage
Full-text Search
Vector Databases
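To make the LSM-tree side of that comparison concrete, here is a minimal in-memory sketch, assuming a hypothetical `TinyLSM` class with a made-up `memtable_limit` flush threshold. It deliberately leaves out the write-ahead log, on-disk files, compaction, and Bloom filters that real LSM engines have. What it shows is why writes are cheap (they only touch an in-memory table) while reads may have to check several sorted runs, in contrast to a B-tree, which updates pages in place.

```python
import bisect


class TinyLSM:
    """Toy LSM-style key-value store (in memory only; no WAL, no compaction)."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        # Real memtables are sorted trees; a dict sorted at flush time is a shortcut.
        self.memtable = {}
        self.segments = []  # flushed immutable sorted runs, oldest first

    def put(self, key, value):
        # A write only touches the in-memory table, which is why LSM writes are cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as an immutable run sorted by key (SSTable-like).
        items = sorted(self.memtable.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.segments.append((keys, values))
        self.memtable = {}

    def get(self, key):
        # Check the newest data first: memtable, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.segments):
            i = bisect.bisect_left(keys, key)  # binary search within a sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None  # key not found anywhere
```

For example, with `memtable_limit=2`, writing `a` and `b` flushes one sorted segment; a later write to `a` lands in the memtable and shadows the older value on reads.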

★ Chapter 5: Encoding and Evolution

This one is useful for real-world system design, but I don't see it come up in system design interviews as much.
Pros and cons of different data encodings: JSON, XML, Protocol Buffers, and Avro

Chapter 6: Replication

Replication is your main tool to achieve fault-tolerance.
I have read this chapter but I don't feel like I've understood it well enough to comment on it yet.

★ Chapter 7: Sharding

When your dataset no longer fits on a single machine, you need sharding: splitting your dataset across multiple nodes.
I gave this only one star because, in my experience, a system design interview almost always requires you to design for scale beyond a single machine. So you never truly need to make the decision to support sharding; the scale of the dataset in the problem makes it for you. It's still valuable to know why sharding is necessary and how it works, but it's not really a trade-off: you either need it or you don't.
In terms of practical use, you need to know how to design your shard key so that data is distributed evenly across shards, but you don't need to learn how to implement sharding unless you're building a new database service.
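As an illustration of key design for even distribution, here is a minimal sketch assuming hash-based sharding and a hypothetical `shard_for` helper. Hashing the key spreads even sequential IDs like `user-1`, `user-2`, ... roughly evenly across shards, which avoids the hotspot you would get from assigning contiguous key ranges to shards.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number by hashing it (hypothetical helper)."""
    # Use a stable hash: Python's built-in hash() of str is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards):
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


if __name__ == "__main__":
    user_ids = [f"user-{i}" for i in range(10_000)]
    print(distribution(user_ids, 4))  # roughly 2,500 keys per shard
```

The trade-off: hashing destroys key order, so range scans have to query every shard, and plain `hash mod N` moves most keys when N changes, which is why real systems tend to use a fixed number of partitions or consistent hashing instead.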

Chapter 8: Transactions

I haven't read this chapter yet, only the summary.

★ Chapter 9: The Trouble with Distributed Systems

It boils down to three main things:

Networks are unreliable
Clocks are not perfectly synchronized across machines
Processes may pause for unspecified amounts of time

For interviews, it is useful to explain how your design will deal with unreliable networks, since that problem never goes away. Clock skew is also worth discussing: your data model may need logical clocks rather than wall-clock timestamps.
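A Lamport clock is the simplest kind of logical clock, and a minimal sketch of one is below (the `LamportClock` class and its method names are illustrative, not from the book). Each node keeps a counter, increments it on every local event, and on receiving a message jumps past the sender's timestamp, so an event that causally follows another always gets a larger timestamp, regardless of what the machines' wall clocks say.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Lamport logical clock: orders events without trusting wall-clock time."""
    time: int = 0

    def tick(self) -> int:
        # Local event (e.g. a write on this node): advance the counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event too; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Move past everything the sender had seen, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
t = a.send()        # a's clock: 1
b.receive(t)        # b's clock: max(0, 1) + 1 = 2
b.tick()            # b's clock: 3
```

Ties between nodes are usually broken by node ID to get a total order. Note that a Lamport clock cannot tell you whether two events were concurrent; vector clocks are the usual tool for that.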

★★ Chapter 10: Consistency and Consensus

Strong Consistency (Linearizability) vs Eventual Consistency

Trade-off:

Eventual consistency

+ Higher availability (serve stale reads)
+ Performance (reduced latency)
− Stale reads
− More work at application level to deal with conflicting writes

Strong consistency

+ No stale reads
+ No write conflicts
− Performance (higher latency due to network delays for replication)
− Lower availability (return errors instead of stale reads)

Rule of thumb: if you are latency-sensitive, require high availability, and can tolerate stale reads, use an eventually consistent database. If you are not latency-sensitive and would rather receive an error than a stale read, use strong consistency.
It's important to know the consistency model your database supports when selecting one.

Logical clocks
Linearizable ID generators
Consensus can be achieved in many different ways. Shared logs are most common in practice.

Chapter 11: Batch Processing

Talks about how the problem of processing huge datasets across multiple machines was solved. Practically speaking, though, you just need to know to use Spark or something similar, which handles all of the fancy fault tolerance and distributed processing for you. Again, this is not really a decision you make so much as a reality you have to face if you're processing datasets too large to fit on a single machine.

Chapter 12: Stream Processing

There's some good stuff in here about log-based message brokers.

Chapter 13: A Philosophy of Streaming Systems
Chapter 14: Doing the Right Thing

It's high-level and short. It's worth reading, but you won't get anything very concrete out of it. I was hoping for more specific references to laws like GDPR.

Overall, the book covers every aspect of handling data, which is a large and critically important part of system design, but it doesn't cover everything to do with system design. For example, it does not cover API design.

Incredible source for references
Some chapters are more low-level than most application developers will ever need

System Design Interview (vol. 1)

A useful collection of examples, but the downside is that each one is quite focused on its particular domain. You need to balance it with learning lower-level principles that are more widely applicable.
Lots of diagrams
References to further reading for each example

Websites

System Design Primer

System Design Primer is a great resource. It's very wide in scope and offers links for depth. My favourite parts are the table of real-world architectures and the table of engineering blogs.

roadmap.sh

roadmap.sh is a website that gives you a high-level visual "roadmap" for learning a particular area, such as backend development. I find a high-level view really helpful because I can quickly scan through it and see what I know and what I don't. You can click on the items you want to learn about to read a short explanation, with a few links to deeper dives if you need them.

Tools

Anki

Anki is flash card software. You can download clients for desktop and mobile, or just use it in a browser at ankiweb.net. Anki is how I made sure I could actually recall what I was learning days, weeks, or months later. Use the other resources to learn, and use Anki to make it stick.

I recommend creating your own flash cards as you study the other resources. The act of writing your own cards helps you learn the material. A few tips for creating cards that you'll actually stick with:

Keep cards small. If an answer has more than two pieces of information, you're going to hate sitting there trying to think of the sixth item in a bullet list, and you're going to give up on studying because it's too painful. Split cards until they're small enough. You should be able to answer or fail a card in a few seconds. That keeps you moving, and getting a bunch of cards right is more motivating than sitting for a long time thinking and getting one card right.
Create 2-3 cards for each idea or fact. This helps you really solidify your understanding by considering it from a few different angles. For example, if you just learned about idempotency keys and you're creating cards for them, you could create these three instead of just one:

Q: How do you prevent an operation from being applied twice on a remote system in case of a network fault? A: Include an idempotency key in the request. The service stores it in durable storage and checks if it's already been processed.
Q: What is an idempotency key? A: A piece of data attached to a request that the server can use to prevent an operation from executing twice.
Q: What is a good use case for idempotency keys? A: Prevent double charges in a payments system.
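For reference, the idempotency-key idea on these cards can be sketched in a few lines. This is a toy, assuming a hypothetical `PaymentService` with a `charge` method; a dict stands in for the durable storage a real service would use, and a lock makes the check-then-store step atomic within one process only.

```python
import threading
import uuid


class PaymentService:
    """Toy payment service that deduplicates retries using idempotency keys.

    A dict stands in for durable storage; a real service would persist the key
    and the result in the same transaction as the charge itself.
    """

    def __init__(self):
        self._results = {}             # idempotency_key -> result of first attempt
        self._lock = threading.Lock()  # makes check-then-store atomic in this process
        self.charges = []              # every charge actually applied

    def charge(self, idempotency_key: str, account: str, amount_cents: int) -> str:
        with self._lock:
            if idempotency_key in self._results:
                # Retry of a request we already processed: return the original result.
                return self._results[idempotency_key]
            charge_id = str(uuid.uuid4())
            self.charges.append((account, amount_cents))
            self._results[idempotency_key] = charge_id
            return charge_id


svc = PaymentService()
key = str(uuid.uuid4())              # client generates one key per logical operation
first = svc.charge(key, "alice", 500)
retry = svc.charge(key, "alice", 500)  # e.g. the first response was lost in the network
```

The client reuses the same key on every retry of the same operation, so `retry` equals `first` and the account is charged only once.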

Date	Revision
2026-06-13	Remove motivation paragraph. Add Anki section. Add roadmap.sh section. Add revisions table. Added bullets for System Design Interview vol 1
2026-05-02	Initial revision