Case Studies
Each of the systems below has appeared in dozens of interviews. The point isn't to memorize a "right answer" — it's to see how the building blocks compose into a real design, and to internalize the tradeoffs.
Design a URL shortener (bit.ly)
Core requirements: short URL ↔ long URL mapping, redirect, analytics.
Sketch:
- Hash the long URL or use a base-62 encoded counter for the short code.
- Store mappings in a key-value store (DynamoDB, Cassandra) — read-heavy, eventually consistent is fine.
- Cache hot URLs in Redis.
- Async write to a separate analytics pipeline (Kafka → batch processor) for click tracking.
Tradeoffs to mention: collision handling for hashes, whether codes are reusable, how to expire URLs.
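The counter-based scheme above can be sketched in a few lines. This is a minimal illustration, assuming the standard digits-lowercase-uppercase base-62 alphabet; real systems also need a distributed counter (e.g. ranges handed out per server) rather than a local integer.

```python
# Base-62 short-code generation from a monotonically increasing counter.
# Alphabet order is an assumption; any fixed 62-character alphabet works.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base-62 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(code: str) -> int:
    """Invert encode_base62 — useful for looking up the counter value."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

A counter avoids hash collisions entirely, at the cost of making codes guessable and requiring coordination to allocate counter ranges.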
Design a news feed (Twitter/Facebook)
Core requirements: post a tweet, follow users, see your feed in (near) real-time.
Sketch:
- Fanout-on-write: when user A tweets, push to every follower's feed. Fast reads, expensive writes — bad for celebrities.
- Fanout-on-read: when user A loads their feed, query each followee. Slow reads, cheap writes.
- Hybrid: fanout-on-write for normal users, fanout-on-read for celebrities. Most real systems do this.
Building blocks: Cassandra for tweets, Redis for feeds, Kafka for fanout, ML service for ranking.
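The hybrid fanout can be sketched with in-memory stand-ins for the real stores (Redis feeds, Cassandra tweets). The class names and the celebrity threshold are assumptions for illustration; production systems would also cap feed length and rank the merged result.

```python
from collections import defaultdict, deque

class Feed:
    """Hybrid fanout: push on write for normal users, pull on read for
    celebrities (users with followers >= celebrity_threshold)."""

    def __init__(self, celebrity_threshold=10_000):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)   # user -> who follows them
        self.following = defaultdict(set)   # user -> who they follow
        self.tweets = defaultdict(list)     # user -> their own tweets
        self.inbox = defaultdict(deque)     # precomputed feed per user

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def post(self, user, tweet):
        self.tweets[user].append(tweet)
        # Fanout-on-write only for non-celebrities.
        if len(self.followers[user]) < self.threshold:
            for f in self.followers[user]:
                self.inbox[f].appendleft(tweet)

    def feed(self, user):
        merged = list(self.inbox[user])
        # Fanout-on-read: pull celebrity followees at read time.
        for followee in self.following[user]:
            if len(self.followers[followee]) >= self.threshold:
                merged.extend(self.tweets[followee])
        return merged
```

The threshold is the key tuning knob: it trades write amplification on post against read amplification on feed load.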
Design a chat app (WhatsApp/Messenger)
Core requirements: send messages, deliver in real-time, work offline, support group chats.
Sketch:
- WebSocket or long-poll for real-time delivery; fall back to push notifications when offline.
- Maintain a per-user message queue; on send, fan out a copy to every group member's queue.
- Store messages in a wide-column store (Cassandra) keyed by (conversation_id, timestamp).
- For end-to-end encryption: clients hold keys, server only routes opaque payloads.
Tradeoffs: read receipts vs privacy, message ordering in groups (vector clocks), media attachments via object storage + CDN.
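The per-user queue plus group fanout can be sketched as follows. This is a toy in-memory model — the class and field names are assumptions, and the real system would persist queues (Cassandra) and use APNs/FCM for the offline push path.

```python
from collections import defaultdict, deque

class ChatServer:
    """On send, fan out to each group member's per-user queue;
    members who are offline also get a push notification."""

    def __init__(self):
        self.groups = defaultdict(set)    # group_id -> member user ids
        self.queues = defaultdict(deque)  # per-user message queue
        self.online = set()               # users with a live connection
        self.pushed = []                  # stand-in for push notifications

    def send(self, group_id, sender, text):
        msg = (group_id, sender, text)
        for member in self.groups[group_id]:
            if member == sender:
                continue
            self.queues[member].append(msg)
            if member not in self.online:
                self.pushed.append((member, msg))  # offline fallback

    def drain(self, user):
        """Deliver and clear a user's queued messages (e.g. on reconnect)."""
        msgs = list(self.queues[user])
        self.queues[user].clear()
        return msgs
```

Note that with end-to-end encryption, `text` would be an opaque ciphertext blob; the server's job is unchanged.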
Design a video streaming service (YouTube/Netflix)
Core requirements: upload, transcode, stream at multiple resolutions.
Sketch:
- Upload to object storage (S3).
- Async transcoding pipeline (Kafka → workers) outputs HLS/DASH segments at multiple bitrates.
- Serve via CDN with adaptive bitrate streaming.
- Recommendations and search via separate services backed by Elasticsearch and an ML pipeline.
Tradeoffs: storage cost vs catalog size, hot vs cold content tiers, regional CDN strategy.
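The client side of adaptive bitrate streaming reduces to a small selection rule. A minimal sketch, assuming a hypothetical bitrate ladder and a safety margin; real players (HLS/DASH) also smooth the throughput estimate and account for buffer level.

```python
# Assumed rendition ladder: (vertical resolution, bitrate in kbps).
LADDER = [(240, 400), (480, 1_000), (720, 2_500), (1080, 5_000)]

def pick_rendition(measured_kbps, safety=0.8):
    """Pick the highest rendition whose bitrate fits within a safety
    fraction of the measured throughput; fall back to the lowest."""
    budget = measured_kbps * safety
    best = LADDER[0]
    for res, kbps in LADDER:
        if kbps <= budget:
            best = (res, kbps)
    return best
```

The safety factor exists because throughput estimates are noisy; streaming one notch below the measured bandwidth avoids rebuffering at the cost of some quality.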
Design a ride-sharing service (Uber)
Core requirements: match riders to nearby drivers, real-time ETA, payments.
Sketch:
- Geospatial index (Geohash, Quadtree, or Redis GEO) for nearby driver lookup.
- WebSocket connections for both riders and drivers, pushing location updates.
- Dispatch service runs the matching algorithm, usually the Hungarian algorithm or a greedy heuristic with constraints.
- Payments via a separate service with idempotent operations and a write-ahead log.
Tradeoffs: matching latency vs match quality, handling driver churn during a ride, surge pricing fairness.
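The nearby-driver lookup can be sketched with a simple grid index — a simplified stand-in for geohash/quadtree, with assumed cell size and class names. Bucketing drivers by cell turns "nearest drivers" into a scan of the query cell and its eight neighbors.

```python
import math
from collections import defaultdict

CELL = 0.01  # grid cell size in degrees; plays the role of geohash precision

class DriverIndex:
    """Grid-bucketed driver positions for approximate nearest lookup."""

    def __init__(self):
        self.cells = defaultdict(set)  # cell -> driver ids
        self.pos = {}                  # driver id -> (lat, lon)

    def _cell(self, lat, lon):
        return (int(lat // CELL), int(lon // CELL))

    def update(self, driver, lat, lon):
        if driver in self.pos:  # remove from the old cell first
            self.cells[self._cell(*self.pos[driver])].discard(driver)
        self.pos[driver] = (lat, lon)
        self.cells[self._cell(lat, lon)].add(driver)

    def nearby(self, lat, lon, k=3):
        cx, cy = self._cell(lat, lon)
        candidates = set()
        for dx in (-1, 0, 1):          # query cell + 8 neighbors
            for dy in (-1, 0, 1):
                candidates |= self.cells[(cx + dx, cy + dy)]
        return sorted(candidates,
                      key=lambda d: math.dist((lat, lon), self.pos[d]))[:k]
```

Euclidean distance on raw lat/lon is only acceptable at city scale; a real dispatch service would use haversine distance and, ideally, road-network travel time.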
Design a rate limiter
Core requirements: limit each user to N requests per minute, work across multiple servers, low latency.
Sketch:
- Token bucket is the most common algorithm — each user has a bucket of N tokens, refilled at a fixed rate. Each request consumes one.
- For multi-server, store buckets in Redis with atomic decrement (DECR) and TTL.
- Fail open vs fail closed during a Redis outage is a tradeoff worth raising explicitly.
Tradeoffs to mention: precision (fixed window vs sliding window vs token bucket), distributed clock skew, what happens to in-flight requests during a config change.
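A single-process token bucket is small enough to write out in full. This sketch takes an injectable clock for testability; the distributed version moves the same refill-then-consume logic into an atomic Redis operation (e.g. a Lua script).

```python
import time

class TokenBucket:
    """Token bucket: capacity tokens, refilled continuously at
    refill_per_sec; each allowed request consumes one token."""

    def __init__(self, capacity, refill_per_sec, start=None):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic() if start is None else start

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because refill is computed from elapsed time rather than a timer, the bucket needs no background thread — a property worth pointing out in an interview.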
Design a notification system
Core requirements: send push, email, and SMS to millions of users; support batching and templating; honor user preferences.
Sketch:
- Producers publish notification events to Kafka.
- A fanout service expands an event into per-user, per-channel deliveries.
- Each channel has its own worker pool that calls the underlying provider (APNs, FCM, SES, Twilio).
- A user-preference service is consulted before fanout; failed deliveries go to a retry queue with exponential backoff.
Tradeoffs to mention: at-least-once vs exactly-once delivery, throttling per provider, deduplication of identical notifications, handling provider outages gracefully.
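The fanout and retry pieces can be sketched as two small functions. Function names, channel names, and the backoff schedule are assumptions; delays are recorded rather than slept so the logic stays testable, and a real worker would sleep (or re-enqueue with a delay) between attempts.

```python
def fan_out(event, users, prefs):
    """Expand one event into the (user, channel) deliveries that the
    user-preference map allows."""
    return [(user, ch)
            for user in users
            for ch in ("push", "email", "sms")
            if ch in prefs.get(user, set())]

def deliver_with_retry(send, delivery, max_attempts=4, base_delay=1.0):
    """Call `send` until it succeeds, recording exponential backoff
    delays (1s, 2s, 4s, ...) for each failed attempt."""
    delays = []
    for attempt in range(max_attempts):
        if send(delivery):
            return True, delays
        delays.append(base_delay * 2 ** attempt)
    return False, delays  # exhausted -> dead-letter queue in practice
```

Note the at-least-once implication: a retry after a timed-out (but actually successful) provider call duplicates the notification, which is why deduplication appears in the tradeoffs list.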
How to study case studies
Don't read passively. Pick one, set a 45-minute timer, and design it from scratch on a whiteboard. Then read the canonical solution and diff it against yours. The diff is where the learning happens.