
Designing a URL Shortener on AWS: From Zero to Production

A complete walkthrough of designing a production-ready URL shortener on AWS — covering hashing strategies, database selection, caching, and scaling to billions of redirects.

March 10, 2024 · 12 min read
AWS · System Design · DynamoDB · ElastiCache · API Gateway

Building a URL shortener is one of the most popular system design interview questions — and for good reason. It touches almost every fundamental: hashing, database choice, caching, rate limiting, analytics, and horizontal scaling.

In this post, we'll design a URL shortener that can handle 10,000 requests per second and store 10 billion URLs, deployed entirely on AWS.

Requirements

Functional

  • Given a long URL, return a short URL (7-character code)
  • Given a short URL, redirect to the original long URL
  • Support custom aliases (e.g. go.co/launch)
  • Track click analytics (count, location, device)
  • URLs expire after a configurable TTL

Non-Functional

  • Read-heavy: 100:1 read/write ratio
  • 99.99% availability for redirects
  • <50ms p99 redirect latency
  • Globally distributed (optimised for India + US)

Capacity Estimation

Let's work through the numbers:

Write rate:      100 URLs/sec → 8.64M URLs/day
Read rate:       10,000 redirects/sec
Storage per URL: ~500 bytes (URL + metadata)
Storage 5 years: 8.64M × 365 × 5 × 500B ≈ 7.9TB
Cache memory:    80% of traffic hits 20% of URLs (Pareto)
                 hot set ≈ 2M URLs × 500B ≈ 1GB
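These estimates are easy to sanity-check in a few lines of Python:

```python
# Sanity-check the capacity estimates above.
WRITE_RPS = 100
BYTES_PER_URL = 500

urls_per_day = WRITE_RPS * 86_400                 # writes/day
storage_5y = urls_per_day * 365 * 5 * BYTES_PER_URL

# Hot set: how many ~500B entries fit in a 1GB cache
hot_urls = 1_000_000_000 // BYTES_PER_URL

print(f"URLs/day:       {urls_per_day:,}")            # 8,640,000
print(f"5-year storage: {storage_5y / 1e12:.1f} TB")  # 7.9 TB
print(f"URLs in 1GB:    {hot_urls:,}")                # 2,000,000
```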

URL Shortening Strategy

The most critical decision is how we generate the 7-character short code.

Option 1: MD5/SHA256 Hash

Hash the long URL, take the first 7 characters:

import hashlib
import base64

def shorten(long_url: str) -> str:
    digest = hashlib.md5(long_url.encode()).digest()
    encoded = base64.urlsafe_b64encode(digest).decode()
    return encoded[:7]

Problem: hash collisions. A 7-char Base62 code gives 62^7 ≈ 3.5 trillion combinations, but by the birthday paradox a 50% collision probability is reached after only ~1.177 × √(62^7) ≈ 2.2 million URLs — a few hours of writes at 100 URLs/sec. Every insert would then need a collision check and retry. (Note that the snippet above actually truncates a Base64 encoding, whose alphabet also includes - and _.)
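The birthday bound is quick to verify: for a space of N codes, a 50% collision probability is reached around 1.177 × √N:

```python
import math

N = 62 ** 7                       # ~3.5 trillion possible 7-char codes
n_50 = 1.177 * math.sqrt(N)       # ~50% collision probability point
print(f"{n_50 / 1e6:.1f}M URLs")  # 2.2M URLs
```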

Option 2: Counter-Based (Our Choice)

Use a global counter and convert to Base62:

public class Base62Encoder {
    private static final String CHARS =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    public static String encode(long id) {
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.insert(0, CHARS.charAt((int)(id % 62)));
            id /= 62;
        }
        // Pad to 7 characters
        while (sb.length() < 7) sb.insert(0, 'a');
        return sb.toString();
    }
}

For distributed counter generation without a single point of failure, we use range-based allocation: each application server pre-claims a disjoint range of IDs (e.g., 1–10M, then 10,000,001–20,000,000) from a Redis counter via INCRBY. When a server exhausts its range, it claims the next.
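The allocation scheme can be sketched as follows. In production the shared counter would be a Redis INCRBY on a key such as `url:id_counter` (key name and block size are illustrative); here a plain in-process counter stands in so the sketch is self-contained:

```python
RANGE_SIZE = 10_000_000

# Stand-in for Redis, where this would be:
#   upper = redis.incrby("url:id_counter", RANGE_SIZE)
_counter = 0

def claim_range(size: int = RANGE_SIZE) -> tuple[int, int]:
    """Atomically claim the next block of IDs; returns inclusive [lo, hi]."""
    global _counter
    _counter += size
    return (_counter - size + 1, _counter)

class IdAllocator:
    """Hands out IDs from a pre-claimed range; claims a new block when exhausted."""
    def __init__(self):
        self.lo, self.hi = claim_range()
        self.next = self.lo

    def next_id(self) -> int:
        if self.next > self.hi:                  # range exhausted
            self.lo, self.hi = claim_range()
            self.next = self.lo
        nid = self.next
        self.next += 1
        return nid
```

Two servers that each construct an allocator get disjoint blocks (1–10,000,000 and 10,000,001–20,000,000), so they can mint IDs without coordinating on every request — only once per 10M writes.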


Architecture on AWS

┌─────────────────────────────────────────────────────────┐
│                    CloudFront (CDN)                      │
│              Asia Pacific + US edge nodes                │
└────────────────────────┬────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────┐
│                  API Gateway (REST)                      │
│         Rate limiting: 10K req/sec per API key           │
└──────────────┬─────────────────────┬────────────────────┘
               │                     │
    ┌──────────▼──────┐   ┌──────────▼─────────┐
    │  Write Service  │   │  Redirect Service  │
    │  (ECS Fargate)  │   │  (Lambda@Edge)     │
    └──────────┬──────┘   └──────────┬─────────┘
               │                     │
    ┌──────────▼──────┐   ┌──────────▼─────────┐
    │   DynamoDB      │   │  ElastiCache       │
    │  (Primary store)│◄──│  Redis (hot URLs)  │
    └─────────────────┘   └────────────────────┘

Why DynamoDB?

  • Key-value access pattern: We always look up by short code (partition key)
  • Auto-scaling: Handles 10K+ reads/sec without manual intervention
  • TTL support: Built-in expiration via the ttl attribute
  • Global Tables: Multi-region replication for <50ms latency worldwide

// DynamoDB item schema
interface ShortUrl {
  pk: string;           // short code (partition key)
  longUrl: string;      // original URL
  userId: string;       // owner
  clickCount: number;   // atomic counter
  ttl: number;          // Unix timestamp for expiry
  createdAt: string;    // ISO 8601
  customAlias: boolean; // was it a custom alias?
}
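A write against this schema can be sketched in Python (the boto3 call is omitted; the helper name and 30-day default TTL are illustrative):

```python
import time

def build_item(short_code: str, long_url: str, user_id: str,
               ttl_days: int = 30, custom_alias: bool = False) -> dict:
    """Build an item matching the schema above.

    'ttl' is a Unix timestamp; DynamoDB's TTL feature deletes the item
    automatically once that time passes — no cleanup job required.
    """
    now = int(time.time())
    return {
        "pk": short_code,                 # partition key = short code
        "longUrl": long_url,
        "userId": user_id,
        "clickCount": 0,                  # incremented atomically on clicks
        "ttl": now + ttl_days * 86_400,
        "createdAt": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now)),
        "customAlias": custom_alias,
    }
```

On the actual write, a ConditionExpression of `attribute_not_exists(pk)` guards against overwriting an existing code or custom alias.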

Redirect Service with Lambda@Edge

For the hot redirect path, we use Lambda@Edge — it runs at CloudFront edge nodes, meaning the redirect logic executes within ~10ms of the user:

// Lambda@Edge Origin Request handler (Node.js, AWS SDK v2) — runs only on
// CloudFront cache misses, so cached 301s are served straight from the edge.
const AWS = require('aws-sdk');
// Lambda@Edge cannot attach to a VPC, so DynamoDB is reached over its
// public endpoint; Global Tables keep the read in a nearby region.
const dynamodb = new AWS.DynamoDB({ region: process.env.AWS_REGION });

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const shortCode = request.uri.slice(1); // remove leading /

  const item = await dynamodb.getItem({
    TableName: 'short-urls',
    Key: { pk: { S: shortCode } },
    ConsistentRead: false, // eventual consistency is fine for redirects
  }).promise();

  if (!item.Item) {
    return { status: '404', statusDescription: 'Not Found', body: 'URL not found' };
  }

  return {
    status: '301',
    statusDescription: 'Moved Permanently',
    headers: {
      location: [{ key: 'Location', value: item.Item.longUrl.S }],
      // Let CloudFront cache the redirect for a day
      'cache-control': [{ key: 'Cache-Control', value: 'max-age=86400' }],
    },
  };
};

Caching Strategy

We use a read-through cache with ElastiCache Redis:

@Service
public class RedirectService {

    @Autowired
    private RedisTemplate<String, String> redis;

    @Autowired
    private DynamoDbClient dynamoDb;

    public String resolve(String shortCode) {
        // 1. Check Redis
        String cached = redis.opsForValue().get("url:" + shortCode);
        if (cached != null) return cached;

        // 2. DynamoDB fallback
        String longUrl = fetchFromDynamo(shortCode);
        if (longUrl == null) throw new NotFoundException();

        // 3. Cache for 1 hour
        redis.opsForValue().set("url:" + shortCode, longUrl,
                Duration.ofHours(1));

        return longUrl;
    }
}

Cache eviction: Redis allkeys-lru policy. The 1GB cache holds ~2M URLs at ~500B each. Given our Pareto distribution (20% of URLs take 80% of traffic), this covers the hot tier comfortably.


Analytics

We use an async write pattern to avoid blocking the redirect:

  1. Redirect returns immediately
  2. A Kinesis Data Stream event is published with {shortCode, timestamp, userAgent, ip}
  3. Lambda consumer processes the stream and writes to DynamoDB with ADD 1 atomic increment
  4. Aggregated stats are pre-computed and cached in Redis
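The event published in step 2 can be sketched as follows (field names mirror the list above; the stream name in the comment is illustrative):

```python
import json
import time

def click_event(short_code: str, user_agent: str, ip: str) -> bytes:
    """Serialize a click for Kinesis. Partitioning on shortCode sends all
    clicks for one URL to the same shard, preserving per-URL ordering."""
    record = {
        "shortCode": short_code,
        "timestamp": int(time.time() * 1000),  # epoch millis
        "userAgent": user_agent,
        "ip": ip,
    }
    return json.dumps(record).encode()

# Producer side (boto3 sketch):
#   kinesis.put_record(StreamName="click-events",
#                      Data=click_event(code, ua, ip),
#                      PartitionKey=code)
```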

Rate Limiting

// Sliding window rate limiter using Redis ZADD
public boolean isAllowed(String userId, int limit, int windowSecs) {
    long now = System.currentTimeMillis();
    long windowStart = now - (windowSecs * 1000L);
    String key = "rl:" + userId;

    // Remove expired entries
    redis.opsForZSet().removeRangeByScore(key, 0, windowStart);

    // Count current window
    Long count = redis.opsForZSet().count(key, windowStart, now);
    if (count != null && count >= limit) return false;

    // Add current request. Use a unique member so two requests in the
    // same millisecond don't collapse into a single ZSET entry.
    redis.opsForZSet().add(key, now + "-" + UUID.randomUUID(), now);
    redis.expire(key, windowSecs, TimeUnit.SECONDS);
    return true;
}

Handling Failures

DynamoDB Throttling

  • Use exponential backoff with jitter
  • Enable auto-scaling on both read and write capacity
  • Consider DAX (DynamoDB Accelerator) for microsecond reads if needed
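The backoff can be sketched as "full jitter", where each delay is drawn uniformly from [0, min(cap, base · 2^attempt)] so that throttled clients don't retry in lockstep (base and cap values are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 0.05, cap: float = 2.0) -> float:
    """Full-jitter delay in seconds for the given retry attempt (0-based)."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Retry loop sketch:
#   for attempt in range(MAX_RETRIES):
#       try:
#           return table.get_item(...)
#       except ThrottlingError:
#           time.sleep(backoff_delay(attempt))
```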

Redis Failure

  • Fall back to DynamoDB directly (slower, but correct)
  • Redis Sentinel or ElastiCache Multi-AZ for HA

Cascading Failure Prevention

  • Circuit breaker on DynamoDB calls
  • Timeouts: 100ms on cache reads, 300ms on DynamoDB reads
  • Fallback: return 503 if both fail (better than hanging forever)
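A minimal count-based circuit breaker, as a sketch (thresholds are illustrative; a production service would more likely reach for a library such as Resilience4j):

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures, then fails fast
    (maps to the 503 fallback above) until `reset_secs` have passed,
    at which point one probe call is allowed through."""

    def __init__(self, threshold: int = 5, reset_secs: float = 30.0):
        self.threshold = threshold
        self.reset_secs = reset_secs
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_secs:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit
        return result
```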

AWS Cost Estimate (Monthly, India traffic)

Service                   Usage               Cost
-----------------------   -----------------   --------------
API Gateway               25M requests        ₹1,800
Lambda@Edge               250M invocations    ₹3,600
DynamoDB                  10GB + 25M reads    ₹4,500
ElastiCache (r6g.large)   1 node              ₹7,200
CloudFront                5TB transfer        ₹3,000
-----------------------   -----------------   --------------
Total                                         ~₹20,100/month

Summary

Decision                Choice                   Reason
Short code generation   Counter + Base62         No collisions, predictable
Primary database        DynamoDB                 Key-value pattern, auto-scale
Cache                   ElastiCache Redis        Sub-millisecond hot redirects
Redirect execution      Lambda@Edge              <10ms global latency
Analytics               Kinesis + async writes   Non-blocking, scalable

This architecture handles 10K+ redirects/second with a p99 latency under 50ms, costs roughly ₹20K/month, and scales horizontally by adding more ECS tasks or Lambda concurrency.

If you want to discuss the architecture for your specific use case, book a consultation.

Ravi Kant Shukla

Backend architect helping developers and startups build production-grade systems on AWS. 8+ years of experience in system design, microservices, and AI/ML deployment.
