
Designing a URL Shortener on AWS: From Zero to Production

A complete walkthrough of designing a production-ready URL shortener on AWS — covering hashing strategies, database selection, caching, and scaling to billions of redirects.

March 10, 2024 · 12 min read
AWS · System Design · DynamoDB · ElastiCache · API Gateway

Building a URL shortener is one of the most popular system design interview questions — and for good reason. It touches almost every fundamental: hashing, database choice, caching, rate limiting, analytics, and horizontal scaling.

In this post, we'll design a URL shortener that can handle 10,000 requests per second and store 10 billion URLs, deployed entirely on AWS.

Requirements

Functional

  • Given a long URL, return a short URL (7-character code)
  • Given a short URL, redirect to the original long URL
  • Support custom aliases (e.g. go.co/launch)
  • Track click analytics (count, location, device)
  • URLs expire after a configurable TTL

Non-Functional

  • Read-heavy: 100:1 read/write ratio
  • 99.99% availability for redirects
  • <50ms p99 redirect latency
  • Globally distributed (optimised for India + US)

Capacity Estimation

Let's work through the numbers:

Write rate:      100 URLs/sec → 8.64M URLs/day
Read rate:       10,000 redirects/sec
Storage per URL: ~500 bytes (URL + metadata)
Storage 5 years: 8.64M × 365 × 5 × 500B ≈ 7.9TB
Cache memory:    80% of traffic hits 20% of URLs (Pareto)
                 hot set ≈ 2M URLs × 500B ≈ 1GB
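These estimates are easy to sanity-check in a few lines of Python:

```python
# Sanity-check the capacity estimates above.
WRITE_RPS = 100
BYTES_PER_URL = 500

urls_per_day = WRITE_RPS * 86_400                 # writes/day
storage_5y = urls_per_day * 365 * 5 * BYTES_PER_URL

# Hot set: how many ~500B entries fit in a 1GB cache
hot_urls = 1_000_000_000 // BYTES_PER_URL

print(f"URLs/day:       {urls_per_day:,}")            # 8,640,000
print(f"5-year storage: {storage_5y / 1e12:.1f} TB")  # 7.9 TB
print(f"URLs in 1GB:    {hot_urls:,}")                # 2,000,000
```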

URL Shortening Strategy

The most critical decision is how we generate the 7-character short code.

Option 1: MD5/SHA256 Hash

Hash the long URL, take the first 7 characters:

import hashlib
import base64

def shorten(long_url: str) -> str:
    digest = hashlib.md5(long_url.encode()).digest()
    encoded = base64.urlsafe_b64encode(digest).decode()
    return encoded[:7]

Problem: hash collisions. A 7-char Base62 code gives 62^7 ≈ 3.5 trillion combinations, but by the birthday paradox a 50% collision probability is reached after only ~1.177 × √(62^7) ≈ 2.2 million URLs — a few hours of writes at 100 URLs/sec. Every insert would then need a collision check and retry. (Note that the snippet above actually truncates a Base64 encoding, whose alphabet also includes - and _.)
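The birthday bound is quick to verify: for a space of N codes, a 50% collision probability is reached around 1.177 × √N:

```python
import math

N = 62 ** 7                       # ~3.5 trillion possible 7-char codes
n_50 = 1.177 * math.sqrt(N)       # ~50% collision probability point
print(f"{n_50 / 1e6:.1f}M URLs")  # 2.2M URLs
```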

Option 2: Counter-Based (Our Choice)

Use a global counter and convert to Base62:

public class Base62Encoder {
    private static final String CHARS =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    public static String encode(long id) {
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.insert(0, CHARS.charAt((int)(id % 62)));
            id /= 62;
        }
        // Pad to 7 characters
        while (sb.length() < 7) sb.insert(0, 'a');
        return sb.toString();
    }
}

For distributed counter generation without a single point of failure, we use range-based allocation: each application server pre-claims a disjoint range of IDs (e.g., 1–10M, then 10,000,001–20,000,000) from a Redis counter via INCRBY. When a server exhausts its range, it claims the next.
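The allocation scheme can be sketched as follows. In production the shared counter would be a Redis INCRBY on a key such as `url:id_counter` (key name and block size are illustrative); here a plain in-process counter stands in so the sketch is self-contained:

```python
RANGE_SIZE = 10_000_000

# Stand-in for Redis, where this would be:
#   upper = redis.incrby("url:id_counter", RANGE_SIZE)
_counter = 0

def claim_range(size: int = RANGE_SIZE) -> tuple[int, int]:
    """Atomically claim the next block of IDs; returns inclusive [lo, hi]."""
    global _counter
    _counter += size
    return (_counter - size + 1, _counter)

class IdAllocator:
    """Hands out IDs from a pre-claimed range; claims a new block when exhausted."""
    def __init__(self):
        self.lo, self.hi = claim_range()
        self.next = self.lo

    def next_id(self) -> int:
        if self.next > self.hi:                  # range exhausted
            self.lo, self.hi = claim_range()
            self.next = self.lo
        nid = self.next
        self.next += 1
        return nid
```

Two servers that each construct an allocator get disjoint blocks (1–10,000,000 and 10,000,001–20,000,000), so they can mint IDs without coordinating on every request — only once per 10M writes.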


Architecture on AWS

┌─────────────────────────────────────────────────────────┐
│                    CloudFront (CDN)                      │
│              Asia Pacific + US edge nodes                │
└────────────────────────┬────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────┐
│                  API Gateway (REST)                      │
│         Rate limiting: 10K req/sec per API key           │
└──────────────┬─────────────────────┬────────────────────┘
               │                     │
    ┌──────────▼──────┐   ┌──────────▼─────────┐
    │  Write Service  │   │  Redirect Service  │
    │  (ECS Fargate)  │   │  (Lambda@Edge)     │
    └──────────┬──────┘   └──────────┬─────────┘
               │                     │
    ┌──────────▼──────┐   ┌──────────▼─────────┐
    │   DynamoDB      │   │  ElastiCache       │
    │  (Primary store)│◄──│  Redis (hot URLs)  │
    └─────────────────┘   └────────────────────┘

Why DynamoDB?

  • Key-value access pattern: We always look up by short code (partition key)
  • Auto-scaling: Handles 10K+ reads/sec without manual intervention
  • TTL support: Built-in expiration via the ttl attribute
  • Global Tables: Multi-region replication for <50ms latency worldwide

// DynamoDB item schema
interface ShortUrl {
  pk: string;           // short code (partition key)
  longUrl: string;      // original URL
  userId: string;       // owner
  clickCount: number;   // atomic counter
  ttl: number;          // Unix timestamp for expiry
  createdAt: string;    // ISO 8601
  customAlias: boolean; // was it a custom alias?
}
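A write against this schema can be sketched in Python (the boto3 call is omitted; the helper name and 30-day default TTL are illustrative):

```python
import time

def build_item(short_code: str, long_url: str, user_id: str,
               ttl_days: int = 30, custom_alias: bool = False) -> dict:
    """Build an item matching the schema above.

    'ttl' is a Unix timestamp; DynamoDB's TTL feature deletes the item
    automatically once that time passes — no cleanup job required.
    """
    now = int(time.time())
    return {
        "pk": short_code,                 # partition key = short code
        "longUrl": long_url,
        "userId": user_id,
        "clickCount": 0,                  # incremented atomically on clicks
        "ttl": now + ttl_days * 86_400,
        "createdAt": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now)),
        "customAlias": custom_alias,
    }
```

On the actual write, a ConditionExpression of `attribute_not_exists(pk)` guards against overwriting an existing code or custom alias.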

Redirect Service with Lambda@Edge

For the hot redirect path, we use Lambda@Edge — it runs at CloudFront edge nodes, meaning the redirect logic executes within ~10ms of the user:

// Lambda@Edge Origin Request handler (Node.js, AWS SDK v2) — runs only on
// CloudFront cache misses, so cached 301s are served straight from the edge.
const AWS = require('aws-sdk');
// Lambda@Edge cannot attach to a VPC, so DynamoDB is reached over its
// public endpoint; Global Tables keep the read in a nearby region.
const dynamodb = new AWS.DynamoDB({ region: process.env.AWS_REGION });

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const shortCode = request.uri.slice(1); // remove leading /

  const item = await dynamodb.getItem({
    TableName: 'short-urls',
    Key: { pk: { S: shortCode } },
    ConsistentRead: false, // eventual consistency is fine for redirects
  }).promise();

  if (!item.Item) {
    return { status: '404', statusDescription: 'Not Found', body: 'URL not found' };
  }

  return {
    status: '301',
    statusDescription: 'Moved Permanently',
    headers: {
      location: [{ key: 'Location', value: item.Item.longUrl.S }],
      // Let CloudFront cache the redirect for a day
      'cache-control': [{ key: 'Cache-Control', value: 'max-age=86400' }],
    },
  };
};

Caching Strategy

We use a read-through cache with ElastiCache Redis:

@Service
public class RedirectService {

    @Autowired
    private RedisTemplate<String, String> redis;

    @Autowired
    private DynamoDbClient dynamoDb;

    public String resolve(String shortCode) {
        // 1. Check Redis
        String cached = redis.opsForValue().get("url:" + shortCode);
        if (cached != null) return cached;

        // 2. DynamoDB fallback
        String longUrl = fetchFromDynamo(shortCode);
        if (longUrl == null) throw new NotFoundException();

        // 3. Cache for 1 hour
        redis.opsForValue().set("url:" + shortCode, longUrl,
                Duration.ofHours(1));

        return longUrl;
    }
}

Cache eviction: Redis allkeys-lru policy. The 1GB cache holds ~2M URLs at ~500B each. Given our Pareto distribution (20% of URLs take 80% of traffic), this covers the hot tier comfortably.


Analytics

We use an async write pattern to avoid blocking the redirect:

  1. Redirect returns immediately
  2. A Kinesis Data Stream event is published with {shortCode, timestamp, userAgent, ip}
  3. Lambda consumer processes the stream and writes to DynamoDB with ADD 1 atomic increment
  4. Aggregated stats are pre-computed and cached in Redis
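The event published in step 2 can be sketched as follows (field names mirror the list above; the stream name in the comment is illustrative):

```python
import json
import time

def click_event(short_code: str, user_agent: str, ip: str) -> bytes:
    """Serialize a click for Kinesis. Partitioning on shortCode sends all
    clicks for one URL to the same shard, preserving per-URL ordering."""
    record = {
        "shortCode": short_code,
        "timestamp": int(time.time() * 1000),  # epoch millis
        "userAgent": user_agent,
        "ip": ip,
    }
    return json.dumps(record).encode()

# Producer side (boto3 sketch):
#   kinesis.put_record(StreamName="click-events",
#                      Data=click_event(code, ua, ip),
#                      PartitionKey=code)
```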

Rate Limiting

// Sliding window rate limiter using Redis ZADD
public boolean isAllowed(String userId, int limit, int windowSecs) {
    long now = System.currentTimeMillis();
    long windowStart = now - (windowSecs * 1000L);
    String key = "rl:" + userId;

    // Remove expired entries
    redis.opsForZSet().removeRangeByScore(key, 0, windowStart);

    // Count current window
    Long count = redis.opsForZSet().count(key, windowStart, now);
    if (count != null && count >= limit) return false;

    // Add current request. Use a unique member so two requests in the
    // same millisecond don't collapse into a single ZSET entry.
    redis.opsForZSet().add(key, now + "-" + UUID.randomUUID(), now);
    redis.expire(key, windowSecs, TimeUnit.SECONDS);
    return true;
}

Handling Failures

DynamoDB Throttling

  • Use exponential backoff with jitter
  • Enable auto-scaling on both read and write capacity
  • Consider DAX (DynamoDB Accelerator) for microsecond reads if needed
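The backoff can be sketched as "full jitter", where each delay is drawn uniformly from [0, min(cap, base · 2^attempt)] so that throttled clients don't retry in lockstep (base and cap values are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 0.05, cap: float = 2.0) -> float:
    """Full-jitter delay in seconds for the given retry attempt (0-based)."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Retry loop sketch:
#   for attempt in range(MAX_RETRIES):
#       try:
#           return table.get_item(...)
#       except ThrottlingError:
#           time.sleep(backoff_delay(attempt))
```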

Redis Failure

  • Fall back to DynamoDB directly (slower, but correct)
  • Redis Sentinel or ElastiCache Multi-AZ for HA

Cascading Failure Prevention

  • Circuit breaker on DynamoDB calls
  • Timeouts: 100ms on cache reads, 300ms on DynamoDB reads
  • Fallback: return 503 if both fail (better than hanging forever)
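A minimal count-based circuit breaker, as a sketch (thresholds are illustrative; a production service would more likely reach for a library such as Resilience4j):

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures, then fails fast
    (maps to the 503 fallback above) until `reset_secs` have passed,
    at which point one probe call is allowed through."""

    def __init__(self, threshold: int = 5, reset_secs: float = 30.0):
        self.threshold = threshold
        self.reset_secs = reset_secs
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_secs:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit
        return result
```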

AWS Cost Estimate (Monthly, India traffic)

Service                   Usage               Cost
-----------------------   -----------------   --------------
API Gateway               25M requests        ₹1,800
Lambda@Edge               250M invocations    ₹3,600
DynamoDB                  10GB + 25M reads    ₹4,500
ElastiCache (r6g.large)   1 node              ₹7,200
CloudFront                5TB transfer        ₹3,000
-----------------------   -----------------   --------------
Total                                         ~₹20,100/month

Summary

Decision                Choice                   Reason
Short code generation   Counter + Base62         No collisions, predictable
Primary database        DynamoDB                 Key-value pattern, auto-scale
Cache                   ElastiCache Redis        Sub-millisecond hot redirects
Redirect execution      Lambda@Edge              <10ms global latency
Analytics               Kinesis + async writes   Non-blocking, scalable

This architecture handles 10K+ redirects/second with a p99 latency under 50ms, costs roughly ₹20K/month, and scales horizontally by adding more ECS tasks or Lambda concurrency.

If you want to discuss the architecture for your specific use case, book a consultation.

Ravi Kant Shukla

Backend architect helping developers and startups build production-grade systems on AWS. 8+ years of experience in system design, microservices, and AI/ML deployment.
