1

.

Tiny Url

Easy

Given a url, design a web API that generates a shorter and unique alias of it

Functional Requirements

  • Given a url, generate a shorter and unique alias of it
  • When user accesses short link, redirect them to the original url
  • Users can pick custom short links

Non-Functional Requirements

  • 99.99% Availability
  • 15ms latency on redirection
  • Non-custom short links not guessable by default
  • Links expire at 5 years by default
  • Should perform globally
  • All urls are public

Usage Expectations

  • 100 Reads per Write
  • 500,000,000 urls created every month

Assumptions

  • Each url is 500 bytes
  • We store 20% of urls in a cache
  • Cache TTL 24 hrs
Companies
Plus sign indicating more info is available
Amazon
Twitter
Reddit

Grading Criteria

Before looking at the solution, do your best to document the follow:

  • Diagrams explaining the system architecture, the read flow, and the write flow
  • Database tables and their field
  • API endpoints, including inputs
  • Analytics to collect
  • Data partitioning & Replication Strategy
  • Capacity Estimations
  • Caching strategy
  • Load balancing strategy
  • Security considerations

After looking at the solution, compare your answer with the solution and mark what you missed

No items found.

Flow Explanations

Write flow

Success Case

  • Client submits a url to our name servers
  • If the url is not a custom url, generate a hash based on the original url, and append the user id to ensure same url being submit per user is unique
  • The encoded url is then stored in the database
  • The url is then returned

Read flow

Case 1: URL in cache

  • A client visits a short url
  • We check to see if that url is in the cache
  • If it is we redirect the user to the original url

Case 2: URL not in cache but in DB

  • The url is not found in the cache
  • We go to the database and find the url
  • We then populate the cache

Case 3: URL not in db

  • The url is not in the DB either
  • An error is returned to the client

Offline Key Generation Service

We precompute keys during off hours and then use those to quickly return a unique key

Hash can be calculated by appending the user id to the end of the url being processed

Analytics

User Behavior

  • url_accessed - an end user read a short link and was redirected url
  • url_created - a user created a new short link

System Health

  • cache_miss - fired when there is a cache miss
  • key_collision - fired when a hash key collision takes place

APIs

Create Url

Parameters - api_dev_key - original_url - custom_alias=None - user_name=None - expire_date=None

Delete Url

Parameters -  api_dev_key url_key

Database

Type of database: NoSQL

Table 1: urls

Primary Key: hash varchar

  • original_url: varchar
  • creation_date: datetime
  • expiration_date: datetime
  • user_id: int

Table 2: users

Primary Key: user_id (int)

  • name: varchar
  • email: varchar
  • creation_date: datetime
  • last_login: datetime

Data Partitioning and Replication

Hash-based partitioning to ensure even distribution across all shards with Consistent Hashing and Virtual Nodes

Capacity Estimations

Total space required to store urls: 15TB

  • Memory for cache: 170GB
  • New URLs per second: 200/s 
  • URL redirections: 20K/s
  • Incoming Data per second: 100KB/s
  • Data Out: ~10 MB/s

Security and permissions

API keys are issued to url creators to prevent too many urls from being created

Cache

Cache any frequently accessed URLs

Size: 20% of daily traffic, can adjust depending on cache_miss log

Caching eviction strategy: Least Recently Used

Caching updating strategy: Add url to cache if there is a cache miss

Load Balancing

Schema: Round-Robin should be enough here

Places to inject Load Balancers

  • Between Clients and Application servers
  • Between Application Servers and database servers
  • Between Application Servers and Cache servers
Hide SolutionShow Solution