polishchuk 0 1.1K 08.08.2024 en

Unique ID Generation Cheat Sheet


Unique identifiers are crucial for ensuring data integrity and uniqueness in applications. They help track, reference, and ensure consistency across various systems and services. This article contains a Cheat Sheet and descriptions of the different types of unique identifier generators, their methods, and tradeoff analysis to help you choose the best option for your application.

Unique ID Generation Cheat Sheet by bool.dev. A high-resolution PDF is also available.

0. Database Auto-Increment

Auto-increment fields in databases automatically generate unique identifiers for new records.

Example ID: 1, 2, 3 (incrementing sequentially)

Pros:

Simple and ensures uniqueness within a single database
Easy to implement in single-node applications
Efficient for small-scale applications
Sequentially ordered
Efficient storage of a numeric value

Cons:

Not suitable for distributed systems, where multiple nodes would have to coordinate on the next value
Can become a bottleneck under heavy concurrent inserts, since all writers share one counter
Range is limited by the integer type (INT, BIGINT, etc.)

Database ID Implementation

-- MySQL/MariaDB syntax
CREATE TABLE Users (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

1. Twitter Snowflake

A 64-bit identifier that includes a timestamp, machine ID, and sequence number, ensuring uniqueness across distributed systems.

Example ID: 13572484591234567

Pros:

Guaranteed unique within the distributed system
Highly scalable, can generate millions of IDs per second
Time-ordered, useful for chronological sorting
Efficient storage as a 64-bit integer

Cons:

Requires each node to be assigned a unique worker ID to avoid collisions
Complexity in managing and coordinating worker IDs
Depends on the system clock: if a node's clock moves backwards, the node must wait or reject requests to avoid duplicate IDs

Twitter Snowflake Implementation
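The timestamp, machine ID, and sequence number described above can be packed into one integer with shifts. The following is a minimal sketch, not Twitter's implementation: it assumes the commonly cited layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits, and the class name `SnowflakeSketch` and its `clock` parameter are hypothetical names introduced for illustration.

```python
import threading
import time

# Custom epoch (milliseconds) used by the original Snowflake service.
TWITTER_EPOCH_MS = 1288834974657


class SnowflakeSketch:
    """Illustrative generator: 41-bit timestamp | 10-bit worker | 12-bit sequence."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        # The clock is injectable so the bit layout can be checked deterministically.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Reusing an older timestamp could repeat an ID, so refuse.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - TWITTER_EPOCH_MS) << 22)
                | (self.worker_id << 12)
                | self._sequence
            )


generator = SnowflakeSketch(worker_id=1)
print(generator.next_id())
```

Uniqueness here rests entirely on no two running generators sharing a `worker_id`, which is the coordination cost listed in the cons.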

2. UUID v4 (Universally Unique Identifier)

A 128-bit identifier, 122 bits of which are random, that is unique for practical purposes and can be generated independently by different systems without coordination.

Example ID: 550e8400-e29b-41d4-a716-446655440000

Pros:

Globally unique without the need for a central authority
Can be generated independently on multiple systems without coordination
Fast generation, no need for network communication

Cons:

Larger storage requirement (128-bit)
No inherent orderability, random distribution
Inefficient as a primary key in large tables: random values scatter inserts across B-tree indexes and hurt index locality

UUID Generation

Most programming languages support UUID generation in their standard library or through a widely used package.

Python

import uuid

# Generate a UUID
uuid_generated = uuid.uuid4()

print(uuid_generated)

Java

import java.util.UUID;

public class Main {
    public static void main(String[] args) {
        // Generate a UUID
        UUID uuid = UUID.randomUUID();
        
        System.out.println(uuid.toString());
    }
}

Node.js

const { v4: uuidv4 } = require('uuid');

// Generate a UUID
const uuid = uuidv4();

console.log(uuid);

JavaScript

function uuidv4() {
  return "10000000-1000-4000-8000-100000000000".replace(/[018]/g, c =>
    (+c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> +c / 4).toString(16)
  );
}

console.log(uuidv4());

C#

using System;

class Program
{
    static void Main()
    {
        // Generate a UUID
        Guid uuid = Guid.NewGuid();
        
        Console.WriteLine(uuid.ToString());
    }
}

PHP

function GUID()
{
    if (function_exists('com_create_guid') === true)
    {
        return trim(com_create_guid(), '{}');
    }

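    // Fallback: random_int() is cryptographically secure, unlike mt_rand()
    return sprintf(
        '%04X%04X-%04X-%04X-%04X-%04X%04X%04X',
        random_int(0, 65535), random_int(0, 65535),
        random_int(0, 65535),
        random_int(16384, 20479), // 0x4000-0x4FFF: version 4
        random_int(32768, 49151), // 0x8000-0xBFFF: RFC 4122 variant
        random_int(0, 65535), random_int(0, 65535), random_int(0, 65535)
    );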
}

Ruby

require 'securerandom'

# Generate a UUID
uuid = SecureRandom.uuid

puts uuid

Go

package main

import (
    "fmt"
    "github.com/google/uuid"
)

func main() {
    // Generate a UUID (named id so it does not shadow the uuid package)
    id := uuid.New()

    fmt.Println(id.String())
}

Bash

#!/bin/bash

# Generate a UUID
uuid=$(uuidgen)

echo "$uuid"

3. UUID v7

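UUIDv7 is a newer UUID version, standardized in RFC 9562, that places a 48-bit Unix timestamp in milliseconds in the most significant bits and fills the rest with random data. The result is a sortable, unique identifier that combines elements of UUID and ULID.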

Example ID: 01890c8e-bc4d-7b3f-915e-0d4d4e310e67

Pros:

Combines global uniqueness with timestamp-based sorting.
Can be generated independently on multiple systems.
Maintains compatibility with existing UUID systems and libraries.
Useful for chronological sorting and time-based queries.

Cons:

Larger storage requirement (128-bit).
Slightly more complex generation algorithm than UUID v4.
Small generation overhead from reading the system clock for every ID.
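To make the layout concrete, here is a minimal sketch that assembles a UUIDv7 by hand, assuming the RFC 9562 field order: 48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits. The function name `uuid7_sketch` is hypothetical, and the sketch omits the optional monotonic counter that some implementations add for IDs created within the same millisecond.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-shaped value: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: random
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0: random
    return uuid.UUID(int=value)


print(uuid7_sketch())
```

Because the timestamp occupies the leading bits, sorting these values as strings or as integers orders them by creation time down to the millisecond.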

4. ULID (Universally Unique Lexicographically Sortable Identifier)

A 128-bit, lexicographically sortable identifier that combines a 48-bit millisecond timestamp with 80 bits of randomness to ensure uniqueness and orderability.

Example ID: 01ARZ3NDEKTSV4RRFFQ69G5FAV

Pros:

Globally unique, combining randomness and timestamp
Can be generated independently on multiple systems
128-bit compatibility with UUIDs/GUIDs
1.21e+24 unique ULIDs per millisecond (1,208,925,819,614,629,174,706,176 to be exact)
Lexicographically sortable
Uses Crockford's base32 for better efficiency and readability (5 bits per character)
Case insensitive
No special characters (URL safe)
Monotonic sort order (correctly detects and handles the same millisecond)

Cons:

Slightly more complex generation algorithm than UUID v4
Twice the size of 64-bit IDs such as Snowflake

ULID Implementation
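As a rough illustration of how the timestamp and randomness map onto the 26-character string, the sketch below encodes a 48-bit millisecond timestamp followed by 80 random bits in Crockford's base32. The function name `ulid_sketch` is hypothetical, and the sketch does not implement the monotonic mode mentioned above (incrementing the random part within the same millisecond).

```python
import os
import time

# Crockford's base32 alphabet: no I, L, O, or U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_sketch(unix_ms=None):
    """Encode timestamp (48 bits) + randomness (80 bits) as 26 base32 chars."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if not 0 <= unix_ms < (1 << 48):
        raise ValueError("timestamp must fit in 48 bits")
    value = (unix_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):            # 26 chars x 5 bits = 130 bits, top 2 are zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


print(ulid_sketch())
```

The first 10 characters carry the timestamp and the remaining 16 carry the random part, so plain string comparison sorts ULIDs by creation time.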

5. KSUID (K-Sortable Unique Identifier)

A 160-bit identifier that is sortable by time and consists of a 32-bit timestamp with one-second resolution followed by a 128-bit random payload, usually written as a 27-character Base62 string.

Example ID: 0ujsszwN8NRY24YaXiTIEEIo7K0

Pros:

Combines timestamp and randomness, globally unique
Can be generated independently without coordination
Fast generation and verification, suitable for high-throughput systems
K-sortable, suitable for time-based sorting
Structured format aids in unique and ordered ID generation

Cons:

Larger storage requirement (160-bit)
Complexity in generation algorithm
Timestamp has only one-second resolution, so IDs created within the same second are not ordered by creation time

KSUID Implementation
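A minimal sketch of the KSUID layout follows. It assumes the format used by Segment's reference implementation: a 4-byte timestamp counted in seconds from a custom epoch of 1400000000, a 16-byte random payload, and a Base62 alphabet ordered digits, uppercase, lowercase. The function name `ksuid_sketch` and its parameters are hypothetical.

```python
import os
import time

KSUID_EPOCH = 1_400_000_000  # assumed custom epoch, in Unix seconds
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def ksuid_sketch(unix_seconds=None, payload=None):
    """Encode 4-byte timestamp + 16-byte payload as 27 Base62 characters."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    if payload is None:
        payload = os.urandom(16)
    if len(payload) != 16:
        raise ValueError("payload must be exactly 16 bytes")
    raw = (unix_seconds - KSUID_EPOCH).to_bytes(4, "big") + payload
    value = int.from_bytes(raw, "big")
    chars = []
    while value:
        value, remainder = divmod(value, 62)
        chars.append(BASE62[remainder])
    # Left-pad to a fixed width so string order matches numeric order.
    return "".join(reversed(chars)).rjust(27, "0")


print(ksuid_sketch())
```

The fixed 27-character width matters: without the padding, shorter strings would sort incorrectly against longer ones.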

6. MongoDB Object ID

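A 96-bit identifier made up of a 4-byte timestamp in seconds, a 5-byte value identifying the machine and process (a random value in current drivers; older versions used a 3-byte machine ID and a 2-byte process ID), and a 3-byte counter, ensuring uniqueness and a rough creation order.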

Example ID: 507f1f77bcf86cd799439011

Pros:

Combines timestamp, machine ID, process ID, and counter for uniqueness
Can be generated independently on multiple nodes
Fast generation, used natively in MongoDB
Roughly ordered based on creation time
96-bit size balances uniqueness and storage efficiency

Cons:

Not guaranteed to be globally unique in all scenarios
Potential for collision if machine and process IDs are not managed correctly
Less efficient than simple integer-based IDs
Tied to MongoDB: ObjectId is a native MongoDB type, and other database systems do not support it directly. If you integrate with a different database or migrate to a new one, the ObjectId values may need to be converted or mapped.
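The structure of an ObjectId can be sketched in a few lines. This is an illustration only, not the MongoDB driver's code: it assumes the current layout of a 4-byte timestamp, a 5-byte per-process random value, and a 3-byte counter that starts at a random value, and the name `object_id_sketch` is hypothetical.

```python
import itertools
import os
import time

# Generated once per process, as the 5-byte "random value" field.
_PROCESS_RANDOM = os.urandom(5)
# 3-byte counter, started at a random value and incremented per ID.
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def object_id_sketch(unix_seconds=None):
    """Return a 24-character hex string shaped like a MongoDB ObjectId."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    count = next(_counter) & 0xFFFFFF          # wrap around at 24 bits
    raw = (
        unix_seconds.to_bytes(4, "big")        # creation time, in seconds
        + _PROCESS_RANDOM                      # identifies this process
        + count.to_bytes(3, "big")             # distinguishes IDs in one second
    )
    return raw.hex()


print(object_id_sketch())
```

The leading timestamp is what gives ObjectIds their rough creation order, and the one-second resolution is why the order is only rough.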

7. CUID (Collision-resistant Unique Identifier)

CUIDs are designed to be highly unique, focusing on being readable and less prone to collisions even in high-concurrency environments.

A 25-character CUID in base-36 encoding carries roughly 25 × log2(36) ≈ 129 bits. This provides a substantial level of uniqueness while keeping the identifier relatively compact.

Example ID: cjld2cyuq0006s1rxy8123456

Pros:

Highly unique with collision resistance even in high concurrency
Suitable for distributed systems with high traffic
Fast generation, designed for high-concurrency environments
Efficient storage, compact format

Cons:

Slightly larger than simple numeric IDs
More complex than simple numeric or auto-increment IDs
Not naturally ordered, designed primarily for uniqueness

CUID implementation

8. NanoID

NanoID is a tiny, secure, URL-friendly unique string ID generator that is designed to be more flexible and performant than UUID.

Example ID: V1StGXR8_Z5jdHi6B

Pros:

Secure and highly unique with customizable size and alphabet
Suitable for distributed systems, highly scalable
Fast generation with high security
Compact and efficient due to customizable length and alphabet

Cons:

Slightly more complex generation than UUID
Customizability might lead to misuse if not properly managed
No inherent orderability

NanoID Implementation
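The idea behind NanoID, drawing each character from a secure random source over a URL-safe alphabet, can be sketched with Python's standard `secrets` module. This is not the NanoID library itself: the name `nanoid_sketch` is hypothetical, the default length of 21 and the 64-character set follow NanoID's documented defaults, and the alphabet order here differs from the library's.

```python
import secrets

# 64 URL-safe characters: letters, digits, underscore, hyphen.
URL_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789_-"
)


def nanoid_sketch(size=21, alphabet=URL_ALPHABET):
    """Return `size` characters chosen uniformly with a CSPRNG."""
    if size <= 0 or not alphabet:
        raise ValueError("size and alphabet must be non-empty")
    return "".join(secrets.choice(alphabet) for _ in range(size))


print(nanoid_sketch())                          # 21 URL-safe characters
print(nanoid_sketch(10, "0123456789abcdef"))    # shorter ID, hex alphabet
```

With 64 symbols each character carries 6 bits, so the default 21 characters hold 126 random bits, close to the 122 of UUID v4. Shortening the ID or shrinking the alphabet lowers that number, which is the misuse risk noted in the cons.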

9. Sonyflake

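Sonyflake is a distributed unique ID generator inspired by Twitter Snowflake. Its IDs fit in a 64-bit integer and are built from 39 bits of time in 10 ms units, 8 bits of sequence number, and 16 bits of machine ID. It is designed for generating unique IDs efficiently within a single data center.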

Example ID: 1132088477364927953

Pros:

Highly unique with a low risk of collisions in a single data center
Optimized for single data center deployment, highly scalable
Fast generation with efficient use of 64-bit storage
Time-ordered, useful for chronological sorting
Efficient 64-bit size, similar to Twitter Snowflake

Cons:

Requires each machine to be configured with a unique machine ID to avoid collisions
Less suitable for multi-data center environments
Lower per-machine throughput than Snowflake: at most 256 IDs per 10 ms

Sonyflake Implementation

10. FlakeID

FlakeID is a decentralized unique ID generator that creates 128-bit IDs based on timestamps, machine IDs, and sequence numbers.

A Flake ID has 128 bits:

64-bit timestamp - milliseconds since the epoch (Jan 1, 1970)
48-bit worker ID - MAC address from a configurable device
16-bit sequence # - usually 0, incremented when more than one ID is requested in the same millisecond and reset to 0 when the clock ticks forward

Example ID: 4zqG3B2TnMs57S1PvQ

Pros:

Combines timestamp, machine ID, and sequence number for high uniqueness
Suitable for distributed systems, highly scalable
Fast generation, ideal for high-throughput systems
Roughly ordered based on timestamp
Larger 128-bit size balances uniqueness and orderability

Cons:

Requires careful management of machine IDs
Larger storage size compared to simpler numeric IDs
Not precisely ordered across machines, since ordering depends on each machine's clock

FlakeID Implementation
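The three fields listed above can be packed into one 128-bit integer as follows. This is a sketch of the layout only, without the per-millisecond sequence handling a real generator needs; the function names `pack_flake_id` and `unpack_flake_id` are hypothetical, and `uuid.getnode()` is used as a stand-in for the MAC-based worker ID.

```python
import time
import uuid


def pack_flake_id(unix_ms, worker_id, sequence):
    """Pack 64-bit timestamp | 48-bit worker ID | 16-bit sequence."""
    if not 0 <= unix_ms < (1 << 64):
        raise ValueError("timestamp must fit in 64 bits")
    if not 0 <= worker_id < (1 << 48):
        raise ValueError("worker_id must fit in 48 bits")
    if not 0 <= sequence < (1 << 16):
        raise ValueError("sequence must fit in 16 bits")
    return (unix_ms << 64) | (worker_id << 16) | sequence


def unpack_flake_id(flake_id):
    """Split a packed ID back into (unix_ms, worker_id, sequence)."""
    return (
        flake_id >> 64,
        (flake_id >> 16) & ((1 << 48) - 1),
        flake_id & 0xFFFF,
    )


# uuid.getnode() returns a 48-bit hardware address (or a random 48-bit
# number when no address can be read), which fits the worker ID field.
now_ms = time.time_ns() // 1_000_000
flake = pack_flake_id(now_ms, uuid.getnode(), 0)
print(f"{flake:032x}")
```

Because the timestamp sits in the highest bits, IDs from any worker sort by time first, then by worker, then by sequence.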

11. Base62

Base62 encoding is a method of encoding IDs that combines uppercase and lowercase letters and digits, making it URL-friendly and compact. A 22-character Base62 string carries about 22 × log2(62) ≈ 22 × 5.954 ≈ 131 bits, enough to hold a 128-bit value.

Example ID: 1B2M2Y8AsgTpgAmY7PhCfg

Pros:

URL-friendly: avoids special characters
Suitable for web applications needing URL-friendly IDs
Fast generation with compact encoding
Compact storage due to Base62 encoding

Cons:

Not inherently collision-resistant in high-concurrency environments
Requires careful management to ensure uniqueness
Complexity in encoding and decoding compared to numeric IDs
No inherent orderability

Base62 implementation
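Since Base62 is an encoding rather than a generator, a sketch only needs to convert an existing numeric ID to and from text. There is no single standard Base62 alphabet; the digits, uppercase, lowercase order below is an assumption, and the function names are hypothetical.

```python
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number):
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE62[0]
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(BASE62[remainder])
    return "".join(reversed(chars))


def base62_decode(text):
    """Decode a Base62 string back into an integer."""
    number = 0
    for char in text:
        number = number * 62 + BASE62.index(char)  # ValueError on bad input
    return number


print(base62_encode(62))                  # 10
print(len(base62_encode(2**128 - 1)))     # 22
```

Uniqueness comes entirely from the number being encoded, for example an auto-increment value or a 128-bit random value, which is why the cons above mention managing uniqueness separately.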

Summary

When selecting a unique identifier format for your application, consider the specific requirements, including uniqueness, scalability, performance, orderability, and storage efficiency. Each type of identifier has its tradeoffs, and the best choice will depend on your application's context.

Database Auto-Increment: Simple and efficient for single-node applications, but not suitable for distributed systems.
Twitter Snowflake: Best for high-throughput, time-ordered IDs in distributed systems.
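UUID v4: Ideal for global uniqueness without central coordination, but with larger storage needs and no ordering.
UUID v7: Adds timestamp-based ordering while staying compatible with existing UUID systems and libraries.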
ULID: Combines timestamp-based ordering with global uniqueness, suitable for sortable IDs.
KSUID: Offers sortable IDs with a large 128-bit random payload for collision resistance, though with larger storage requirements.
MongoDB ObjectID: A balanced approach for systems needing roughly ordered IDs with reasonable storage efficiency.
CUID: Best for high-concurrency environments needing collision resistance.
NanoID: Ideal for secure, URL-friendly unique IDs with customizable length.
Sonyflake: Optimized for single data center deployment with efficient 64-bit IDs.
FlakeID: Suitable for distributed systems needing 128-bit unique IDs.
Base62: Compact, URL-friendly IDs for web applications needing readable IDs.

By understanding these tradeoffs, you can make an informed decision that aligns with your application's needs for uniqueness, performance, and simplicity.
