Hash Tables
Understanding hash tables and their O(1) average time complexity. Learn the fundamentals of hash-based data structures.
Hash tables provide O(1) average time for insertion, deletion, and lookup. They're one of the most practical data structures, powering dictionaries, sets, caches, and databases.
What is a Hash Table?
A hash table is an array-based data structure that uses a hash function to map keys to indices. This mapping enables direct access to values without searching.
Key "apple" → hash function → index 2 → store at arr[2]
Array:
[0] [1] [2] [3] [4]
empty empty "apple" empty empty
Hash Function
A hash function converts a key into an array index:
```rust
fn simple_hash(key: &str, size: usize) -> usize {
    // Simple hash function for strings
    let hash_val: u32 = key.chars().map(|c| c as u32).sum();
    (hash_val as usize) % size
}
```
hash("cat") = (99 + 97 + 116) % 10 = 312 % 10 = 2
hash("dog") = (100 + 111 + 103) % 10 = 314 % 10 = 4
hash("act") = (97 + 99 + 116) % 10 = 312 % 10 = 2 ← Collision!
Good Hash Functions
A good hash function distributes keys uniformly across the array, minimizing collisions. Real implementations use sophisticated algorithms like MurmurHash, SipHash, or FNV.
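As a rough illustration of the idea (a sketch, not any library's actual implementation), here is a minimal 64-bit FNV-1a, which folds each byte into the hash with an XOR followed by a multiply by a fixed prime:

```rust
// Minimal 64-bit FNV-1a sketch: XOR each byte into the hash, then multiply
// by the FNV prime. A real table would still reduce the result modulo its size.
fn fnv1a_64(key: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for byte in key.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}

fn main() {
    // Because every byte's position affects the result, the anagrams "cat" and
    // "act" almost certainly hash differently, unlike with the character-sum hash.
    println!("{}", fnv1a_64("cat"));
    println!("{}", fnv1a_64("act"));
}
```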
Collisions
When two keys hash to the same index, we have a collision. There are two main strategies to handle collisions.
Strategy 1: Chaining
Store colliding elements in a list (a "chain") at each index. The implementation below uses a `Vec` as each chain, a common substitute for a linked list in practice.

Insert "cat", "dog", "act" with size 5:

```
[0] → null
[1] → null
[2] → "cat" → "act" → null   ← Collision handled by chaining
[3] → null
[4] → "dog" → null
```
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct HashTableChaining<K, V> {
    size: usize,
    table: Vec<Vec<(K, V)>>,
    count: usize,
}

impl<K: Hash + Eq, V> HashTableChaining<K, V> {
    fn new(size: usize) -> Self {
        HashTableChaining {
            size,
            table: (0..size).map(|_| Vec::new()).collect(),
            count: 0,
        }
    }

    fn hash(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.size
    }

    fn put(&mut self, key: K, value: V) {
        let index = self.hash(&key);
        // Check if key exists
        for (k, v) in &mut self.table[index] {
            if k == &key {
                *v = value;
                return;
            }
        }
        self.table[index].push((key, value));
        self.count += 1;
    }

    fn get(&self, key: &K) -> Option<&V> {
        let index = self.hash(key);
        for (k, v) in &self.table[index] {
            if k == key {
                return Some(v);
            }
        }
        None
    }

    fn remove(&mut self, key: &K) -> bool {
        let index = self.hash(key);
        if let Some(pos) = self.table[index].iter().position(|(k, _)| k == key) {
            self.table[index].remove(pos);
            self.count -= 1;
            return true;
        }
        false
    }
}
```
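A quick usage sketch of the chaining table defined above (the keys, values, and table size are arbitrary choices for illustration):

```rust
fn main() {
    let mut table = HashTableChaining::new(5);
    table.put("cat", 1);
    table.put("dog", 2);
    table.put("cat", 3); // same key again: the value is updated in place

    println!("{:?}", table.get(&"cat")); // Some(3)
    println!("{:?}", table.get(&"dog")); // Some(2)

    table.remove(&"cat");
    println!("{:?}", table.get(&"cat")); // None
}
```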
Strategy 2: Open Addressing
When a collision occurs, find another empty slot elsewhere in the array.

Linear probing: try the next index.

```
Insert "cat" → index 2, empty, insert
Insert "act" → index 2, occupied! Try 3, empty, insert

 [0]     [1]     [2]     [3]     [4]
empty   empty   "cat"   "act"   empty
```
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct HashTableOpenAddressing<K, V> {
    size: usize,
    keys: Vec<Option<K>>,
    values: Vec<Option<V>>,
    count: usize,
}

impl<K: Hash + Eq + Clone, V: Clone> HashTableOpenAddressing<K, V> {
    fn new(size: usize) -> Self {
        HashTableOpenAddressing {
            size,
            keys: (0..size).map(|_| None).collect(),
            values: (0..size).map(|_| None).collect(),
            count: 0,
        }
    }

    fn hash(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.size
    }

    fn probe(&self, index: usize) -> usize {
        // Linear probing: next index
        (index + 1) % self.size
    }

    fn put(&mut self, key: K, value: V) {
        if (self.count as f64) >= (self.size as f64 * 0.7) {
            // Would need resize implementation
        }
        let mut index = self.hash(&key);
        let original = index;
        while self.keys[index].is_some() {
            if self.keys[index].as_ref() == Some(&key) {
                self.values[index] = Some(value);
                return;
            }
            index = self.probe(index);
            if index == original {
                panic!("Hash table full");
            }
        }
        self.keys[index] = Some(key);
        self.values[index] = Some(value);
        self.count += 1;
    }

    fn get(&self, key: &K) -> Option<&V> {
        let mut index = self.hash(key);
        let original = index;
        while self.keys[index].is_some() {
            if self.keys[index].as_ref() == Some(key) {
                return self.values[index].as_ref();
            }
            index = self.probe(index);
            if index == original {
                break;
            }
        }
        None
    }
}
```
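And a quick usage sketch of the open-addressing table, again with arbitrary example keys:

```rust
fn main() {
    let mut table = HashTableOpenAddressing::new(8);
    table.put("cat", 1);
    table.put("act", 2); // if "act" collides with "cat", it probes to the next free slot

    println!("{:?}", table.get(&"cat")); // Some(1)
    println!("{:?}", table.get(&"act")); // Some(2)
    println!("{:?}", table.get(&"dog")); // None: probing stops at the first empty slot
}
```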
Load Factor
The load factor (the number of stored entries divided by the table size, n/size) drives performance: as it rises, collisions become more frequent. Resize when the load factor exceeds roughly 0.7-0.8 to keep operations fast.
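A minimal sketch of the check itself, using a hypothetical stand-alone helper (`should_resize` is not part of the structs above):

```rust
// Hypothetical helper showing the check: grow once the number of entries
// exceeds ~70% of the slots.
fn should_resize(count: usize, size: usize) -> bool {
    (count as f64) / (size as f64) > 0.7
}

fn main() {
    assert!(!should_resize(6, 10)); // load factor 0.6: still fine
    assert!(should_resize(8, 10));  // load factor 0.8: time to grow
    println!("load factor checks passed");
}
```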
Time Complexity
| Operation | Average | Worst |
|-----------|---------|-------|
| Insert    | O(1)    | O(n)  |
| Lookup    | O(1)    | O(n)  |
| Delete    | O(1)    | O(n)  |
Worst case happens when all keys collide (poor hash function or adversarial input).
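To see the worst case concretely, the sketch below reuses the character-sum `simple_hash` from earlier: every permutation of the same letters lands in the same bucket, so a chained table degenerates into one long list.

```rust
fn simple_hash(key: &str, size: usize) -> usize {
    // Same character-sum hash as in the earlier example
    let hash_val: u32 = key.chars().map(|c| c as u32).sum();
    (hash_val as usize) % size
}

fn main() {
    // All of these keys are permutations of the same letters, so the character
    // sum (and therefore the index) is identical for every one of them.
    for word in ["cat", "act", "tca", "atc", "cta", "tac"] {
        println!("{} -> index {}", word, simple_hash(word, 10));
    }
    // Every key lands at index 2: with chaining, that bucket becomes one long
    // list and lookups in it degrade to O(n).
}
```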
Resizing
When the load factor gets too high, double the size and rehash all elements:
```rust
impl<K: Hash + Eq, V> HashTableChaining<K, V> {
    fn resize(&mut self) {
        // Move the old buckets out and start fresh with a table twice as large
        let old_table = std::mem::replace(
            &mut self.table,
            (0..self.size * 2).map(|_| Vec::new()).collect()
        );
        self.size *= 2;
        self.count = 0;
        // Rehash every entry into the new table (hash now uses the new size)
        for bucket in old_table {
            for (key, value) in bucket {
                self.put(key, value);
            }
        }
    }
}
```
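One way to wire this into insertion, assuming the same imports and struct as above (`put_with_resize` is a name introduced here for illustration, not part of the earlier code):

```rust
impl<K: Hash + Eq, V> HashTableChaining<K, V> {
    // Hypothetical wrapper: grow the table once the load factor passes 0.7,
    // then delegate to the existing put.
    fn put_with_resize(&mut self, key: K, value: V) {
        if (self.count as f64) / (self.size as f64) > 0.7 {
            self.resize();
        }
        self.put(key, value);
    }
}
```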
Common Applications
1. Dictionary/Map
```rust
use std::collections::HashMap;

// Rust's HashMap is a hash table
fn main() {
    let mut user = HashMap::new();
    user.insert("name", "Alice");
    user.insert("age", "30");
    user.insert("email", "alice@example.com");

    println!("{}", user["name"]); // O(1) lookup
    user.insert("city", "NYC");   // O(1) insert
}
```
2. Set (Unique Elements)
```rust
use std::collections::HashSet;

// Rust's HashSet uses a hash table
fn main() {
    let mut seen = HashSet::new();
    seen.insert(5);
    seen.insert(3);
    seen.insert(5); // Duplicate, ignored

    println!("{}", seen.len());        // 2
    println!("{}", seen.contains(&3)); // O(1) lookup - true
}
```
3. Counting Frequency
```rust
use std::collections::HashMap;

fn count_frequency(arr: &[i32]) -> HashMap<i32, i32> {
    let mut freq = HashMap::new();
    for &item in arr {
        *freq.entry(item).or_insert(0) += 1;
    }
    freq
}

fn main() {
    let arr = vec![1, 2, 2, 3, 3, 3];
    println!("{:?}", count_frequency(&arr));
    // {1: 1, 2: 2, 3: 3} (key order may vary)
}
```
4. Finding Duplicates
```rust
use std::collections::HashSet;

fn has_duplicates(arr: &[i32]) -> bool {
    let mut seen = HashSet::new();
    for &item in arr {
        if seen.contains(&item) {
            return true;
        }
        seen.insert(item);
    }
    false
}
```
5. Two Sum Problem
```rust
use std::collections::HashMap;

fn two_sum(nums: &[i32], target: i32) -> Vec<usize> {
    // Find indices of two numbers that sum to target
    let mut seen = HashMap::new(); // value → index
    for (i, &num) in nums.iter().enumerate() {
        let complement = target - num;
        if let Some(&j) = seen.get(&complement) {
            return vec![j, i];
        }
        seen.insert(num, i);
    }
    vec![]
}

fn main() {
    println!("{:?}", two_sum(&[2, 7, 11, 15], 9)); // [0, 1]
}
```
6. Grouping Anagrams
```rust
use std::collections::HashMap;

fn group_anagrams(words: Vec<String>) -> Vec<Vec<String>> {
    let mut groups: HashMap<String, Vec<String>> = HashMap::new();
    for word in words {
        // The sorted characters form a canonical key shared by all anagrams
        let mut chars: Vec<char> = word.chars().collect();
        chars.sort();
        let key: String = chars.into_iter().collect();
        groups.entry(key).or_insert_with(Vec::new).push(word);
    }
    groups.into_values().collect()
}

fn main() {
    let words = vec![
        "eat".to_string(), "tea".to_string(), "tan".to_string(),
        "ate".to_string(), "nat".to_string(), "bat".to_string(),
    ];
    println!("{:?}", group_anagrams(words));
    // [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]] (group order may vary)
}
```
Hash Tables vs Other Structures
| Structure    | Insert   | Lookup   | Ordered? |
|--------------|----------|----------|----------|
| Hash Table   | O(1)*    | O(1)*    | No       |
| BST          | O(log n) | O(log n) | Yes      |
| Sorted Array | O(n)     | O(log n) | Yes      |
| Linked List  | O(1)     | O(n)     | No       |
*Average case
Language Implementations
| Language | Map type | Set type |
|----------|---------|---------|
| Rust | HashMap | HashSet |
| Python | dict | set |
| Java | HashMap | HashSet |
| C++ | unordered_map | unordered_set |
| Go | map | Use map[key]struct{} |
| JavaScript | Map, Object | Set |
Key Takeaways
- Hash tables provide O(1) average-case operations
- Hash function maps keys to array indices
- Collisions handled by chaining or open addressing
- Load factor affects performance; resize when needed
- Foundation for dictionaries, sets, and caches
- Unordered—use BST-based structures when order matters
Hash tables are essential for efficient programming. Most "count frequency," "find duplicates," and "lookup" problems are solved elegantly with hash tables.
