Hash Tables
Understanding hash tables and their O(1) average time complexity. Learn the fundamentals of hash-based data structures.
Hash tables provide O(1) average time for insertion, deletion, and lookup. They're one of the most practical data structures, powering dictionaries, sets, caches, and databases.
What is a Hash Table?
A hash table is an array-based data structure that uses a hash function to map keys to indices. This mapping enables direct access to values without searching.
Key "apple" → hash function → index 2 → store at arr[2]
Array:
[0] [1] [2] [3] [4]
empty empty "apple" empty empty
Hash Function
A hash function converts a key into an array index:
```rust
fn simple_hash(key: &str, size: usize) -> usize {
    // Simple hash function for strings
    let hash_val: u32 = key.chars().map(|c| c as u32).sum();
    (hash_val as usize) % size
}
```
hash("cat") = (99 + 97 + 116) % 10 = 312 % 10 = 2
hash("dog") = (100 + 111 + 103) % 10 = 314 % 10 = 4
hash("act") = (97 + 99 + 116) % 10 = 312 % 10 = 2 ← Collision!
Good Hash Functions
A good hash function distributes keys uniformly across the array, minimizing collisions. Real implementations use sophisticated algorithms like MurmurHash, SipHash, or FNV.
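As a rough illustration of the idea (a sketch, not any library's actual implementation), here is a minimal 64-bit FNV-1a, which folds each byte into the hash with an XOR followed by a multiply by a fixed prime:

```rust
// Minimal 64-bit FNV-1a sketch: XOR each byte into the hash, then multiply
// by the FNV prime. A real table would still reduce the result modulo its size.
fn fnv1a_64(key: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for byte in key.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}

fn main() {
    // Because every byte's position affects the result, the anagrams "cat" and
    // "act" almost certainly hash differently, unlike with the character-sum hash.
    println!("{}", fnv1a_64("cat"));
    println!("{}", fnv1a_64("act"));
}
```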
Collisions
When two keys hash to the same index, we have a collision. There are two main strategies to handle collisions.
Strategy 1: Chaining
Store colliding elements in a list (a "chain") at each index. The implementation below uses a `Vec` as each chain, a common substitute for a linked list in practice.

Insert "cat", "dog", "act" with size 5:

```
[0] → null
[1] → null
[2] → "cat" → "act" → null   ← Collision handled by chaining
[3] → null
[4] → "dog" → null
```
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct HashTableChaining<K, V> {
    size: usize,
    table: Vec<Vec<(K, V)>>,
    count: usize,
}

impl<K: Hash + Eq, V> HashTableChaining<K, V> {
    fn new(size: usize) -> Self {
        HashTableChaining {
            size,
            table: (0..size).map(|_| Vec::new()).collect(),
            count: 0,
        }
    }

    fn hash(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.size
    }

    fn put(&mut self, key: K, value: V) {
        let index = self.hash(&key);
        // Check if key exists
        for (k, v) in &mut self.table[index] {
            if k == &key {
                *v = value;
                return;
            }
        }
        self.table[index].push((key, value));
        self.count += 1;
    }

    fn get(&self, key: &K) -> Option<&V> {
        let index = self.hash(key);
        for (k, v) in &self.table[index] {
            if k == key {
                return Some(v);
            }
        }
        None
    }

    fn remove(&mut self, key: &K) -> bool {
        let index = self.hash(key);
        if let Some(pos) = self.table[index].iter().position(|(k, _)| k == key) {
            self.table[index].remove(pos);
            self.count -= 1;
            return true;
        }
        false
    }
}
```
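A quick usage sketch of the chaining table defined above (the keys, values, and table size are arbitrary choices for illustration):

```rust
fn main() {
    let mut table = HashTableChaining::new(5);
    table.put("cat", 1);
    table.put("dog", 2);
    table.put("cat", 3); // same key again: the value is updated in place

    println!("{:?}", table.get(&"cat")); // Some(3)
    println!("{:?}", table.get(&"dog")); // Some(2)

    table.remove(&"cat");
    println!("{:?}", table.get(&"cat")); // None
}
```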
Strategy 2: Open Addressing
When a collision occurs, find another empty slot elsewhere in the array.

Linear probing: try the next index.

```
Insert "cat" → index 2, empty, insert
Insert "act" → index 2, occupied! Try 3, empty, insert

 [0]     [1]     [2]     [3]     [4]
empty   empty   "cat"   "act"   empty
```
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct HashTableOpenAddressing<K, V> {
    size: usize,
    keys: Vec<Option<K>>,
    values: Vec<Option<V>>,
    count: usize,
}

impl<K: Hash + Eq + Clone, V: Clone> HashTableOpenAddressing<K, V> {
    fn new(size: usize) -> Self {
        HashTableOpenAddressing {
            size,
            keys: (0..size).map(|_| None).collect(),
            values: (0..size).map(|_| None).collect(),
            count: 0,
        }
    }

    fn hash(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.size
    }

    fn probe(&self, index: usize) -> usize {
        // Linear probing: next index
        (index + 1) % self.size
    }

    fn put(&mut self, key: K, value: V) {
        if (self.count as f64) >= (self.size as f64 * 0.7) {
            // Would need resize implementation
        }
        let mut index = self.hash(&key);
        let original = index;
        while self.keys[index].is_some() {
            if self.keys[index].as_ref() == Some(&key) {
                self.values[index] = Some(value);
                return;
            }
            index = self.probe(index);
            if index == original {
                panic!("Hash table full");
            }
        }
        self.keys[index] = Some(key);
        self.values[index] = Some(value);
        self.count += 1;
    }

    fn get(&self, key: &K) -> Option<&V> {
        let mut index = self.hash(key);
        let original = index;
        while self.keys[index].is_some() {
            if self.keys[index].as_ref() == Some(key) {
                return self.values[index].as_ref();
            }
            index = self.probe(index);
            if index == original {
                break;
            }
        }
        None
    }
}
```
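And a quick usage sketch of the open-addressing table, again with arbitrary example keys:

```rust
fn main() {
    let mut table = HashTableOpenAddressing::new(8);
    table.put("cat", 1);
    table.put("act", 2); // if "act" collides with "cat", it probes to the next free slot

    println!("{:?}", table.get(&"cat")); // Some(1)
    println!("{:?}", table.get(&"act")); // Some(2)
    println!("{:?}", table.get(&"dog")); // None: probing stops at the first empty slot
}
```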
Load Factor
The load factor (the number of stored entries divided by the table size, n/size) drives performance: as it rises, collisions become more frequent. Resize when the load factor exceeds roughly 0.7-0.8 to keep operations fast.
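A minimal sketch of the check itself, using a hypothetical stand-alone helper (`should_resize` is not part of the structs above):

```rust
// Hypothetical helper showing the check: grow once the number of entries
// exceeds ~70% of the slots.
fn should_resize(count: usize, size: usize) -> bool {
    (count as f64) / (size as f64) > 0.7
}

fn main() {
    assert!(!should_resize(6, 10)); // load factor 0.6: still fine
    assert!(should_resize(8, 10));  // load factor 0.8: time to grow
    println!("load factor checks passed");
}
```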
Time Complexity
| Operation | Average | Worst |
|-----------|---------|-------|
| Insert    | O(1)    | O(n)  |
| Lookup    | O(1)    | O(n)  |
| Delete    | O(1)    | O(n)  |
Worst case happens when all keys collide (poor hash function or adversarial input).
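To see the worst case concretely, the sketch below reuses the character-sum `simple_hash` from earlier: every permutation of the same letters lands in the same bucket, so a chained table degenerates into one long list.

```rust
fn simple_hash(key: &str, size: usize) -> usize {
    // Same character-sum hash as in the earlier example
    let hash_val: u32 = key.chars().map(|c| c as u32).sum();
    (hash_val as usize) % size
}

fn main() {
    // All of these keys are permutations of the same letters, so the character
    // sum (and therefore the index) is identical for every one of them.
    for word in ["cat", "act", "tca", "atc", "cta", "tac"] {
        println!("{} -> index {}", word, simple_hash(word, 10));
    }
    // Every key lands at index 2: with chaining, that bucket becomes one long
    // list and lookups in it degrade to O(n).
}
```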
Resizing
When the load factor gets too high, double the size and rehash all elements:
```rust
impl<K: Hash + Eq, V> HashTableChaining<K, V> {
    fn resize(&mut self) {
        // Move the old buckets out and start fresh with a table twice as large
        let old_table = std::mem::replace(
            &mut self.table,
            (0..self.size * 2).map(|_| Vec::new()).collect()
        );
        self.size *= 2;
        self.count = 0;
        // Rehash every entry into the new table (hash now uses the new size)
        for bucket in old_table {
            for (key, value) in bucket {
                self.put(key, value);
            }
        }
    }
}
```
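One way to wire this into insertion, assuming the same imports and struct as above (`put_with_resize` is a name introduced here for illustration, not part of the earlier code):

```rust
impl<K: Hash + Eq, V> HashTableChaining<K, V> {
    // Hypothetical wrapper: grow the table once the load factor passes 0.7,
    // then delegate to the existing put.
    fn put_with_resize(&mut self, key: K, value: V) {
        if (self.count as f64) / (self.size as f64) > 0.7 {
            self.resize();
        }
        self.put(key, value);
    }
}
```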
Common Applications
1. Dictionary/Map
```rust
use std::collections::HashMap;

// Rust's HashMap is a hash table
fn main() {
    let mut user = HashMap::new();
    user.insert("name", "Alice");
    user.insert("age", "30");
    user.insert("email", "alice@example.com");

    println!("{}", user["name"]); // O(1) lookup
    user.insert("city", "NYC");   // O(1) insert
}
```
2. Set (Unique Elements)
```rust
use std::collections::HashSet;

// Rust's HashSet uses a hash table
fn main() {
    let mut seen = HashSet::new();
    seen.insert(5);
    seen.insert(3);
    seen.insert(5); // Duplicate, ignored

    println!("{}", seen.len());        // 2
    println!("{}", seen.contains(&3)); // O(1) lookup - true
}
```
3. Counting Frequency
```rust
use std::collections::HashMap;

fn count_frequency(arr: &[i32]) -> HashMap<i32, i32> {
    let mut freq = HashMap::new();
    for &item in arr {
        *freq.entry(item).or_insert(0) += 1;
    }
    freq
}

fn main() {
    let arr = vec![1, 2, 2, 3, 3, 3];
    println!("{:?}", count_frequency(&arr));
    // {1: 1, 2: 2, 3: 3} (key order may vary)
}
```
4. Finding Duplicates
```rust
use std::collections::HashSet;

fn has_duplicates(arr: &[i32]) -> bool {
    let mut seen = HashSet::new();
    for &item in arr {
        if seen.contains(&item) {
            return true;
        }
        seen.insert(item);
    }
    false
}
```
5. Two Sum Problem
```rust
use std::collections::HashMap;

fn two_sum(nums: &[i32], target: i32) -> Vec<usize> {
    // Find indices of two numbers that sum to target
    let mut seen = HashMap::new(); // value → index
    for (i, &num) in nums.iter().enumerate() {
        let complement = target - num;
        if let Some(&j) = seen.get(&complement) {
            return vec![j, i];
        }
        seen.insert(num, i);
    }
    vec![]
}

fn main() {
    println!("{:?}", two_sum(&[2, 7, 11, 15], 9)); // [0, 1]
}
```
6. Grouping Anagrams
```rust
use std::collections::HashMap;

fn group_anagrams(words: Vec<String>) -> Vec<Vec<String>> {
    let mut groups: HashMap<String, Vec<String>> = HashMap::new();
    for word in words {
        // The sorted characters form a canonical key shared by all anagrams
        let mut chars: Vec<char> = word.chars().collect();
        chars.sort();
        let key: String = chars.into_iter().collect();
        groups.entry(key).or_insert_with(Vec::new).push(word);
    }
    groups.into_values().collect()
}

fn main() {
    let words = vec![
        "eat".to_string(), "tea".to_string(), "tan".to_string(),
        "ate".to_string(), "nat".to_string(), "bat".to_string(),
    ];
    println!("{:?}", group_anagrams(words));
    // [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]] (group order may vary)
}
```
Hash Tables vs Other Structures
| Structure    | Insert   | Lookup   | Ordered? |
|--------------|----------|----------|----------|
| Hash Table   | O(1)*    | O(1)*    | No       |
| BST          | O(log n) | O(log n) | Yes      |
| Sorted Array | O(n)     | O(log n) | Yes      |
| Linked List  | O(1)     | O(n)     | No       |
*Average case
Language Implementations
| Language | Map type | Set type |
|----------|---------|---------|
| Rust | HashMap | HashSet |
| Python | dict | set |
| Java | HashMap | HashSet |
| C++ | unordered_map | unordered_set |
| Go | map | Use map[key]struct{} |
| JavaScript | Map, Object | Set |
Key Takeaways
- Hash tables provide O(1) average-case operations
- Hash function maps keys to array indices
- Collisions handled by chaining or open addressing
- Load factor affects performance; resize when needed
- Foundation for dictionaries, sets, and caches
- Unordered—use BST-based structures when order matters
Hash tables are essential for efficient programming. Most "count frequency," "find duplicates," and "lookup" problems are solved elegantly with hash tables.
