Bucket Sort

Distribution sort that works in linear time for uniformly distributed data. Learn when bucket sort outperforms comparison sorts.

What if we could sort without comparing elements? Bucket sort achieves linear time O(n) by distributing elements into buckets based on their values, sorting each bucket, and concatenating the results.

How It Works

Create empty "buckets" (usually arrays)
Distribute elements into buckets based on some function of their value
Sort each bucket (using any sorting algorithm)
Concatenate all buckets in order

Sorting [0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51]:

Step 1: Create 5 buckets for ranges [0, 0.2), [0.2, 0.4), ...

Step 2: Distribute
Bucket 0 [0.0-0.2): []
Bucket 1 [0.2-0.4): [0.32, 0.23, 0.25]
Bucket 2 [0.4-0.6): [0.42, 0.52, 0.47, 0.51]
Bucket 3 [0.6-0.8): []
Bucket 4 [0.8-1.0): []

Step 3: Sort each bucket
Bucket 1: [0.23, 0.25, 0.32]
Bucket 2: [0.42, 0.47, 0.51, 0.52]

Step 4: Concatenate
[0.23, 0.25, 0.32, 0.42, 0.47, 0.51, 0.52]

Implementation

fn bucket_sort(arr: &mut Vec<f64>) -> Vec<f64> {
    if arr.is_empty() {
        return arr.clone();
    }

    let n = arr.len();

    // Find range
    let min_val = arr.iter().copied().fold(f64::INFINITY, f64::min);
    let max_val = arr.iter().copied().fold(f64::NEG_INFINITY, f64::max);

    // Create buckets
    let bucket_count = n;
    let mut buckets: Vec<Vec<f64>> = vec![Vec::new(); bucket_count];

    // Distribute elements into buckets
    let range_val = max_val - min_val;
    if range_val == 0.0 {  // All elements are the same
        return arr.clone();
    }

    for &num in arr.iter() {
        // Calculate bucket index
        let idx = ((num - min_val) / range_val * (bucket_count - 1) as f64) as usize;
        buckets[idx].push(num);
    }

    // Sort each bucket and concatenate
    let mut result = Vec::new();
    for mut bucket in buckets {
        bucket.sort_by(|a, b| a.partial_cmp(b).unwrap());
        result.extend(bucket);
    }

    result
}

fn main() {
    let mut arr = vec![0.42, 0.32, 0.23, 0.52, 0.25, 0.47, 0.51];
    let sorted = bucket_sort(&mut arr);
    println!("{:?}", sorted);
}

func bucketSort(arr []float64) []float64 {
    if len(arr) == 0 {
        return arr
    }

    n := len(arr)

    // Find range
    minVal, maxVal := arr[0], arr[0]
    for _, v := range arr {
        if v < minVal {
            minVal = v
        }
        if v > maxVal {
            maxVal = v
        }
    }

    // Create buckets
    buckets := make([][]float64, n)
    for i := range buckets {
        buckets[i] = []float64{}
    }

    // Distribute
    rangeVal := maxVal - minVal
    if rangeVal == 0 {
        return arr
    }

    for _, num := range arr {
        idx := int((num - minVal) / rangeVal * float64(n-1))
        buckets[idx] = append(buckets[idx], num)
    }

    // Sort and concatenate
    result := []float64{}
    for _, bucket := range buckets {
        sort.Float64s(bucket)
        result = append(result, bucket...)
    }

    return result
}

Time Complexity

Best/Average Case: O(n + k) where k is the number of buckets

When elements are uniformly distributed:

Distribution: O(n)
Sorting buckets: O(n/k) per bucket with k buckets
Concatenation: O(n)
Total: O(n + k)

If k ≈ n, this is O(n) — linear time!

Worst Case: O(n²)

When all elements end up in one bucket, we fall back to the bucket's sorting algorithm (usually O(n²) for small sorts like insertion sort).

Worst case: [0.11, 0.12, 0.13, 0.14, 0.15]
All go to bucket 0 → O(n²) insertion sort

Distribution Matters

Bucket sort's performance depends on uniform distribution. If you know the data distribution, you can design bucket boundaries to ensure even distribution.

Integer Bucket Sort

For integers in a known range:

fn bucket_sort_integers(arr: &[i32], bucket_size: i32) -> Vec<i32> {
    if arr.is_empty() {
        return arr.to_vec();
    }

    let min_val = *arr.iter().min().unwrap();
    let max_val = *arr.iter().max().unwrap();

    // Create buckets
    let bucket_count = ((max_val - min_val) / bucket_size + 1) as usize;
    let mut buckets: Vec<Vec<i32>> = vec![Vec::new(); bucket_count];

    // Distribute
    for &num in arr {
        let idx = ((num - min_val) / bucket_size) as usize;
        buckets[idx].push(num);
    }

    // Sort and concatenate
    let mut result = Vec::new();
    for mut bucket in buckets {
        bucket.sort();
        result.extend(bucket);
    }

    result
}

fn main() {
    let arr = vec![29, 25, 3, 49, 9, 37, 21, 43];
    let sorted = bucket_sort_integers(&arr, 5);
    println!("{:?}", sorted);
}

Counting Sort: Special Case

When elements are integers in a small range, counting sort is bucket sort with bucket size 1:

fn counting_sort(arr: &[i32]) -> Vec<i32> {
    if arr.is_empty() {
        return arr.to_vec();
    }

    let min_val = *arr.iter().min().unwrap();
    let max_val = *arr.iter().max().unwrap();
    let range_size = (max_val - min_val + 1) as usize;

    // Count occurrences
    let mut count = vec![0; range_size];
    for &num in arr {
        count[(num - min_val) as usize] += 1;
    }

    // Reconstruct sorted array
    let mut result = Vec::new();
    for (i, &c) in count.iter().enumerate() {
        let value = i as i32 + min_val;
        result.extend(vec![value; c]);
    }

    result
}

fn main() {
    let arr = vec![4, 2, 2, 8, 3, 3, 1];
    let sorted = counting_sort(&arr);
    println!("{:?}", sorted);  // [1, 2, 2, 3, 3, 4, 8]
}

Time: O(n + k) where k is the range of values. Space: O(k)

Radix Sort: Extension

Radix sort applies counting/bucket sort digit by digit:

fn radix_sort(arr: &mut Vec<i32>) {
    if arr.is_empty() {
        return;
    }

    let max_val = *arr.iter().max().unwrap();
    let mut exp = 1;

    while max_val / exp > 0 {
        counting_sort_by_digit(arr, exp);
        exp *= 10;
    }
}

fn counting_sort_by_digit(arr: &mut Vec<i32>, exp: i32) {
    let n = arr.len();
    let mut output = vec![0; n];
    let mut count = vec![0; 10];

    for &num in arr.iter() {
        let digit = ((num / exp) % 10) as usize;
        count[digit] += 1;
    }

    for i in 1..10 {
        count[i] += count[i - 1];
    }

    for i in (0..n).rev() {
        let digit = ((arr[i] / exp) % 10) as usize;
        output[count[digit] - 1] = arr[i];
        count[digit] -= 1;
    }

    arr.copy_from_slice(&output);
}

fn main() {
    let mut arr = vec![170, 45, 75, 90, 802, 24, 2, 66];
    radix_sort(&mut arr);
    println!("{:?}", arr);  // [2, 24, 45, 66, 75, 90, 170, 802]
}

Time: O(d × (n + k)) where d is the number of digits and k is the base (10).

Comparison of Distribution Sorts

| Sort | Time | Space | When to Use | |------|------|-------|-------------| | Bucket | O(n + k) | O(n + k) | Uniformly distributed data | | Counting | O(n + k) | O(k) | Small integer range | | Radix | O(d(n + k)) | O(n + k) | Fixed-length integers/strings |

Properties

| Property | Bucket Sort | |----------|-------------| | Time (avg) | O(n + k) | | Time (worst) | O(n²) | | Space | O(n + k) | | Stable | Depends on bucket sort | | In-place | No | | Comparison-based | No |

Not Comparison-Based

Bucket sort circumvents the O(n log n) lower bound for comparison sorts because it uses the values directly, not comparisons. This is why it can achieve O(n) time, but it requires knowing the data distribution.

When to Use Bucket Sort

Good for:

Floating-point numbers uniformly distributed in a range
Data with known distribution
When you can design good bucket boundaries
Large datasets where O(n) beats O(n log n)

Not good for:

Unknown or skewed distributions
When extra space is limited
General-purpose sorting

Practical Example: Sorting Exam Scores

fn sort_exam_scores(scores: &[i32]) -> Vec<i32> {
    // 10 buckets: 0-9, 10-19, ..., 90-100
    let mut buckets: Vec<Vec<i32>> = vec![Vec::new(); 11];

    for &score in scores {
        let idx = (score / 10).min(10) as usize;  // 100 goes to bucket 10
        buckets[idx].push(score);
    }

    let mut result = Vec::new();
    for mut bucket in buckets {
        bucket.sort();
        result.extend(bucket);
    }

    result
}

fn main() {
    let scores = vec![72, 85, 63, 91, 45, 78, 88, 56, 92, 73, 67, 81];
    let sorted = sort_exam_scores(&scores);
    println!("{:?}", sorted);
    // [45, 56, 63, 67, 72, 73, 78, 81, 85, 88, 91, 92]
}

Key Takeaways

Bucket sort distributes elements into buckets, sorts each, and concatenates
O(n + k) average when data is uniformly distributed
Worst case O(n²) when all elements go to one bucket
Not comparison-based—can beat O(n log n) lower bound
Counting sort is bucket sort with bucket size 1 for integers
Radix sort extends this idea to multiple digits
Best when you understand your data's distribution

Bucket sort demonstrates that knowing your data can lead to better algorithms. When elements are uniformly distributed, bucket sort's linear time performance is unbeatable.

Quick Sort

Binary Search on Arrays