# Hash tables

A hash table is a data structure that maps keys to values, using a hash function to correlate key values with the index (or bucket) where the value is stored in an underlying array . A hash table is an implementation of the map abstract data type.

Operation Average Worst
Access $\htmlClass{text-green}{\Theta(1)}$ $\htmlClass{text-yellow}{\mathcal{O}(n)}$
Insert $\htmlClass{text-green}{\Theta(1)}$ $\htmlClass{text-yellow}{\mathcal{O}(n)}$
Delete $\htmlClass{text-green}{\Theta(1)}$ $\htmlClass{text-yellow}{\mathcal{O}(n)}$

## Hash functions

A hash function maps each key to an integer index of a bucket array.

Division hashing uses the modulo function by dividing the key $k$ by the number of buckets $b$, with the remainder serving as the bucket index: $i = k \mod b$. (original)

There are many other hash functions in use, each with unique benefits and disadvantages.

## Collision handling

A hash collision occurs when the hash function maps multiple keys to the same bucket index.

In a hash table with no collisions, access, insertion, and deletion can all be performed in $\mathcal{O}(1)$ time.

The load factor of a hash table is the ratio of stored items ($n$) to array buckets ($b$), $n \over {b}$. To minimize collisions, the load factor of a hash table should never exceed $1$ (an upper bound is usually set between $0.5$ and $0.75$). When the specified upper bound is reached, a rehash is performed, with the underlying array being resized and existing values distributed to their new corresponding buckets.

A hash function should also be chosen that evenly distributes keys to buckets, avoiding clustering values. Even with a load factor far below $1$, if the hash function distributes every key to the same bucket (the worst-case scenario), the performance of every operation will $\mathcal{O}(n)$.

### Chaining

In chaining, collisions are handled by appending colliding key-value pairs to a linked list stored in the bucket. To access a value by its key, the hash function resolves the correct bucket, and the list is searched for the correct key-value pair.

In this example, the hash table, which implements a map, is bootstrapped from a simpler map implementation, an association list (the LinkedMap class):

 1public abstract class HashTable<K,V> implements MapADT<K,V> {
2    protected static int DEFAULT_CAPACITY = 17;
3
4    protected int size = 0;
5
6    protected int hash(Object key) {
7        // ensure a positive hash value by removing the sign bit from Java's hash,
8        // so that later application of the modulo operator results in a value > 0
9        return key.hashCode() & 0x7fffffff;
10    }
11
12    public int size() { return size; }
13
14    public boolean isEmpty() { return size == 0; }
15}

 1public class ChainingHashTable<K,V> extends HashTable<K,V> {
3
4    public ChainingHashTable() { this(DEFAULT_CAPACITY); }
5
6    public ChainingHashTable(int capacity) {
8        for (int i = 0; i < capacity; i++)
10    }
11
12    public V get(K key) {
13        int index = hash(key) % table.length;
14        return table[index].get(key);
15    }
16
17    public V set(K key, V value) {
18        int index = hash(key) % table.length;
19        V result = table[index].set(key, value);
20        if (result == null) size++;
21        if (loadFactor() > 0.75) resize(table.length * 2);
22        return result;
23    }
24
25    public V remove(K key) {
26        int index = hash(key) % table.length;
27        V result = table[index].remove(key);
28        if (result != null) size--;
29        return result;
30    }
31
32    public void forEach(BiConsumer<K,V> consumer) {
33        for (int i = 0; i < size; i++)
34            table[i].forEach(consumer);
35    }
36
37    private LinkedMap<K,V>[] createTable(int capacity) {
39        for (int i = 0; i < capacity; i++)
41        return newTable;
42    }
43
44    private float loadFactor() { return (float) size / table.length; }
45
46    private void resize(int capacity) {
47        ChainingHashTable<K,V> resized = new ChainingHashTable<>(capacity);
48        forEach(resized::set);
49        table = resized.table;
50    }
51}


In open addressing, each key-value pair is directly stored in a bucket, but when a collision occurs, a method is used to place the pair in another bucket such that it can be found later.

#### Linear probing

Linear probing is an open addressing method whereby a colliding key value pair is stored by iterating forwards in the array until an unoccupied bucket is found. When searching for a key, the buckets are scanned in the same order until either the matching key or an empty bucket (key doesn’t exist) is found.

 1public class LinearProbingHashTable<K,V> extends HashTable<K,V> {
2    private static class Bucket<K,V> {
3        private final K key;
4        private V value;
5
6        public Bucket(K key, V value) {
7            this.key = key;
8            this.value = value;
9        }
10    }
11
12    private Bucket<K,V>[] table;
13    private final Bucket<K,V> REMOVED = new Bucket<>(null, null);
14
15    public LinearProbingHashTable() { this(DEFAULT_CAPACITY); }
16
17    public LinearProbingHashTable(int capacity) {
18        table = (Bucket<K,V>[]) new Bucket[capacity];
19    }
20
21    public V get(K key) {
22        int start = hash(key) % table.length;
23        for (int i = 0; i < table.length; i++) {
24            int j = (start + i) % table.length;
25            var bucket = table[j];
26            if (bucket != null && bucket != REMOVED && bucket.key.equals(key)) {
27                return bucket.value;
28            }
29        }
30        return null;
31    }
32
33    public V set(K key, V value) {
34        if (loadFactor() >= 0.5) resize(table.length * 2);
35
36        int start = hash(key) % table.length;
37        int first = -1;
38        for (int i = 0; i < table.length; i++) {
39            int j = (start + i) % table.length;
40            var bucket = table[j];
41            if (bucket == null) {
42                // we can assume the key doesn't exist, and insert immediately
43                table[j] = new Bucket<>(key, value);
44                size++;
45                return null;
46            }
47            if (bucket.equals(REMOVED)) {
48                // the key may exist, but save this index for later insertion if needed
49                if (first == -1) first = j;
50            }
51            if (bucket.key.equals(key)) {
52                // the key exists, replace the value
53                var result = bucket.value;
54                bucket.value = value;
55                return result;
56            }
57        }
58        table[first] = new Bucket<>(key, value);
59        size++;
60        return null;
61    }
62
63    public V remove(K key) {
64        int start = hash(key) % table.length;
65        for (int i = 0; i < table.length; i++) {
66            int j = (start + i) % table.length;
67            var bucket = table[j];
68            if (bucket != null && bucket != REMOVED && bucket.key.equals(key)) {
69                V result = bucket.value;
70                table[j] = REMOVED;
71                size--;
72                return result;
73            }
74        }
75        return null;
76    }
77
78    public void forEach(BiConsumer<K, V> consumer) {
79        for (Bucket<K,V> bucket : table)
80            if (bucket != null && bucket != REMOVED)
81                consumer.accept(bucket.key, bucket.value);
82    }
83
84    private float loadFactor() { return (float) size / table.length; }
85
86    private void resize(int capacity) {
87        LinearProbingHashTable<K,V> resized = new LinearProbingHashTable<>(capacity);
88        forEach(resized::set);
89        table = resized.table;
90    }
91}