How does a hash table work?

A hash table is a data structure that stores key-value pairs and provides fast access to a value by its key. It is the most common way to implement an associative array (also called a dictionary or map).

Main components and working principle:

Array (Buckets): The main data storage, represented as an array of a certain size. Each element of this array is called a "bucket" or "slot".
Hash function: Converts a key into an integer (the hash code). It must return the same hash code for equal keys, and a good hash function spreads keys evenly over the range of hash codes.
Indexing: The hash code is used to determine the index in the array where the value for the key will be stored or retrieved. Often the modulo operation (hash_code % array_size) is used to get an index in the range from 0 to array_size - 1.
Collision resolution: Different keys can produce the same hash code, or different hash codes that map to the same index (a collision), so the hash table needs a mechanism to resolve collisions. The two main approaches are:
- Chaining method: Each bucket stores a linked list (or another data structure, such as a binary tree) containing all key-value pairs that hash to this bucket.
- Open addressing method: On a collision, the table probes other buckets in the array until it finds a free one, following a strategy such as linear probing, quadratic probing, or double hashing.
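
As a minimal sketch of the indexing step, the snippet below maps keys to bucket indices with the modulo operation and shows two different keys landing in the same bucket. The names `array_size` and `bucket_index` are made up for illustration, and integer keys are used only because CPython hashes small integers to themselves, which keeps the arithmetic easy to follow.

```python
array_size = 5  # hypothetical number of buckets


def bucket_index(key, size):
    """Map a key's hash code onto a valid index in the range 0..size-1."""
    # Python's % returns a non-negative result for a positive modulus,
    # so negative hash codes still produce a valid index.
    return hash(key) % size


idx_a = bucket_index(7, array_size)   # 7 % 5
idx_b = bucket_index(12, array_size)  # 12 % 5

# Different keys, same index: this is a collision the table must resolve.
collision = idx_a == idx_b
print(idx_a, idx_b, collision)
```

Here both keys map to index 2, so a real table would need chaining or open addressing to keep both pairs.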

Insertion process (put):

Compute the hash code of the key.
Calculate the bucket index based on the hash code.
Place the key-value pair into the corresponding bucket. If the key is already present, replace its value. Otherwise, if chaining is used, add the pair to the list in the bucket; if open addressing is used, probe for a free slot.
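
The insertion steps can be sketched for the chaining variant roughly as follows. This is an illustrative sketch, not how any particular library does it: `put` and `buckets` are hypothetical names, each bucket is a plain Python list standing in for a linked list, and integer keys are used so the bucket indices are predictable.

```python
def put(buckets, key, value):
    """Insert or update a key-value pair in a chained hash table."""
    index = hash(key) % len(buckets)   # steps 1 and 2: hash code, then index
    bucket = buckets[index]
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:        # key already stored: overwrite its value
            bucket[i] = (key, value)
            return
    bucket.append((key, value))        # new key: append to the chain


buckets = [[] for _ in range(5)]
put(buckets, 1, "a")
put(buckets, 6, "b")   # 6 % 5 == 1, so this collides with key 1
put(buckets, 1, "c")   # same key again: the value is replaced, not duplicated
print(buckets[1])
```

After these calls, bucket 1 holds both pairs, with key 1 mapped to its updated value.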

Retrieval process (get):

Compute the hash code and bucket index for the key.
Access the corresponding bucket.
If chaining is used, search for the key-value pair in the list. If open addressing is used, search in the array starting from the computed index, using the same probing strategy as during insertion.
Compare keys (since different keys can have the same hash code) to find the correct pair.
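
A lookup under chaining might look like the sketch below. The function name `get`, the `default` parameter, and the pre-filled `buckets` list are assumptions made for the example; the point is that the bucket index only narrows the search, and the final match is decided by comparing keys.

```python
def get(buckets, key, default=None):
    """Look up a key in a chained hash table; return default if absent."""
    index = hash(key) % len(buckets)
    for existing_key, value in buckets[index]:
        if existing_key == key:   # compare keys, not just bucket indices
            return value
    return default


# A 5-bucket table where keys 1 and 6 share bucket 1 (1 % 5 == 6 % 5).
buckets = [[] for _ in range(5)]
buckets[1] = [(1, "one"), (6, "six")]

found = get(buckets, 6)
missing = get(buckets, 11)   # 11 also maps to bucket 1 but is not stored
print(found, missing)
```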

Deletion process (delete):

Similar to retrieval: compute hash code and index.
Find the key-value pair.
Remove the pair. With open addressing, "lazy deletion" (marking the slot as deleted instead of emptying it) is commonly used: a truly empty slot would end the probe sequence early, and keys stored further along it could no longer be found.
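
To see why lazy deletion matters, here is a small open addressing sketch with linear probing. The class name `LinearProbingTable` and the sentinels `EMPTY` and `DELETED` are hypothetical, the table never resizes, and integer keys are used so that the probe positions are predictable.

```python
EMPTY = object()    # slot that has never been used
DELETED = object()  # tombstone left behind by a lazy deletion


class LinearProbingTable:
    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        """Yield slot indices starting at the key's home index, wrapping around."""
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def put(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break                      # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = i         # tombstones can be reused
            elif slot[0] == key:
                self.slots[i] = (key, value)
                return
        if first_free is None:
            raise RuntimeError("table is full (this sketch does not resize)")
        self.slots[first_free] = (key, value)

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:              # a never-used slot ends the search
                return default
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return default

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot[0] == key:
                self.slots[i] = DELETED    # mark, do not empty
                return True
        return False


table = LinearProbingTable(capacity=8)
table.put(1, "a")    # home index 1
table.put(9, "b")    # 9 % 8 == 1, probes on to index 2
table.put(17, "c")   # 17 % 8 == 1, probes on to index 3
table.delete(9)      # index 2 becomes a tombstone
found = table.get(17)
print(found)
```

If `delete` reset the slot to `EMPTY` instead, the search for key 17 would stop at index 2 and wrongly report the key as missing.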

Conceptual example:

Suppose we have a small array of 5 buckets, and we use chaining.

Key	Hash code (illustrative)	Index (hash mod 5)	Bucket contents after insertion
"apple"	$\approx 110$	$110 \bmod 5 = 0$	Bucket 0: [("apple", "red")]
"banana"	$\approx 221$	$221 \bmod 5 = 1$	Bucket 1: [("banana", "yellow")]
"cherry"	$\approx 332$	$332 \bmod 5 = 2$	Bucket 2: [("cherry", "red")]
"date"	$\approx 115$	$115 \bmod 5 = 0$	Bucket 0: [("apple", "red"), ("date", "brown")]

When searching for "date", the hash function gives a hash code $\approx 115$, index 0. Access Bucket 0 and scan the list until the key "date" is found.
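
The same example can be reproduced in a few lines of code. The hash codes here are the illustrative numbers from the example, stored in a hypothetical `illustrative_hash` lookup rather than computed by a real hash function, so the result is the same on every run.

```python
# Made-up hash codes taken from the conceptual example (not real hash() output).
illustrative_hash = {"apple": 110, "banana": 221, "cherry": 332, "date": 115}

pairs = [("apple", "red"), ("banana", "yellow"), ("cherry", "red"), ("date", "brown")]

buckets = [[] for _ in range(5)]
for key, value in pairs:
    buckets[illustrative_hash[key] % 5].append((key, value))


def lookup(key):
    """Scan the key's bucket and return the matching value, if any."""
    for stored_key, value in buckets[illustrative_hash[key] % 5]:
        if stored_key == key:
            return value
    return None


print(buckets[0])        # "apple" and "date" share bucket 0
print(lookup("date"))
```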

The efficiency of a hash table depends heavily on the quality of the hash function and on the load factor (the ratio of the number of stored elements to the number of buckets). With a good hash function and a bounded load factor, insert, retrieve, and delete operations take $O(1)$ time on average. In the worst case (all keys hash to the same bucket), the time degrades to $O(n)$, where $n$ is the number of elements.
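
The effect of the hash function can be made concrete by comparing chain lengths. In this sketch, `build` is a hypothetical helper, the well-spread case relies on CPython hashing small integers to themselves, and the degenerate hash function is deliberately the worst possible one.

```python
def build(keys, size, hash_fn):
    """Distribute keys into `size` chained buckets using the given hash function."""
    buckets = [[] for _ in range(size)]
    for key in keys:
        buckets[hash_fn(key) % size].append(key)
    return buckets


keys = list(range(100))
size = 50

spread = build(keys, size, hash)                # keys spread evenly
degenerate = build(keys, size, lambda key: 0)   # every key lands in bucket 0

load_factor = len(keys) / size
longest_spread = max(len(bucket) for bucket in spread)
longest_degenerate = max(len(bucket) for bucket in degenerate)
print(load_factor, longest_spread, longest_degenerate)
```

Both tables have the same load factor of 2.0, but a lookup scans at most 2 entries in the first table and up to all 100 in the second, which is the $O(1)$ versus $O(n)$ difference described above.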

In Python, the built-in dict type is implemented as a hash table.

# Creating a dictionary
my_dict = {"apple": "red", "banana": "yellow"}

# Inserting
my_dict["cherry"] = "red"

# Retrieving
color_of_banana = my_dict["banana"]

# Deleting
del my_dict["apple"]

# Checking if a key exists
if "date" in my_dict:
    print("date is in the dictionary")
else:
    print("date is not in the dictionary")

# Hashing principle (Python uses its own implementation)
# hash("apple") # Returns an integer hash code for the string "apple" (by default it differs between interpreter runs)
# hash(100)     # Returns an integer hash code for the integer 100
# Immutable built-in objects (strings, numbers, tuples whose items are all hashable) are hashable and can be used as dictionary keys.
# Built-in mutable containers (lists, dictionaries, sets) are not hashable and cannot be used as dictionary keys.
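
The hashability rules can be checked directly. The short sketch below uses only built-in behavior; the variable names are chosen for the example.

```python
# A tuple of integers is hashable, so it works as a dictionary key.
point_labels = {(0, 0): "origin"}

# A list is not hashable, so using it as a key raises TypeError.
try:
    bad = {[0, 0]: "origin"}
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# A tuple is only hashable if everything inside it is hashable too.
try:
    hash((1, [2, 3]))
    nested_ok = True
except TypeError:
    nested_ok = False

print(point_labels[(0, 0)], error_message, nested_ok)
```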