Creating a custom HashMap in Java

Mathematical foundation of Maps and collisions — Let’s break it down from a mathematical perspective:

Ø What is a Map in Math?

In mathematics, a map (or function) is a relationship between a set of inputs (domain) and a set of possible outputs (codomain).

Definition:

A function f: A → B maps each element a ∈ A to a unique element b ∈ B.

Ø How HashMap Reflects a Math Map

In Java:

Map<K, V> map = new HashMap<>();

This conceptually mirrors:

f: Key → Value

But unlike pure math, we use hashing to determine where to store/retrieve values, and this leads to collisions.

Ø Hash Function as a Mathematical Function

A hash function is a mathematical function:

h:K→N

Where:

K is the set of all possible keys.
ℕ is the set of natural numbers (usually bounded by array size).

Example:

If you have:

java

hashCode("apple") = 99162322

And bucket array size = 16:

h("apple") = 99162322mod16 = 2

So "apple" gets stored in bucket index 2.

Ø Collisions – The Core Problem

What is a Collision?

Two different keys k1 ≠ k2 map to the same index:

h(k1)modN=h(k2)modN

This violates injectivity (1-1 mapping) in mathematics.

Probability of Collisions: The Birthday Paradox

Let’s say:

You have n items (keys).
HashMap has m buckets (indexes).

The probability P(collision) increases as n grows:

$$P(collision)≈ 1−e^{-\frac{n^2}{(2m)}}$$

Example: If n = 23, m = 365 → there's a 50.7% chance of at least one collision — just like birthdays!

Ø Collision Resolution Techniques (Math-Wise)

A. Chaining (Linked Lists):

Multiple keys in the same bucket → stored as a list:

Bucket[i]=k1,k2,k3...ifh(k1)≡h(k2)≡h(k3)(modN)

B. Open Addressing (Linear Probing):

If collision occurs, find the next available slot:

Index=(h(k)+i)modN

Where i = 0, 1, 2... until empty spot is found.

C. Quadratic Probing:

Index=(h(k)+c1∗i+c2∗i^2)modN

D. Double Hashing:

Index=(h1(k)+i∗h2(k))modN

Ø Perfect Hash Function (Ideal But Rare)

A perfect hash function has no collisions:

$$∀x\neq y⇒h(x)\neq h(y)$$

Hard to achieve in practice unless the key set is known ahead of time.

Ø In math overview

Concept	Math Equivalent	Java Equivalent
Map	f: A → B	Map<K, V>
Hash Function	h: K → N	key.hashCode()
Modulo Operation	h(k) mod N	Index in array
Collision	h(k1) ≡ h(k2) (mod N)	Two keys, same bucket
Resolution	Probing / Chaining	Linked list / Tree / Probing

Let’s go deeper into how HashMap works with math and the underlying logic — including formulas, modulus arithmetic, bitwise tricks, and collision theory.

Ø Hashing – The Math Behind It

Formula:

int hash = key.hashCode();

int index = hash % capacity; // classic formula

Java's HashMap improves this to reduce collisions:

java

int hash = hash(key);

int index = (n - 1) & hash;

Why (n - 1) & hash instead of hash % n?

n must be a power of 2 (like 16, 32, 64...).
Bitwise AND is faster than %.
Ensures uniform distribution across buckets.

Example:

hash = 34 (binary: 100010)

n = 16 (binary: 10000)

n-1 = 15 (binary: 01111)

index = 34 & 15 = 2

Ø Hash Function and Distribution

Let’s say:

key = "Java"

hashCode = key.hashCode(); // say: 2301506

With capacity n = 16:

java

index = 2301506 % 16 = 2

But collisions happen if:

java

2301506 % 16 == 50 % 16 == 2

Ø Multiple keys can land in the same bucket

Ø Collisions

Birthday Paradox Style:

As you insert more keys, the chance of two having the same bucket increases (like two people sharing a birthday in a room of 23+ people).

Mathematically, the probability of collision increases non-linearly as entries increase:

With 16 buckets, inserting 8+ entries → ~50% collision chance.

Ø Load Factor and Rehashing

java

Load Factor (LF) = number of elements / number of buckets

Default = 0.75

When size >= capacity * loadFactor → resize happens

E.g., capacity = 16:

java

16 * 0.75 = 12 → after 12 inserts → rehash to 32

Ø Custom HashMap Recap with Math

Creating a custom HashMap in Java involves implementing the core functionality of a hash-based data structure. Below is a simple implementation of a custom HashMap. This implementation will include basic methods such as put, get, remove, and containsKey. For simplicity, it won't handle all edge cases and optimizations but should give you a foundational understanding of how HashMap works.

Custom HashMap Implementation

java

import java.util.LinkedList;

import java.util.Objects;

public class CustomHashMap<K, V> {

private static final int DEFAULT_CAPACITY = 16;

private static final float LOAD_FACTOR = 0.75f;

private LinkedList<Entry<K, V>>[] table;

private int size;

// Entry class to store key-value pairs

private static class Entry<K, V> {

K key;

V value;

Entry<K, V> next;

Entry(K key, V value, Entry<K, V> next) {

this.key = key;

this.value = value;

this.next = next;

}

@SuppressWarnings("unchecked")

public CustomHashMap(int capacity) {

table = new LinkedList[capacity];

size = 0;

}

public CustomHashMap() {

this(DEFAULT_CAPACITY);

}

// Hash function to determine the index in the table

private int hash(Object key) {

return key.hashCode() % table.length;

}

// Add a key-value pair to the map

public void put(K key, V value) {

int index = hash(key);

if (table[index] == null) {

table[index] = new LinkedList<>();

}

LinkedList<Entry<K, V>> bucket = table[index];

for (Entry<K, V> entry : bucket) {

if (entry.key.equals(key)) {

entry.value = value;

return;

}

bucket.add(new Entry<>(key, value, null));

size++;

// Resize if the load factor is exceeded

if ((float) size / table.length > LOAD_FACTOR) {

resize();

}

// Retrieve the value associated with a key

public V get(K key) {

int index = hash(key);

LinkedList<Entry<K, V>> bucket = table[index];

if (bucket != null) {

for (Entry<K, V> entry : bucket) {

if (entry.key.equals(key)) {

return entry.value;

}

return null;

}

// Remove a key-value pair from the map

public V remove(K key) {

int index = hash(key);

LinkedList<Entry<K, V>> bucket = table[index];

if (bucket != null) {

Entry<K, V> prev = null;

for (Entry<K, V> entry : bucket) {

if (entry.key.equals(key)) {

if (prev != null) {

prev.next = entry.next;

} else {

bucket.remove(entry);

}

size--;

return entry.value;

}

prev = entry;

}

return null;

}

// Check if the map contains a key

public boolean containsKey(K key) {

int index = hash(key);

LinkedList<Entry<K, V>> bucket = table[index];

if (bucket != null) {

for (Entry<K, V> entry : bucket) {

if (entry.key.equals(key)) {

return true;

}

return false;

}

// Resize the table when needed

@SuppressWarnings("unchecked")

private void resize() {

LinkedList<Entry<K, V>>[] oldTable = table;

table = new LinkedList[oldTable.length * 2];

size = 0;

for (LinkedList<Entry<K, V>> bucket : oldTable) {

if (bucket != null) {

for (Entry<K, V> entry : bucket) {

put(entry.key, entry.value);

}

// Get the current size of the map

public int size() {

return size;

}

// Main method for testing

public static void main(String[] args) {

CustomHashMap<String, Integer> map = new CustomHashMap<>();

map.put("One", 1);

map.put("Two", 2);

map.put("Three", 3);

System.out.println("Size: " + map.size()); // Output: Size: 3

System.out.println("Get 'Two': " + map.get("Two")); // Output: Get 'Two': 2

map.remove("Two");

System.out.println("Contains 'Two': " + map.containsKey("Two")); // Output: Contains 'Two': false

System.out.println("Size after removal: " + map.size()); // Output: Size after removal: 2

}

Key Components

Entry Class: Represents a key-value pair with a reference to the next entry in the linked list.
Hash Function: Maps a key to an index in the table using its hash code.
Put Method: Adds a key-value pair to the map. Resizes the table if the load factor is exceeded.
Get Method: Retrieves the value associated with a key.
Remove Method: Removes a key-value pair from the map and adjusts the linked list accordingly.
ContainsKey Method: Checks if the map contains a specific key.
Resize Method: Doubles the size of the table and rehashes all entries when the load factor is exceeded.
Main Method: Tests the basic functionality of the CustomHashMap.

Ø Summary of the Math Concepts

Concept	Description	Math Involved
Hashing	Convert key → int → bucket index	hash % n, (n-1) & hash
Load Factor	Threshold to resize	loadFactor = size / capacity
Resizing	Double the capacity, redistribute entries	newIndex = hash(key) % newCapacity
Collision Probability	More items = more chance of same bucket	Similar to Birthday Problem

This custom implementation provides a fundamental structure of a hash map with dynamic resizing. It can be expanded with additional features or optimizations as needed.