Two Heaps Explained (C++)
Introduction
The 'Two Heaps' is a technique used to solve coding problems that involve finding the median, kth smallest/largest element, or dealing with a stream of numbers where you need to maintain some order. This utilizes two heaps: a Min-Heap and a Max-Heap.
When to Use the Two Heaps Pattern
Here are some scenarios where the Two Heaps is useful:
- Problems involving finding the median of a stream of numbers.
- Problems where you need to find the kth smallest or largest element in an array or stream.
- Problems where you need to divide a set of numbers into two parts with certain properties.
- Problems that require maintaining a sorted order of elements dynamically.
Core Idea
The main idea behind the Two Heaps is to divide the data into two halves.
- Max-Heap: Stores the smaller half of the data.
- Min-Heap: Stores the larger half of the data.
By doing this, we can efficiently find the median or kth element.
Basic Template
Here's a general structure for implementing the Two Heaps in C++:
#include <iostream>
#include <queue>
#include <vector>
#include <functional> // Required for std::greater and std::less
using namespace std;
class TwoHeaps {
public:
// Max-Heap to store the smaller half.
priority_queue<int> maxHeap;
// Min-Heap to store the larger half.
priority_queue<int, vector<int>, greater<int>> minHeap;
// Function to add a number to the heaps
void addNumber(int num) {
if (maxHeap.empty() || num <= maxHeap.top()) {
maxHeap.push(num);
} else {
minHeap.push(num);
}
rebalanceHeaps();
}
// Function to rebalance the heaps
void rebalanceHeaps() {
if (maxHeap.size() > minHeap.size() + 1) {
minHeap.push(maxHeap.top());
maxHeap.pop();
} else if (minHeap.size() > maxHeap.size()) {
maxHeap.push(minHeap.top());
minHeap.pop();
}
}
// Function to find the median
double findMedian() {
if (maxHeap.empty() && minHeap.empty()) {
return 0.0; // Or throw an exception: "No numbers have been added."
}
if (maxHeap.size() == minHeap.size()) {
return (maxHeap.top() + minHeap.top()) / 2.0;
} else {
return maxHeap.top();
}
}
};
int main() {
TwoHeaps twoHeaps;
twoHeaps.addNumber(3);
cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 3
twoHeaps.addNumber(1);
cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 2
twoHeaps.addNumber(2);
cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 2
twoHeaps.addNumber(4);
cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 2.5
twoHeaps.addNumber(5);
cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 3
return 0;
}
Explanation:
- Initialization:
- maxHeap: A priority queue (max-heap) to store the smaller half of the numbers. In C++, priority_queue<int> is a max-heap by default.
- minHeap: A priority queue (min-heap) to store the larger half of the numbers. We use priority_queue<int, vector<int>, greater<int>> to make it a min-heap. greater<int> is a comparator that makes the smallest element have the highest priority.
- addNumber(int num):
- Adds a new number num to the appropriate heap.
- If the maxHeap is empty or the new number is less than or equal to the top of the maxHeap, add it to the maxHeap.
- Otherwise, add it to the minHeap.
- Calls rebalanceHeaps() to maintain the balance.
- rebalanceHeaps():
- Ensures that the heaps are balanced, meaning the difference in their sizes is not more than 1.
- If the maxHeap has more than one extra element, move the top element from maxHeap to minHeap.
- If the minHeap has more elements, move the top element from minHeap to maxHeap.
- findMedian():
- Calculates the median based on the elements in the heaps.
- If both heaps have the same number of elements, the median is the average of the top elements of the maxHeap and minHeap.
- If the maxHeap has one more element, the median is the top element of the maxHeap.
- If both heaps are empty, return 0.
Example Problem: Find the Median of a Data Stream
Problem: Given a stream of numbers, find the median of the numbers seen so far after each new number is added.
C++ Code:
#include <iostream>
#include <queue>
#include <vector>
#include <functional>
#include <iomanip>
using namespace std;
class MedianFinder {
public:
priority_queue<int> maxHeap;
priority_queue<int, vector<int>, greater<int>> minHeap;
void addNum(int num) {
if (maxHeap.empty() || num <= maxHeap.top()) {
maxHeap.push(num);
} else {
minHeap.push(num);
}
rebalanceHeaps();
}
void rebalanceHeaps() {
if (maxHeap.size() > minHeap.size() + 1) {
minHeap.push(maxHeap.top());
maxHeap.pop();
} else if (minHeap.size() > maxHeap.size()) {
maxHeap.push(minHeap.top());
maxHeap.pop();
}
}
double findMedian() {
if (maxHeap.empty() && minHeap.empty()) {
return 0.0;
}
if (maxHeap.size() == minHeap.size()) {
return (maxHeap.top() + minHeap.top()) / 2.0;
} else {
return maxHeap.top();
}
}
};
int main() {
MedianFinder medianFinder;
vector<int> numbers = {5, 2, 8, 10, 1, 7, 4};
for (int number : numbers) {
medianFinder.addNum(number);
cout << "Added " << number << ", Median = " << fixed << setprecision(1) << medianFinder.findMedian() << endl;
}
return 0;
}
Key Points
- The Two Heaps is efficient for maintaining sorted order in dynamic data.
- Using a max-heap for the smaller half and a min-heap for the larger half allows for quick access to the median or other order statistics.
- Rebalancing the heaps is crucial for maintaining the desired properties and ensuring the correct median calculation.
You can view the full collection of Two heaps problems and explanations on GitHub:
Visit GitHub Repository