Two Heaps Explained (C++)

Two Heaps Explained (C++)

Introduction

The 'Two Heaps' is a technique used to solve coding problems that involve finding the median, kth smallest/largest element, or dealing with a stream of numbers where you need to maintain some order. This utilizes two heaps: a Min-Heap and a Max-Heap.

When to Use the Two Heaps Pattern

Here are some scenarios where the Two Heaps is useful:

  • Problems involving finding the median of a stream of numbers.
  • Problems where you need to find the kth smallest or largest element in an array or stream.
  • Problems where you need to divide a set of numbers into two parts with certain properties.
  • Problems that require maintaining a sorted order of elements dynamically.

Core Idea

The main idea behind the Two Heaps is to divide the data into two halves.

  • Max-Heap: Stores the smaller half of the data.
  • Min-Heap: Stores the larger half of the data.

By doing this, we can efficiently find the median or kth element.

Basic Template

Here's a general structure for implementing the Two Heaps in C++:


#include <iostream>
#include <queue>
#include <vector>
#include <functional> // Required for std::greater and std::less

using namespace std;

class TwoHeaps {
public:
    //  Max-Heap to store the smaller half.
    priority_queue<int> maxHeap;
    //  Min-Heap to store the larger half.
    priority_queue<int, vector<int>, greater<int>> minHeap;

    // Function to add a number to the heaps
    void addNumber(int num) {
        if (maxHeap.empty() || num <= maxHeap.top()) {
            maxHeap.push(num);
        } else {
            minHeap.push(num);
        }
        rebalanceHeaps();
    }

    // Function to rebalance the heaps
    void rebalanceHeaps() {
        if (maxHeap.size() > minHeap.size() + 1) {
            minHeap.push(maxHeap.top());
            maxHeap.pop();
        } else if (minHeap.size() > maxHeap.size()) {
            maxHeap.push(minHeap.top());
            minHeap.pop();
        }
    }

    // Function to find the median
    double findMedian() {
        if (maxHeap.empty() && minHeap.empty()) {
            return 0.0; // Or throw an exception: "No numbers have been added."
        }
        if (maxHeap.size() == minHeap.size()) {
            return (maxHeap.top() + minHeap.top()) / 2.0;
        } else {
            return maxHeap.top();
        }
    }
};

int main() {
    TwoHeaps twoHeaps;
    twoHeaps.addNumber(3);
    cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 3
    twoHeaps.addNumber(1);
    cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 2
    twoHeaps.addNumber(2);
    cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 2
    twoHeaps.addNumber(4);
    cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 2.5
    twoHeaps.addNumber(5);
    cout << "Median: " << twoHeaps.findMedian() << endl; // Output: 3
    return 0;
}

Explanation:

  1. Initialization:
    • maxHeap: A priority queue (max-heap) to store the smaller half of the numbers. In C++, priority_queue<int> is a max-heap by default.
    • minHeap: A priority queue (min-heap) to store the larger half of the numbers. We use priority_queue<int, vector<int>, greater<int>> to make it a min-heap. greater<int> is a comparator that makes the smallest element have the highest priority.
  2. addNumber(int num):
    • Adds a new number num to the appropriate heap.
    • If the maxHeap is empty or the new number is less than or equal to the top of the maxHeap, add it to the maxHeap.
    • Otherwise, add it to the minHeap.
    • Calls rebalanceHeaps() to maintain the balance.
  3. rebalanceHeaps():
    • Ensures that the heaps are balanced, meaning the difference in their sizes is not more than 1.
    • If the maxHeap has more than one extra element, move the top element from maxHeap to minHeap.
    • If the minHeap has more elements, move the top element from minHeap to maxHeap.
  4. findMedian():
    • Calculates the median based on the elements in the heaps.
    • If both heaps have the same number of elements, the median is the average of the top elements of the maxHeap and minHeap.
    • If the maxHeap has one more element, the median is the top element of the maxHeap.
    • If both heaps are empty, return 0.

Example Problem: Find the Median of a Data Stream

Problem: Given a stream of numbers, find the median of the numbers seen so far after each new number is added.

C++ Code:


#include <iostream>
#include <queue>
#include <vector>
#include <functional>
#include <iomanip>

using namespace std;

class MedianFinder {
public:
    priority_queue<int> maxHeap;
    priority_queue<int, vector<int>, greater<int>> minHeap;

    void addNum(int num) {
        if (maxHeap.empty() || num <= maxHeap.top()) {
            maxHeap.push(num);
        } else {
            minHeap.push(num);
        }
        rebalanceHeaps();
    }

    void rebalanceHeaps() {
        if (maxHeap.size() > minHeap.size() + 1) {
            minHeap.push(maxHeap.top());
            maxHeap.pop();
        } else if (minHeap.size() > maxHeap.size()) {
            maxHeap.push(minHeap.top());
            maxHeap.pop();
        }
    }

    double findMedian() {
        if (maxHeap.empty() && minHeap.empty()) {
            return 0.0;
        }
        if (maxHeap.size() == minHeap.size()) {
            return (maxHeap.top() + minHeap.top()) / 2.0;
        } else {
            return maxHeap.top();
        }
    }
};

int main() {
    MedianFinder medianFinder;
    vector<int> numbers = {5, 2, 8, 10, 1, 7, 4};

    for (int number : numbers) {
        medianFinder.addNum(number);
        cout << "Added " << number << ", Median = " << fixed << setprecision(1) << medianFinder.findMedian() << endl;
    }
    return 0;
}

Key Points

  • The Two Heaps is efficient for maintaining sorted order in dynamic data.
  • Using a max-heap for the smaller half and a min-heap for the larger half allows for quick access to the median or other order statistics.
  • Rebalancing the heaps is crucial for maintaining the desired properties and ensuring the correct median calculation.

You can view the full collection of Two heaps problems and explanations on GitHub:

Visit GitHub Repository