Finding the median of a data stream can be a challenging problem, especially when you want to do it efficiently as numbers keep streaming in. Instead of storing all numbers and sorting them every time, we can make use of two heaps: a max-heap and a min-heap. Let’s dive into this elegant solution!
Understanding The Median
The median is the middle number in a sorted list of numbers. Depending on whether we have an odd or even number of elements:
- Odd number of elements: The median is the middle element.
- Even number of elements: The median is the average of the two middle elements.
For instance:
- In the list [3, 1, 2], the median is 2.
- In the list [1, 2, 3, 4], the median is (2 + 3) / 2 = 2.5.
Why Use Heaps?
Heaps are a type of binary tree that satisfy the heap property. They allow us to efficiently retrieve the largest or smallest elements.
We will maintain two heaps:
- Max-Heap: This will store the smaller half of the numbers. In Java, we can use a priority queue with a comparator to create a max-heap.
- Min-Heap: This will store the larger half of the numbers.
Using these two heaps allows us to keep track of the median in O(log n) time for insertion and O(1) time for finding the median.
How Does This Work?
Insertion Logic:
- When a new number arrives, we first compare it with the largest number in the max-heap.
- If it is smaller, we insert it into the max-heap. Otherwise, we insert it into the min-heap.
- After every insertion, we may need to rebalance the heaps to ensure that the max-heap has at most one extra element compared to the min-heap.
Finding the Median:
- If the sizes of both heaps are equal, the median is the average of the roots of both heaps.
- If the max-heap has one more element, the median is the root of the max-heap.
Implementation in Java
Here’s how this can be implemented in Java:
import java.util.Collections; import java.util.PriorityQueue; public class MedianFinder { private PriorityQueue<Integer> maxHeap; // max-heap for the left half private PriorityQueue<Integer> minHeap; // min-heap for the right half /** Initialize your data structure here. */ public MedianFinder() { maxHeap = new PriorityQueue<>(Collections.reverseOrder()); minHeap = new PriorityQueue<>(); } public void addNum(int num) { // Add to max-heap (left half) maxHeap.offer(num); // Ensure maxHeap's largest is <= minHeap's smallest if (!minHeap.isEmpty() && maxHeap.peek() > minHeap.peek()) { minHeap.offer(maxHeap.poll()); } // Balance the sizes: maxHeap can have at most one extra element if (maxHeap.size() > minHeap.size() + 1) { minHeap.offer(maxHeap.poll()); } else if (minHeap.size() > maxHeap.size()) { maxHeap.offer(minHeap.poll()); } } public double findMedian() { if (maxHeap.size() > minHeap.size()) { return maxHeap.peek(); } else { return (maxHeap.peek() + minHeap.peek()) / 2.0; } } }
Explanation of the Code
- Initialization: We create two priority queues, one for the max-heap and one for the min-heap.
- addNum(int num): This method is responsible for adding a new number. It first adds the number to the max-heap and then checks the top elements to ensure the properties of our heaps are maintained.
- findMedian(): This method determines the median based on the size of the heaps. If the max-heap has more elements, return the root of the max-heap. If both heaps are of equal size, return the average of their roots.
Example Usage
Here's a quick demo of how the MedianFinder
class would work:
public class Main { public static void main(String[] args) { MedianFinder finder = new MedianFinder(); finder.addNum(1); System.out.println(finder.findMedian()); // Output: 1.0 finder.addNum(2); System.out.println(finder.findMedian()); // Output: 1.5 finder.addNum(3); System.out.println(finder.findMedian()); // Output: 2.0 } }
Output Explanation
- After adding the first number
1
, the median is1.0
. - After adding
2
, the numbers are[1, 2]
, giving a median of1.5
. - Finally, after adding
3
, the numbers are[1, 2, 3]
, retrieving the median2.0
.
By utilizing heaps, we achieve a highly efficient solution to the data stream median problem. The dual-heap structure effectively allows us to dynamically maintain the median as new numbers come in.
With a solid understanding of heaps and their application, you're well on your way to tackling similar problems in interviews and coding challenges!