When it comes to handling a dynamic set of numbers, finding the median efficiently is often a challenge that programmers face. The median is defined as the middle value when the numbers are sorted. If there's an even number of observations, the median is usually considered as the average of the two middle numbers. In our context, we'll explore how to maintain this median while processing a continuous stream of numbers.
We will use two heaps (priority queues) to allow us to efficiently find the median as we insert numbers into our dataset:
This dual-heap approach allows us to keep track of the median efficiently:
Here's a simple yet effective Java solution to manage the insertion of numbers and median calculation:
import java.util.Collections; import java.util.PriorityQueue; public class MedianFinder { private PriorityQueue<Integer> maxHeap; // Max-heap for the lower half private PriorityQueue<Integer> minHeap; // Min-heap for the upper half public MedianFinder() { maxHeap = new PriorityQueue<>(Collections.reverseOrder()); // Store smaller half minHeap = new PriorityQueue<>(); // Store larger half } public void addNum(int num) { // Always add to maxHeap first maxHeap.offer(num); // Ensure the largest of maxHeap is less than or equal to the smallest of minHeap if (!minHeap.isEmpty() && maxHeap.peek() > minHeap.peek()) { minHeap.offer(maxHeap.poll()); } // Balance the sizes of the heaps if (maxHeap.size() > minHeap.size() + 1) { minHeap.offer(maxHeap.poll()); } else if (minHeap.size() > maxHeap.size()) { maxHeap.offer(minHeap.poll()); } } public double findMedian() { if (maxHeap.size() > minHeap.size()) { return maxHeap.peek(); } else { return (maxHeap.peek() + minHeap.peek()) / 2.0; } } public static void main(String[] args) { MedianFinder medianFinder = new MedianFinder(); medianFinder.addNum(1); System.out.println(medianFinder.findMedian()); // Outputs 1.0 medianFinder.addNum(2); System.out.println(medianFinder.findMedian()); // Outputs 1.5 medianFinder.addNum(3); System.out.println(medianFinder.findMedian()); // Outputs 2.0 } }
Utilizing heaps allows us to achieve efficient O(log n) time complexity for insertions while keeping the retrieval of the median at O(1). This feature becomes especially useful when processing large streams of data, where performance matters.
Let’s consider we have a stream of numbers: 5, 10, 15, 20, 25.
Utilizing bi-heap data structures to compute the median on a streaming dataset is an efficient and elegant solution, and understanding it can distinguish you as a strong candidate during developer interviews.
15/11/2024 | DSA
08/12/2024 | DSA
15/11/2024 | DSA
23/09/2024 | DSA
06/12/2024 | DSA
06/12/2024 | DSA
08/12/2024 | DSA
13/10/2024 | DSA
06/12/2024 | DSA
23/09/2024 | DSA
06/12/2024 | DSA