08/04/2017

[Java] Rolling average for both sorted and unsorted streams

The rolling, moving, running, whatever movement verb you like median or average is a problem regarding the online calculation of the average element in a streaming dataset.


The pro version requires you to sort the data on the fly in order to produce the median result:

input = 9, 4, 10, 3, 3
output = 4 because sorted input is: 3, 3, 4, 9, 10

The base version just wants the actual median element of the stream:

input = 9, 4, 10, 3, 3
output = 10

If the stream length is odd, the median is the actual median element, otherwise it's the average of the two elements near the middle of the stream.

The solution to both returns the median in constant O(1) time and keeps it updated in O(N) time. Also they both use additional O(N) space.


For the base version the idea is to keep track of the the two values needed to calculate the median in the even case, plus a Queue holding the values that follow in the exact order as they have been received.

For the pro version the idea is to view the dataset as:

elements smaller than median | median | elements bigger than median

Both sides have to be kept sorted in ascending order. For the left side we want to retrieve the MAX element fast (max heap) and for the right side we want to retrieve the MIN element fast (min heap) instead.

You can check the implementation on my Gist with some test cases. Language is Java 8.

No comments:

Post a Comment

With great power comes great responsibility