Random Thoughts

Views on life

How to handle missing data in time series analysis

Posted by Hemanta Banerjee on March 26, 2010


I saw a post today on the ClearPeaks blog about ways to display time series charts with missing data. This is something all of us run into, so I wanted to try out some of the suggestions from that post and actually implement them in XCelsius to see whether they work in the real world.

So here is the situation. As you can see below, I have sales for the various months of the year, with data for Mar-June missing. If I do nothing and simply draw a chart, the result is actually misleading, since it does not tell the viewer that some of the months are missing.

clip_image002

It is better if we include all the months and show the missing data as a break in the chart, as shown below.

clip_image002[6]

This is better since at least the viewer of the chart knows that the data is missing for some of the months.
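Outside of XCelsius, the same idea can be sketched in a few lines of Python: lay the data out on a full twelve-month axis and leave an explicit gap wherever a month has no value. This is only an illustration. The function name `align_to_full_year` and the sales figures are made up for the example and are not taken from the spreadsheet in the screenshots.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def align_to_full_year(sales_by_month, months=MONTHS):
    """Return one value per month, using None where no data exists.

    Most charting tools draw None (or an empty cell) as a break in the
    line instead of silently joining the neighbouring points.
    """
    return [sales_by_month.get(month) for month in months]


# Hypothetical figures: only the months that actually have data are listed.
sales = {"Jan": 100, "Feb": 120, "Jul": 150, "Aug": 160,
         "Sep": 155, "Oct": 170, "Nov": 180, "Dec": 200}

full_year = align_to_full_year(sales)
for month, value in zip(MONTHS, full_year):
    print(month, "missing" if value is None else value)
```

The key point is that the month axis is fixed up front, so a missing month shows up as a visible hole rather than disappearing from the chart.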

A broken line is still not ideal, though. We would like to use the data we have to estimate the missing months, so that the chart shows the sales trend for the full year.

So how is it done?

We added an Excel formula that calculates a rolling average. We could have used a different function, or even had the backend database calculate the missing months.
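The Excel formula itself is only visible in the screenshots, so the following is a rough Python sketch of one way a rolling average can fill the gaps, not the exact formula used here. It assumes a trailing window of three months, and it lets earlier estimates feed into later ones. The function name, the window size, and the sales figures are all assumptions for illustration.

```python
from statistics import mean


def fill_with_rolling_average(values, window=3):
    """Fill None entries with the mean of the preceding `window` values.

    Returns (filled, estimated), where estimated[i] is True for every
    value that was calculated rather than read from the data.
    Previously filled values count as history for later gaps.
    """
    filled = []
    estimated = []
    for value in values:
        if value is None:
            history = filled[-window:]
            if not history:
                raise ValueError("cannot estimate a gap at the very start")
            filled.append(mean(history))
            estimated.append(True)
        else:
            filled.append(value)
            estimated.append(False)
    return filled, estimated


# Hypothetical monthly sales, Jan to Dec, with a gap in the middle.
sales = [100, 120, None, None, None, None, 150, 160, 155, 170, 180, 200]
filled, estimated = fill_with_rolling_average(sales)
for value, is_estimate in zip(filled, estimated):
    print(round(value, 1), "(estimated)" if is_estimate else "")
```

Keeping the `estimated` flags alongside the filled values matters later on, because it is the only record of which months are real and which are calculated.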

clip_image002[8]

clip_image002[10]

With this new layout we can draw the chart without any gaps. However, I feel that we are again misleading the user, since there is no way for the user to know that Apr to Jun is made-up data.

Ideally, we would draw that section of the chart in a different color.

This is something that I thought would be easy but turned out to be a little bit complicated.

In the end I resorted to a hack, and I am not sure how scalable it is. In the table on the right, the yellow section is the data section. It holds all the data from the database as well as the calculated rolling-average months. From that I created 2 series: one that contains the real data (red) and another that contains the calculated months (green). For the calculated series I also included the boundary months on either side, to make sure the line is continuous.
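The split into two series can be sketched in Python as follows. This mirrors the red/green layout described above but is not the spreadsheet logic itself. The function name `split_series`, the `estimated` flags, and the numbers are assumptions for illustration.

```python
def split_series(values, estimated):
    """Split one filled series into a 'real' and a 'calculated' series.

    real:       actual values, None at every estimated month.
    calculated: estimated values plus the real month on each side of a
                gap, None elsewhere, so its line meets the real line.
    """
    n = len(values)
    real = [None if estimated[i] else values[i] for i in range(n)]
    calculated = [None] * n
    for i in range(n):
        if not estimated[i]:
            continue
        calculated[i] = values[i]
        # Copy the neighbouring real months so the two lines connect.
        if i > 0 and not estimated[i - 1]:
            calculated[i - 1] = values[i - 1]
        if i + 1 < n and not estimated[i + 1]:
            calculated[i + 1] = values[i + 1]
    return real, calculated


# Hypothetical filled data: the third and fourth values are estimates.
values = [100, 120, 110, 115, 150, 160]
estimated = [False, False, True, True, False, False]

real, calculated = split_series(values, estimated)
print(real)
print(calculated)
```

Plotting `real` and `calculated` as two lines in different colors gives a continuous chart in which the estimated stretch is visually distinct. The boundary months appear in both series, which is what closes the gap between the two lines.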

clip_image002[12]

Then I set up 2 series in XCelsius, the 1st pointing to the red section of the spreadsheet and the 2nd pointing to the green section.

clip_image002[14]

clip_image004

And I have the final chart I need.

clip_image002[16]

If someone has a better way of doing this in XCelsius, I would love to hear it.
