In most large data warehouses one of the common strategies employed by DBA’s to speed up performance is to use aggregate tables. Generally aggregate tables contain information that has a coarser granularity than the detail data. For example in a retail datamart I might have information at the transaction level. However most of the analysis will be performed at the daily level by brand. Without aggregate tables the database will fetch the lowest level of data and will perform a group by at the day level for specific brands which can be a very expensive operation. Instead as part of the ETL process I can pre-aggregate the data at the daily level which would reduce the number of rows by a huge factor. The Sales_Receipts fact table would contain this detail data, but the records in that table might also be aggregated over various time periods to produce a set of aggregate tables (Sales_Daily, Sales_Monthly, and so on).
There are multiple ways of managing aggregates. One option is to create aggregate tables in the database as materialized views and let the query optimizer of the database handle the performance using seamless query rewrite. I will describe this in a separate post. In this post I will focus on using the aggregate awareness functionality of the universe.
My sample database tracks the sales of cars. My detailed fact table VW_SALE_MODEL is used to track the sales at the lowest level of detail i.e. client, showroom, model and color.
Using these I can create a family of aggregate fact tables. For example I have created a aggregate table called VW_SALE that aggregates the data at the client and showroom level. Similarly I can create additional aggregates at the year level.
Once I have create the aggregates I need to map them into the universe which is a 4 step process.
1. Add the aggregate tables and setup the joins with the dimension tables. I have not created standalone aggregate tables since I want to make sure that I can leverage the hierarchies defined in the dimension tables. This process is similar to adding any fact table in the universe.
2. Define the aggregated measures using the @aggregate_aware function. The @aggregate_aware function is used to setup aggregate awareness in the universe.
The syntax of the @Aggregate_Aware function is @Aggregate_Aware(sum(agg_table_1), sum(agg_table 2) …., sum(agg_table_n)) in the order of preference. For example agg_table1 should be the highest level aggregate, followed by agg_table 2 and so on. This is used by the universe to pick the best aggregate table to answer the query.
In my example I have stated that either try to get the sales total from the aggregate table or get it by calculating it using the detailed table.
3. Define the incompatibilities. For example in my structure the model and maker classes are not captured by the aggregate. Also the aggregate table only contains information about sales and not about the quantity sold. We need to define these incompatibilities so that when the user generates a query, the universe can quickly scan through the compatibility list to determine the best aggregate that can be used to answer the query.
When I define it as shown above all queries that include Model or Maker will go to the detailed table. All other queries will be satisfied by the aggregate table.
4. Resolve any loops. Since I have joined the dimension tables to both the fact I have created some loops in the universe which I need to resolve. I can do that by creating separate contexts for the aggregate fact and the detailed facts.
Now I am ready to using my aggregates. To illustrate let us go to WEBI and see the impact of our design. When I query for sales by showroom the entire query is answered by the aggregate table.
As soon as I add the maker to the query BO now retrieves the data from the detailed table instead.
The same approach can be used to capture additional aggregates such as Year level or Qtr level and BO will dynamically go from using the summary table to using the detailed table as the user is performing the drill down.
So in summary, aggregate tables are very powerful and necessary in most real implementations and BO provides a fairly simple way to model it within the universe.