Random Thoughts

Views on life

Archive for the ‘Universe’ Category

BOE Universe: Generating list of values based on data

Posted by Hemanta Banerjee on November 19, 2010


Let us take a simple scenario. Let us say we have a countries dimension which has all the countries. If we want to use this dimension table to get a list of countries where we have customers, as well as use it to get a list of countries where we have offices we can create 2 aliases COUNTRY_OFFICE and COUNTRY_CUSTOMER and create objects from the aliases. No issues till now, except if we try to get a list of countries where we have offices. Since we have used the master country table it will list all the countries irrespective of whether we have office there or not.

Let us see how it works below.

image

In the example above I have a country table and I use it for both client and showroom country. Now if I query for showroom country i.e. countries where I have showrooms here is the query produced.

image

As you can see it lists all the countries in the countries table which is not what we wanted. In order to get the correct list of countries I have to join with the SHOWROOM table so that the countries list is restricted based on the SHOWROOM dimension table. This is done by specifying that whenever the Showroom Country object is used in a query, the Showroom table must also be inferred in the FROM clause of the SELECT statement. Providing that the Showroom_Country table is joined to the Showroom table the object is then guaranteed to only return countries in which showrooms exist.

image

Making this change ensures that we always get the correct set of countries when we query for showroom countries.

image

While this does not seem that critical it becomes very important especially when we want the user to select the showroom country in a prompt for example. We only want those countries to be in the prompt list where we have showrooms and making this change will ensure that we always get the correct list.

Posted in BusinessObjects, Universe, WEBI | Tagged: , | Leave a Comment »

Outer Joins in Universe (BusinessObjects)

Posted by Hemanta Banerjee on November 15, 2010


Lets take a simple scenario. I want to generate a report that shows the sales by showroom, and the report should display all the showrooms in the report. For the showroom with no sales it should display the region with a NULL value for sales.

image

The usual join (also called as inner join) will not work in this scenario. We need what is called as an outer join. If you want to know more about outer joins you can checkout the Wikipedia link here.

To enable outer join you need to first set the ANSI92 parameter to Yes. This will change the query from the simple join to an an inner join syntax with from clause as shown below.

image

You can also enable the FILTER_IN_FROM parameter. This pushes the where clause of the join inside the from which reduces the number of records in the join condition and will greatly improve performance.

image

Now we can setup our outer join. As shown below we can setup the right outer join betwen the fact table and the showroom dimension table.

image

The effect of this is that all showrooms irrespective of whether they had a sale or not will be returned by the query.

image

So in conclusion while it is easy to setup outer joins in the universe, one should be very careful when using outer joins as it can result in a cartesian product of all rows especially when using full outer join.

Posted in BusinessObjects, Universe | Tagged: , , , | 1 Comment »

Managing performance by using Aggregate tables in the universe

Posted by Hemanta Banerjee on November 9, 2010


In most large data warehouses one of the common strategies employed by DBA’s to speed up performance is to use aggregate tables. Generally aggregate tables contain information that has a coarser granularity than the detail data. For example in a retail datamart I might have information at the transaction level. However most of the analysis will be performed at the daily level by brand. Without aggregate tables the database will fetch the lowest level of data and will perform a group by at the day level for specific brands which can be a very expensive operation. Instead as part of the ETL process I can pre-aggregate the data at the daily level which would reduce the number of rows by a huge factor. The Sales_Receipts fact table would contain this detail data, but the records in that table might also be aggregated over various time periods to produce a set of aggregate tables (Sales_Daily, Sales_Monthly, and so on).

There are multiple ways of managing aggregates. One option is to create aggregate tables in the database as materialized views and let the query optimizer of the database handle the performance using seamless query rewrite. I will describe this in a separate post. In this post I will focus on using the aggregate awareness functionality of the universe.

My sample database tracks the sales of cars. My detailed fact table VW_SALE_MODEL is used to track the sales at the lowest level of detail i.e. client, showroom, model and color.

image

Using these I can create a family of aggregate fact tables. For example I have created a aggregate table called VW_SALE that aggregates the data at the client and showroom level. Similarly I can create additional aggregates at the year level.

image

Once I have create the aggregates I need to map them into the universe which is a 4 step process.

1. Add the aggregate tables and setup the joins with the dimension tables. I have not created standalone aggregate tables since I want to make sure that I can leverage the hierarchies defined in the dimension tables. This process is similar to adding any fact table in the universe.

image

2. Define the aggregated measures using the @aggregate_aware function. The @aggregate_aware function is used to setup aggregate awareness in the universe.

The syntax of the @Aggregate_Aware function is @Aggregate_Aware(sum(agg_table_1), sum(agg_table 2) …., sum(agg_table_n)) in the order of preference. For example agg_table1 should be the highest level aggregate, followed by agg_table 2 and so on. This is used by the universe to pick the best aggregate table to answer the query.

image

In my example I have stated that either try to get the sales total from the aggregate table or get it by calculating it using the detailed table.

3. Define the incompatibilities. For example in my structure the model and maker classes are not captured by the aggregate. Also the aggregate table only contains information about sales and not about the quantity sold. We need to define these incompatibilities so that when the user generates a query, the universe can quickly scan through the compatibility list to determine the best aggregate that can be used to answer the query.

image

When I define it as shown above all queries that include Model or Maker will go to the detailed table. All other queries will be satisfied by the aggregate table.

4. Resolve any loops. Since I have joined the dimension tables to both the fact I have created some loops in the universe which I need to resolve. I can do that by creating separate contexts for the aggregate fact and the detailed facts.

Now I am ready to using my aggregates. To illustrate let us go to WEBI and see the impact of our design. When I query for sales by showroom the entire query is answered by the aggregate table.

image

As soon as I add the maker to the query BO now retrieves the data from the detailed table instead.

image

The same approach can be used to capture additional aggregates such as Year level or Qtr level and BO will dynamically go from using the summary table to using the detailed table as the user is performing the drill down.

So in summary, aggregate tables are very powerful and necessary in most real implementations and BO provides a fairly simple way to model it within the universe.

Posted in Aggregate Awareness, BusinessObjects, Universe | Tagged: , , , | Leave a Comment »

How to perform YTD (or any Period to date) design in the Universe

Posted by Hemanta Banerjee on November 5, 2010


Yet another post inspired by the BOBJ board. The idea is how to design a universe such that users could enter any date and get both the measure value for that period as well as YTD. Since we want to make it easy for adhoc users we need to do some design work in the universe to make it easy for the users performing adhoc analysis.

So I figured the easiest approach would be to define a separate set of measures for YTD similar to what I did for the YAGO computation in a previous post. So extending on the same example I followed a very similar approach and it turns out to be quite simple. All we need to do is make sure we are able to run multiple queries, once for getting the sales and another one that sums up the sales from the beginning of year to the selected date. So I know we have to define a separate context for the YTD sales, forcing the BI Server to automatically run 2 queries and join the results. That’s what I want to leverage.

1. To make my life easier in the universe I defined a separate reference table DATES_PERIOD that maps the date to its corresponding YTD start and end dates. This not only makes it simple, it also makes it possible for me use the same design for handling non standard calendars such as Fiscal calendar. Also if I want to do QTD or MTD instead of YTD I can use the same approach by just changing the start and end dates.

image

In this table I have gone ahead and filled up the start and end dates for YTD for every date in the DATES_TABLE my calendar table.

2.  In my universe I first go ahead and define an alias for the fact table called YTD_SALES. Now instead of joining it to my DATES_PERIOD (date dimension) table I have joined it to my DATES_PERIOD table using a complex join as shown below.

image

This ensures that I will always select all the sales from the time slice (start and end of YTD) rather than selecting a specific date. My universe is as shown below.

image

In my universe my time dimension objects such as Date or Qtr are driven by the DATES_TABLE. So in order to tie everything up I have joined the DATES_PERIOD to the DATES_TABLE on the date. This ensures that when the user selects the date, the corresponding period will be selected from the DATES_PERIOD and the BI Server will return the sales that fall in that period. This is the key part of the design.

Now I can setup up the rest of the joins with the rest of the dimension tables.

3. Now I have to define a new context for YTD as shown below. This is needed to make sure that when the user selects from Sales and YTD sales they are sent as separate queries.

image

After setting up the contexts I can define the YTD Sales revenue by pulling in the appropriate field from the YTD_SALES alias table.

image

Now checking to make sure that the logic is OK. I define 2 queries, YTD Sales till Dec 31-2004 and the sales for 2004. If the logic is correct both should come out same and it does.

image

Also I can pull them in the same query if I want. I know that there is a sale on 15-Mar-2004. Filtering on that date gives me both the sales value for that date as well as YTD sales.

image

The reason I love this design is because very versatile and it can be used for any period to date. The only thing to note is that it will work only if the user selects a date. If the user selects a Qtr or Month then the YTD value will be garbage. If you want to prevent this then you can force the user to select a date using a prompt.

Posted in BusinessObjects, Period to Date Functions, Time Sliced, Universe, WEBI, YTD | Tagged: , , , , , , , | 1 Comment »