Posted by: zeesql | May 27, 2008

Creating your first data cube

SQL Server 2005/2008 comes along with the Business Intelligence Development Studio (also known as BIDS) which is used for creating projects related to analysis services, integration services, report server and report models (used to design reports on client side). For those of you wondering what projects are, ‘business intelligence development studio projects are a collection of related objects such as data sources, cubes, dimensions, mining structures etc’.  If it sounds confusing, don’t give up and keep reading and you’ll learn.  So, in order to get started with BIDS to build your first analysis services project, you need to have the following SQL Server components installed:

  • SQL Server Database Engine
  • SQL Server Analysis Services (SSAS)
  • Business Intelligence Development Studio
  • AdventureWorks Sample DW (data warehouse)
     

We take a start with creating a new project in BIDS. You’ll find the Business Intelligence Development Studio (BIDS) in Start > Microsoft SQL Server 2005/2008 folder. Once you’re in the business intelligence development studio, click File > New > Project from where you could kick off a new Analysis Services Project. Give it any name you like and click on ok. Based on default layout, you’ll have two windows tiled over each other on your left. One of them is called the Solution Explorer, within which you’ll create objects related to your projects. The other one window is called Properties window where you could view and modify the properties/attributes for the objects and their child objects (or you could just think of them as settings).

Moving on, let’s add a data source. As the name implies, data sources provide a connection between the project and the data. To add a data source, right click on the data sources folder in the solutions explorer window and select new data source which would fire off the new data source wizard. Click next within the wizard and then at the next step on the new button to create a data source. From the list of providers at the top, select ‘Native OLE DB\Microsoft OLE DB Provider for SQL Server’ and then type ‘localhost’ under server name. In the connect to a database group box, select the database named ‘AdventureWorksDW’ using the pull down menu. For some, the pull down menu might not work. So you might have to change localhost to your computer name instead. To find out what your computer name is, right click on your ‘My Computer’ and go into properties where you’ll find the computer name on the ‘Computer Name’ tab. Once you’ve selected a database, leave all the settings as they are and click on test connection button to verify that the connection to the data is successful and then click on ok, which would add the data connection to ‘AdventureWorksDW’ into your data source wizard.

Click on next to move onto the next step of the wizard where you would be asked for an impersonation account which Analysis Services would use to connect to the data source. Choose the service account and click next and then finish.

Cool! Now that we got a data source for our project we can create a data source view. Because data sources only provide a connection between your project and the data, you can’t exactly see anything beyond the connection string itself. Which is:

Provider=SQLNCLI10.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW

Looking at this, we can’t tell what tables lay within the data source or their attributes (columns) or relations. So we need a data source view which allows me to take a look through the tables and their relations and attributes. To add one, right click data source views in the solution explorer and click ‘new data source view’ to start the relative wizard. Click next on the start screen and select the data source we created earlier. Then click next again and add the tables: DimCustomer, DimGeography, DimTime and FactInternetSales into the list of included objects on the right, click next and then finish and we have our data source view ready which gives would give us something similar to this:

 

Sweet… Using our data source view, we have the tables we included and their attributes and relations. Something that I may have not mentioned in my previous posts is that measures are stored in fact tables and dimensions are stored in dimension tables. So the internet sales fact table contains sales measures which are related to the time and customer dimension tables in order to store sales transactions that took place at some particular point in time against a customer. If you right click on on DimCustomer table and click Explorer Data, you’ll get the table view displaying all the rows and columns within the customer table. Notice that this table only stores line 1 of customer’s address. Where is the other bits and pieces? City? State? Country? Well, they’re all saved in the Geography dimension table and related with using GeographyKey.

Close the table view of the customer table so that you’re back at the data source view. Now, add a cube into your solution explorer by right clicking on cubes and selecting ‘new cube’. On the cube wizard click next and verify that ‘build the cube using a data source’ and ‘Auto build’ are selected. Click next until the cube wizard asks you to identify fact and dimension tables. Make sure that the internet sales table is the fact table and others are marked as dimensions. At the top of the wizard’s screen, indicate the time dimension table and click next. The wizard would then ask you to select time periods. Select the time periods as follows:

  • Year –> Calendar Year
  • Half Year –> Calendar Semester
  • Quarter –> Calendar Quarter
  • Month –> EnglishMonthName
  • Date –> FullDateAlternateKey
     

Click next and then review all the measures available within the internet sales table. Upon clicking next, the wizard would automatically detect all hierarchies for you and then allow you to review dimensions. Keep on clicking next until you’re through with the wizard.

Now that we have finally created the cube, you’ll be presented with a cube structure which would be pretty much the same as the tables we saw when we created the data source view. Next we can finally deploy our cube and browse through the cube’s data. To do this, click on Build > Deploy to start deploying the project. When you get a message saying ‘Deployment Completed Successfully’, click on the cube browser tab which you’ll find just above the cube structure (graphic representation of the tables).

Those who have already used Microsoft Excel’s pivot tables, this should ring some bells. In the cube browser, you’ll have your measures and dimensions on the left, a pivot table on the bottom right and a filter area on the top right side. Go into your measures on the left, then into fact internet sales and drag the sales amount measure onto the area that says ‘Drop Totals or Detail Fields Here’. Then go into cusomters dimension category and drap drop the ‘English Country Region Name’ onto the pivot table area where it says ‘Drop Column Fields Here’. In the row fields area, drag and drop ‘Ship Date.Calendar Year …” from the Ship Date dimension. You might notice two, drag and drop the one with a pyramid shaped icon which represents a hierarchy. What we’ll have now is a pivot table of internet sales by year and region:

Not very hi-tech yet but at least it gives a basic business intelligence. It probably would have taken us quite a lot of time to form the hierarchy between years, half years, quarters, semesters and dates manually. But with creating cubes in BIDS, it’s all easy!

Enjoy and keep experimenting!

Posted by: zeesql | May 23, 2008

Components of SQL Server

When you install SQL Server 2005/2008, you’re asked to select between various server and client components. The core server components are:

  • SQL Server Database Engine
  • SQL Server Analysis Services (SSAS)
  • SQL Server Reporting Services (SSRS)
  • SQL Server Integration Services (SSIS)

 
The SQL Server Database Engine is the OLTP part of SQL Server where rapid data transactions take place. As mentioned in my previous post, OLTP systems record information on the fly. For example, ecommerce websites use OLTP systems to quickly record sales, new customer registrations or new products into the catalog.

OLTP systems contain thousands, millions or even billions of transactions. One wouldn’t want to go through each and every single transaction to conclude business decisions. Instead, aggregating the measures (such as sales amounts) in these transactions by day, month, quarter or year would work optimize the system for the purpose of business intelligence.

When OLTP systems are asked to produce such aggregates, they use up an enormous amount of processing power and the output takes quite a bit of time to show up. They are not designed to produce such results. Furthermore, things could get even trickier if data was supposed to be aggregated using multiple measures stored across multiple OLTP systems.

SQL Server Analysis Services is the OLAP part of SQL Server which provides a unified view of business data stored in an OLTP system or across multiple OLTP systems. These systems are optimized to produce aggregates and proactively cache these aggregates for increased performance. For example when we create data cubes, the OLAP system produces aggregates at several points within the cube’s dimensions in order to support efficient business intelligence services.

Using the cube figure from my previous post, the OLAP system could aggregate the sales measure by locations, product groups and time periods in advance and store them within the cube so that information would be readily available (cached) when we query for it:

SQL Server Reporting Services use data sources such as the Analysis Services to enable creating, managing and delivering interactive web-based reports which could be exported in various formats such as HTML, Excel, PDF, CSV etc or be delivered to recipients through email or on file servers. Here’s what the SSRS Report Manager looks like:

Finally, the SQL Server Integration Services component provides data integration solutions which include extraction, transformation and loading (ETL) operations for data warehousing. The ETL process:

  • Extracts data from data sources such as OLTP systems
  • Transforms the extracted data by performing a set of fuctions such as formatting the date strings, deriving new calculated fields (such as: sales_amount = sales_quantity * unit_price), Translating coded values (some systems might store the code 1 for male and 2 for female) etc.
  • Loads the extracted and cleansed data into a data warehouse

 
Once the data is moved into the data warehouse, it is available for Analysis and Reporting Services and Ad-hoc reporting:

So to round it all up, go ahead and install your SQL Server components listed above, plus the Workstation Components, Books Online and development tools and the AdventureWorks sample database and data warehouse. You’ll find the samples by clicking on the ‘Advanced’ button within the SQL Server 2005 setup program when selecting components.

For those of you using SQL Server 2008 CTP, you might have to download the samples at:

http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=10901

In get onto more practical things in my next post and write about developing your first data cube using the Business Intelligence Development Studio (BIDS).

Adios!

Posted by: zeesql | May 21, 2008

Data Cubes

Here it is to all those who are new to Business Intelligence… an introduction into data cubes. If you’re planning on developing BI solutions using SQL Server, you’re going to run into cubes very often.

Databases that record transactional information, such as sales or product data, store the information in flat two-dimensional tables based on rows and columns. Each time a product is sold, the information related to the sale is entered into the sales table. Similarly, each time a new product is launched, its information is recorded into the product catalog table. Databases recording such transactions are called Online Transaction Processing (OLTP) systems.

Going through large number of rows that these tables contain would probably not be useful when it comes to making business decisions. For instance the management of a chain of duty-free shops would regularly need to analyze sales by location and make decisions about inventory accordingly. Looking at flat two-dimensional sales or product tables would not help.

This is where Online Analytical Processing (OLAP) systems come in. OLAP systems use data marts or data warehouses, which store information from OLTP systems, in order to present business data in a more useful form using measures, dimensions, hierarchies and cubes.

Measures are numeric quantities such as sales amounts or unit sales quantities whereas Dimensions are categories by which the measures could be summed. Hierarchies are an upper level grouping of dimensions. For example, total sales is a measure which can be summed or aggregated by cities, a dimension, which could further be grouped by continents, a hierarchy.

When we take one or more measure and categorize it by dimensions, we form a cube. An example is shown below:

Data Cube

As you notice, our measure in the cube above is Total Sales categorized by dimensions: months, cities and product groups. The continents, product groups as a whole and annual quarters are upper level categories of the dimensions we used or forming hierarchies.

Coming back towards our scenario, if the management of a chain of duty-free shops wanted to analyze sales by location, our data cube would be really useful to find out if their duty-free shop in Paris sold more wines and the one in London sold more luxury goods and help them to make decisions about ordering more stock from those product categories in their respective locations than focusing on other product types. They could study sales per quarter or drill down to month level to notice trends. If needed, Tokyo and Singapore, could be grouped at continent level (Asia) for analysis. Fun ain’t it?

Hope you find this post informative. Any comments or feedback are welcome!

Posted by: zeesql | May 21, 2008

About me

Hi! I’m Zishan Danish (also known as Zee), an IT junkie from Pakistan. I’ve had my first exposure to the world of Business Intelligence in 2005 when I started working for a leading online retailer and came across their massive data warehouse. Although I didn’t contribute to it, but it was pretty exciting to jot down lines in VB script and SQL to create various kinds of decision supporting dashboards and trackers for myself and team members using the data available at hand.

After spending a year working for the online retailer, the next year I was working for a different employer where I built my own reporting engine (dependant on one of their existing reporting systems which seemed outrageously slow to me) providing scheduled extraction of reports and delivery onto the NFS (network file system) or into the mail boxes of users listed in the schedule setting.

With multidimensional criteria settings such as picking report types by months or quarters by clients name by client office locations (main office or branch office or region) it was a blast for people who really needed to sit back in their seats and relax rather than wreck their nerves in anticipation for reports to get extracted off the reporting system that they used to use earlier.

Now that I’ve decided to finally take a deeper plunge into BI, with Microsoft SQL Server being my platform of choice, I’d enjoy using this blog to share my experiences related to Microsoft SQL Server BI solutions with beginners.

Stay tuned and enjoy!

Categories