Data Analysis using SQL – Part 2 – Database design

No Comments
Data Analysis using SQL – Part 2 – Database design

Defining Data Warehouse

A data warehouse would be a central repository of the data of the entire enterprise.

A data warehouse is a collection of data. It exhibits the following properties:

  • Subject-oriented: A data warehouse should contain information about a few well-defined subjects rather than containing information about the entire enterprise.
  • Integrated: A data warehouse is an integrated repository of data. It contains information from various systems within an organisation.
  • Non-volatile: Data values cannot be changed without a valid reason.
  • Time-variant: A data warehouse contains historical data for analysis.

Structure of Data Warehouse

One of the primary methods of designing a data warehouse is called dimensional modelling.

The two key elements of dimensional modelling are facts and dimensions, which are basically different types of variables used to design a warehouse. They are arranged together in a specific way known as a schema diagram.

OLAP vs. OLTP

What is OLAP?

Online Analytical Processing, a category of software tools which provide analysis of data for business decisions. OLAP systems allow users to analyze database information from multiple database systems at one time.

The primary objective is data analysis and not data processing.

What is OLTP?

Online transaction processing shortly known as OLTP supports transaction-oriented applications in a 3-tier architecture. OLTP administers day to day transaction of an organization.

The primary objective is data processing and not data analysis

Example of OLAP

Any Datawarehouse system is an OLAP system. Uses of OLAP are as follows

  • A company might compare their mobile phone sales in September with sales in October, then compare those results with another location which may be stored in a sperate database.
  • Amazon analyzes purchases by its customers to come up with a personalized homepage with products which likely interest to their customer.

Example of OLTP system

An example of OLTP system is ATM center. Assume that a couple has a joint account with a bank. One day both simultaneously reach different ATM centers at precisely the same time and want to withdraw total amount present in their bank account.

However, the person that completes authentication process first will be able to get money. In this case, OLTP system makes sure that withdrawn amount will be never more than the amount present in the bank. The key to note here is that OLTP systems are optimized for transactional superiority instead data analysis.

Other examples of OLTP system are:

  • Online banking
  • Online airline ticket booking
  • Sending a text message
  • Order entry
  • Add a book to shopping cart

Star Schema

Facts and dimensions are the two key elements of dimension modelling. A typical problem might involve multiple databases with many different variables and we may not be interested in all the variables. Hence, only some facts and dimensions are combined in a specific manner to create the structure of data warehouse. This structure is called as a schema diagram


A schema is an outline of the entire data warehouse. It shows how different data sets are connected and the different attributes of each data being used for the data warehouse.

Summary

You learnt about what a data warehouse is and the difference between a data warehouse and a transactional database. You learnt that OLAP systems are Subject-oriented, Integrated, Non-volatile and Time variant. 

You learnt that the data warehouse gives an integrated view of the entire organisation and the data is organised for efficiently carrying out analysis. You also learnt the difference between a data warehouse and a transactional database.

You also learnt about facts and dimensions. You also learnt how to arrange facts and dimensions to design a data warehouse. You saw how dimension tables act as the metadata that is the data about data and they enhance the facts table to enhance insights about the data.