
Enterprise Data Warehousing Architectures: On-Premise vs Cloud
Data Warehouse – Revealing the Concept
A few key benefits of an EDW include:
- Improved data quality in the form of consistent codes and descriptions, with bad data flagged and fixed
- Provides a single, consistent data model regardless of the source of the data
- Restructures the data so it is useful to business users and delivers excellent query performance for complex analytic queries
- Adds value to operational business applications like customer relationship management (CRM) systems
Three-Tier Data Warehouse Architecture
- Bottom Tier – this contains the database server, which is used to extract data from various sources, such as the transactional databases that back front-end applications.
- Middle Tier – this tier is an OLAP server that transforms data into a structure suitable for complex querying and analysis. An OLAP server typically works in one of two ways: a) as an extended relational database management system that maps operations on multidimensional data to standard relational operations (ROLAP), or b) by using a multidimensional OLAP model that operates directly on multidimensional data and operations (MOLAP). A small sketch of the ROLAP mapping follows this list.
- Top Tier – this is the client layer that holds the tools used for high-level data analysis, querying, reporting, and data mining.
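
To make the ROLAP case concrete, here is a minimal sketch of how a multidimensional request ("total sales by region and year") maps onto a plain relational GROUP BY. The table, columns, and figures are illustrative, and SQLite stands in for the relational engine:

```python
import sqlite3

# Illustrative fact data; names and values are made up for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (region TEXT, product TEXT, year INT, amount REAL);
    INSERT INTO sales_fact VALUES
        ('EMEA', 'Widget', 2020, 120.0),
        ('EMEA', 'Widget', 2021, 150.0),
        ('APAC', 'Gadget', 2021,  90.0);
""")

# The multidimensional roll-up "total sales by region and year"
# becomes a standard relational aggregation:
for row in conn.execute("""
        SELECT region, year, SUM(amount) AS total_sales
        FROM sales_fact
        GROUP BY region, year
        ORDER BY region, year
"""):
    print(row)
```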

Data Warehousing Models
We will discuss three common data warehouse models: the virtual warehouse, the data mart, and the enterprise data warehouse.
- The virtual warehouse model gives users efficient access to all the data in the warehouse. Here the data actually remains stored in a set of separate databases, which can be queried together.
- The data mart model is mainly useful for business-line-specific reporting and analysis. Here the data is aggregated from a range of source systems relevant to a particular business area.
- The enterprise data warehouse model contains aggregated data from all business units and spans the entire organization.
Loading Data into a Warehouse
Data in the warehouse is uploaded from many different operational systems, for example marketing or sales. The incoming data often passes through an operational data store and usually requires data cleansing to prepare it for further processing; this ensures data quality before the data is used for reporting. The two main approaches to building a data warehouse loading pipeline are listed below, with a minimal sketch after the list:
- ETL – Extract, Transform, Load
- ELT – Extract, Load, Transform
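
The following is a minimal ETL sketch, assuming a hypothetical source file `daily_sales.csv`, a simple cleansing rule, and a SQLite staging table as the warehouse stand-in:

```python
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from an operational feed (here, a CSV file).
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: normalize codes and drop rows with unparseable amounts,
    # a stand-in for the cleansing step described above.
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue
        yield (row["customer_id"].strip().upper(), amount)

def load(records, conn):
    # Load: write the cleansed records into the warehouse staging table.
    conn.executemany(
        "INSERT INTO sales_staging (customer_id, amount) VALUES (?, ?)",
        records,
    )
    conn.commit()

conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS sales_staging (customer_id TEXT, amount REAL)")
load(transform(extract("daily_sales.csv")), conn)
```

In the ELT variant, the order of the last two steps flips: raw rows are loaded first, and the transform then runs as SQL inside the warehouse itself.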


Structuring a Traditional Data Warehouse
The star schema is built around a centralized data repository, the fact table, which the schema surrounds with a series of denormalized dimension tables. The fact table holds the aggregated measures used for reporting; the dimension tables describe that data. Denormalized designs are simple mainly because related attributes are grouped together: the fact table needs only a single join to reach each dimension table, and this simple design makes it easier to write complex queries.
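Here is a toy star schema to illustrate the shape; the table and column names are invented for the sketch, with SQLite again standing in for the warehouse engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimension tables around the central fact table.
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);

    -- The fact table: one foreign key per dimension plus the measures.
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units_sold INTEGER,
        revenue    REAL
    );
""")

# Each dimension is reached with a single join from the fact table.
query = """
    SELECT d.year, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_date    d ON f.date_id = d.date_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY d.year, p.category
"""
for row in conn.execute(query):  # empty here; the schema is the point
    print(row)
```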
The snowflake schema normalizes the data by organizing it efficiently: every data dependency is well defined and each table carries minimal redundancy. Single dimension tables branch out into separate, more granular dimension tables. A snowflake schema therefore uses less disk space and maintains data integrity, but queries become more complex and required data is a little harder to reach because of the additional joins.
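Snowflaking the product dimension from the star sketch above shows the trade-off: the category attributes move into their own normalized table, so reaching them costs one extra join. Names remain illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The category attributes branch out into their own table.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue    REAL
    );
""")

# "Revenue by category" now needs two joins instead of one.
query = """
    SELECT c.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product  p ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category
"""
for row in conn.execute(query):
    print(row)
```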
Cloud Data Warehouse Architecture
Amazon Redshift

- Client applications: Amazon Redshift is based on industry-standard PostgreSQL. It integrates with several ETL and data loading tools, and with many business intelligence, reporting, data mining, and analytics tools.
- Connections: Amazon Redshift communicates with client applications via industry-standard JDBC and ODBC drivers for PostgreSQL (a short connection sketch follows this list).
- Clusters: These are the core infrastructure components of an Amazon Redshift data warehouse; each cluster is composed of one or more compute nodes.
- Leader node: This handles communication with client programs and with the compute nodes. It develops execution plans for database operations, compiles code for the individual steps of those plans, and assigns that code to the compute nodes.
- Compute nodes: These execute the compiled code and send intermediate results back to the leader node for final aggregation.
- Node slices: Each compute node is divided into slices. Each slice is allocated a portion of the node’s memory and disk space, where it processes a share of the workload assigned to the node.
- Internal network: Amazon Redshift provides high-speed communication between the leader node and the compute nodes by using high-bandwidth connections, close physical proximity, and custom communication protocols.
- Databases: A cluster contains one or more databases. The SQL client interacts with the leader node, which coordinates query execution with the compute nodes.
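
Because Redshift speaks the PostgreSQL wire protocol, a standard PostgreSQL driver can connect to it. Here is a hedged sketch using psycopg2; the endpoint, database, credentials, and queried table are all placeholders:

```python
import psycopg2  # standard PostgreSQL driver; Redshift is wire-compatible

# Placeholder cluster endpoint and credentials (supply real values,
# ideally from a secrets manager rather than source code).
conn = psycopg2.connect(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    port=5439,  # Redshift's default port
    dbname="dev",
    user="awsuser",
    password="...",
)

# The client only ever talks to the leader node, which plans the query,
# ships compiled work to the compute nodes, and aggregates their results.
with conn.cursor() as cur:
    cur.execute("SELECT region, SUM(revenue) FROM fact_sales GROUP BY region")
    for row in cur.fetchall():
        print(row)
```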
Azure Synapse
Main highlights of Azure Synapse:
- Infinite scale: with these scaling capabilities you can deliver useful insights from all your data, across numerous data warehouses and big data analytics systems, at speed.
- Unsurpassed security: with the right Azure consulting services, you can secure your data with some of the most sophisticated security and privacy features on the market, for instance column-level and row-level security and dynamic data masking (see the sketch after this list).
- Substantial insights: speed up the discovery of insights across all your data and apply machine learning models to all your intelligent apps.
- Integrated experience: considerably reduce project development time with a unified experience for developing end-to-end analytics solutions.
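
As a taste of the dynamic data masking feature, here is a hedged sketch against a Synapse dedicated SQL pool over ODBC. The workspace name, database, credentials, table, column, and role are all placeholders for this illustration:

```python
import pyodbc

# Placeholder connection string for a Synapse dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"
    "DATABASE=salesdw;UID=admin_user;PWD=...;"
)
cursor = conn.cursor()

# Mask the email column for non-privileged users; the built-in email()
# masking function exposes only the first letter and the domain suffix.
cursor.execute(
    "ALTER TABLE dbo.Customers "
    "ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()')"
)

# Users who genuinely need the raw values can be granted UNMASK.
cursor.execute("GRANT UNMASK TO reporting_analyst")
conn.commit()
```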