As any good tradesperson knows, you need the right tool for the job. Just ask someone who’s used a screwdriver to assemble furniture how much they would pay for a power drill.
The same holds true for working with data.
Today, many companies dive into machine learning, advanced analytics, or other buzzword-heavy projects with the goal of getting ahead of competitors. But without a solid understanding of what data can do for their organization, how to effectively store and harness that data, and a strategic, deliberate approach to such endeavors, their efforts may cause more harm than good.
So, what is the right tool for a company looking to draw insights from its data? A data warehouse, which is the most robust and sustainable tool to support reporting, analytics, and other advanced use cases.
Looking to learn more about data engineering? Check out our Guide to Data Engineering with helpful resources on this topic.
A data warehouse, also called an enterprise data warehouse (EDW), is simply a system designed to support data analysis and reporting. An EDW serves as an organization’s central storehouse for integrated data (i.e., data combined from multiple sources so end users may easily source a single, comprehensible, usable view of important company data).
Data warehouses are intended to house an organization’s complete and unified repository of information. From one, you can pull key business data to make informed decisions. Data warehouses do nothing less than modernize corporate data usage through robust and scalable infrastructure.
Data warehouse content may come from a company’s operational systems (ERPs, Historian, PI System, etc.), financial systems, transactional systems, relational databases, and a variety of other sources.
This data usually includes both up-to-date and historical data, all of which internal users may access to build analytical reports. Company decision-makers, analysts, and data experts in particular require swift and efficient access to consistently reliable corporate data—which is precisely what data warehouses are designed to offer.
Let’s delve into a little more technical detail about the function of data warehouses, to illuminate the profound business possibilities they represent.
Data warehouses retain copies of all original or source data. This is crucial because it allows organizations to:
Data warehouses usually operate on an extract, transform, load (ETL) basis and typically employ staging, data integration, and access layers during this process. Key layers include:
where integrated data is put into hierarchical groups (or dimensions), facts, and aggregate facts; and,
where hierarchical groups are placed together.
Once data has been integrated and catalogued, designated business users can mine it to support a wide variety of analysis, research projects, and decision-making and strategic planning.
Part of what makes data warehouses so reliably accurate is that the data they contain cannot be altered. This ensures users can accurately track data changes over time; it also makes creating and maintaining an accurate data dictionary (a complete list of database files) possible. And a correct and up-to-date data dictionary is one of data warehousing’s key means for extracting, analyzing, transforming, and loading data.
This outline of data warehouse architecture leads us to a more complete definition of data warehousing. A robust, strategically focused data warehouse comprises not only tools to extract, transform, and load data, layers to transform it, and a dictionary to organize it; it also includes tools to manage and retrieve metadata—and business intelligence tools that both support and reflect your organization’s unique requirements.
Data warehouse architectures can vary widely in complexity, according to the needs of each organization. All data warehouses, however, must be built using these steps:
So it becomes read-only.
Sort, consolidate, and summarize your data.
This will ensure not only that it’s accurate and complete, but perhaps most important, easy to delve into and use.
This process will be repeated anytime you add more data, or if any of your data sources are modified.
Popular data warehouse architectures
There are three main data warehouse forms; which architectural approach an organization takes reflects variables like size, line(s) of business, and current corporate data setup.
Basic data warehouse. This simple format lets users run simple or straightforward queries like “August sales” or “new customers added in Q2”. How quickly such queries can be completed—also known as access, latency, or online analytical processing (OLAP)—is paramount in this case.
Data warehouse with staging area(s). This is key for data warehouses consolidating large quantities of important but varied business data sources; staging areas make data cleansing easier, and integrating or consolidating data from myriad sources more accurate.
Data warehouse with staging area(s) and data mart(s). This is the future—but a future you can build now. Data marts give different groups in an organization access to the specific information they need, in a way that will benefit both their particular focus (e.g., sales or OpEx) and the larger organization simultaneously.
Thus, the larger and more complex a company is, the more it would benefit from building a data warehouse with both staging areas and data marts. All data warehouses answer data queries, so smaller organizations or those with a single data source would also benefit from adopting a data warehousing approach. But what, precisely, is a data mart?
Costs will vary depending on the implementation, but roughly speaking they can be broken down into data storage, visualization, ETL software, staff and ongoing support.
Here you need to make a decision whether to host on-site or in a data center vs. purchasing clousd storage. Typical on-site data storage costs for midsize companies are approximately $12,000 per year. Contrast this with cloud-based storage of around $1,000 per year and it's easy to see why so many companies are in the cloud.
It's pointless to collect data if you can't do any analysis or reporting with it, so you'll need to have visualization software. There are a wide range of products available with the standard package costing approximately $3,000 per year.
As described above, this is the set of tools required to pull data from various sources into the data warehouse. Expect to pay approximately $20,000 per year for a good package.
If you are looking to manage everything in-house, this is where costs start to add up. A reasonably staffed team would include an IT manager ($140K/year), backend developer ($105K/year), database architect ($110K/year) and a data analyst ($90K/year). Adding it all up results in labor costs of around $450,000 per year.
Not surprisingly, small to mid-size companies will find it more cost effective to partner with companies that can provide a full set of these skills.
Stuff happens, and you'll need to make changes to your system over time. As a rule of thumb, look to budget 50% of the cost of implementation for annual changes.
There are a lot of data sorting, storage, and accessing options available. Which will benefit your business most depends on what you use your data for.
Since data marts are sub-sets of, and reside within, data warehouses; and as data lakes, like databases, don’t sift, organize, cleanse, or integrate data, we’ll consider use cases only for databases and data warehouses now.
Data drawn from databases for analysis is generally used for simple, daily transactions, such as:
Databases are relatively basic tools. The primary function of a database is to accurately and efficiently record data, sometimes very large amounts thereof—and that’s all. To maintain this simple functionality, architectural trade-offs are often required—trade-offs that can limit an organization’s ability to access, use, and analyze its own data.
In contrast, data warehouses are made to support more sophisticated activities, such as:
Consider Wonderware Historian, a database that captures a wide variety of sensor data from multiple industrial sources. A mining company could use a data warehouse to consolidate multiple data sources (including, among many others, Historian); analysts could query this newly populated data warehouse with specific requests, such as a list of each day the mining company’s primary crusher processed less than 3,000 tons.
Not only can data warehouses do more, from a business perspective, than databases alone, they can also be connected to other, more business-focused tools to bring organizations even more competitive value. For example, specific requests such as a mining company’s measurable crusher productivity can be aggregated into other tools like dashboards or more complex models.
We hope it’s clear by now that running a data-driven organization in a competitive and quickly changing business landscape, data warehouses—with data marts as appropriate—are the surest way to win.
Data warehouses offer the most reliable and accurate way for businesses to store and access structured data; this in turn improves cross-organizational data access via reports, dashboards, and analytics tools. These help businesses better monitor performance and improve decision-making because they know their data is trustworthy. This begins a virtuous cycle:
Consider this hypothetical situation (sadly common in the real world): Organization X owns a lot of data but it’s difficult to access. It takes a long time to generate reports and end users can’t necessarily trust results; perhaps quarter-end reports don’t align or some piece of the company’s larger data puzzle is clearly missing. Such reporting insufficiencies only increase friction—data, social, collaborative, workflow, to name a few—throughout the organization.
Things don’t need to be this way. Consider the opposite scenario; Company Z has set up a data warehouse and everyone knows they can access data accurately, easily, and whenever required. What happens at companies like this is, users will read automatically generated reports and, their interest piqued, eventually start asking for more. They will realize how easy it is to get useable company and/or departmental data and will start to run experiments—to improve operations, provide better customer insight, or save money, to name just a few possibilities. This would encourage bolder activity—and ultimately transform the business.
Organizations with committed data warehousing teams can plan and move well ahead of their less data-savvy competitors in every way—from product development, marketing, pricing, production processes, and historical analysis, to forecasting, employee organization, and customer satisfaction. They can, in short, thrive where others will fail.
Speak to Our Experts
Connect with a 3AG Systems expert today and start your journey towards efficient and effective data management.