The 3AG guide to data engineering

In this comprehensive guide, we look at key aspects of data engineering, which together create a solid foundation for all things data.

Introduction to data engineering

What is data engineering? And why does it matter?

Perhaps this is best explained with a metaphor.

Imagine it’s 1970, and you’ve just opened a bookstore. For the grand opening, you stock the 10 most popular books from the New York Times bestseller list, but soon realize customers are looking for other books in addition to the most popular ones. You add books from the top 20, then the top 100.

But now customers get confused when they enter the store, because you’ve gone from stacking books in 10 piles to stacking them in 20 and then 100 different piles. Worse yet, customers start mixing the different piles, which makes it even harder for them to find what they want.

So, you add a system of bookshelves, putting the books in order of popularity. Customers entering the store can now walk past the most popular books and make their way to the slowest sellers.

But then certain customers ask you to place special orders. You decide that it makes sense to order a few extra copies each time, since a special order is a sure sign of demand.

See a pattern here? This constant improvement and iteration of the organization of bookstore data has strong parallels to the discipline of data engineering.

Data engineers are charged with helping their companies organize data so that it is not only accessible to the widest audience, but also useful for each group's particular needs. Some employees just need to look at a chart. Others need to do comprehensive analysis. And still others need data structured to allow for advanced use cases like machine learning and artificial intelligence.

Data engineering makes this possible, while also dealing with ever-changing internal and external business landscapes.

Why consider data engineering?

Another metaphor will help.

Imagine your company owns either a shed, a house, a multi-family complex, or a high-rise.

The shed owner probably doesn’t spend much time thinking about engineering permits or maintenance since they don’t need to worry about services like heating, running water, or electricity.

And if the shed won’t be upgraded anytime soon, there's no need to plan for future changes.

The house owner will need to devote significantly more effort to keeping things running; even so, they won’t need to concern themselves with more than their own and their family's needs.

A multi-family complex requires more planning, more resource integration, and a more structured approach to handling issues as they arise. Different tenants may also have very different needs.

And a high-rise tower will have a completely different set of engineering requirements, permitting requirements, safety requirements, and so on. The high-rise will also be much more complicated to manage, particularly if things like fixtures have not been standardized throughout the building.

See a trend here? The more complicated the structure, the greater the need for better infrastructure. And earlier planning becomes even more important for bigger buildings, as early mistakes and omissions can become costly later on.

The same holds true for data engineering within a company. While the smallest companies may not immediately need tools and structures to support data access, companies with growth aspirations will need to consider this issue. And the longer they wait, the more expensive the effort will be, with more work required to fix past errors and omissions.

Developing a data strategy

Without a clear data strategy, companies risk launching small, isolated data projects (or no projects at all) that connect neither to one another nor to their central databases.

Forward-thinking companies see the value of knowing both where they are, and where they want to go, on their data journey. From a data perspective, this involves understanding where they fit into the analytics maturity model.

Most companies that haven't formally developed a data strategy tend to fall in the descriptive category, performing basic reporting. Those that consciously decide to focus on improving their reporting advance to the diagnostic phase, where they not only centralize data to improve reporting efficiency, but also ask “why” when doing analysis.

More advanced stages of maturity include predictive and prescriptive, but few companies achieve this level of expertise across the entire organization.

Skipping steps can mean missing lower-hanging opportunities for improvement. Such haste can set a company up for failure if it begins projects that are not built on a solid foundation of earlier-stage prerequisites.

For example, a company looking to do predictive analysis to support production planning won't be able to efficiently or reliably perform analysis if source data is spread across hundreds of spreadsheets managed by dozens of people. The 3AG Data Coach was developed specifically to help companies assess their current analytics maturity level, their targets, and the steps required to succeed.
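The idea of skipped steps can be made concrete with a small sketch. It assumes the four stages named above form a strict ordering; the names MaturityStage and skipped_stages are hypothetical and chosen for illustration only. They do not describe how the 3AG Data Coach works.

```python
from __future__ import annotations

from enum import IntEnum


class MaturityStage(IntEnum):
    """Analytics maturity stages, ordered from least to most advanced."""
    DESCRIPTIVE = 1    # basic reporting
    DIAGNOSTIC = 2     # centralized data, asking "why"
    PREDICTIVE = 3
    PRESCRIPTIVE = 4


def skipped_stages(current: MaturityStage, target: MaturityStage) -> list[MaturityStage]:
    """Return the stages that lie strictly between current and target.

    A non-empty result means the target jumps over prerequisite stages.
    """
    return [stage for stage in MaturityStage if current < stage < target]


# A company doing basic reporting that wants predictive analysis
gaps = skipped_stages(MaturityStage.DESCRIPTIVE, MaturityStage.PREDICTIVE)
print([stage.name for stage in gaps])  # prints ['DIAGNOSTIC']
```

In the spreadsheet example, the non-empty result is the warning sign: the diagnostic work of centralizing data has not been done yet.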

Figure: 3AG Analytics Maturity

Data governance

Data governance is the work of setting your company's policies on how it collects, accesses, and retains data. It is tightly integrated with data strategy, which includes defining and implementing data governance rules.

A solid data governance policy will not only identify which sources to consolidate, but will also define who can access data, when data is pulled, and what to do if conflicts occur. Governance policy also addresses other issues, such as retention and integrating new data sources as they come online.
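One way to picture such a policy is as a small record per data source. The sketch below is illustrative only: the field names, the role names, and the "lowest priority number wins" conflict rule are all assumptions, not part of any particular governance standard or tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePolicy:
    """Hypothetical governance rules for one data source."""
    name: str
    allowed_roles: frozenset[str]   # who can access the data
    refresh_schedule: str           # when data is pulled, e.g. "daily"
    retention_days: int             # how long records are kept
    priority: int                   # assumed rule: lower number wins a conflict


def can_access(policy: SourcePolicy, role: str) -> bool:
    """Check whether a role is allowed to read from this source."""
    return role in policy.allowed_roles


def resolve_conflict(policies: list[SourcePolicy]) -> SourcePolicy:
    """Pick the source to trust when several disagree (assumed priority rule)."""
    return min(policies, key=lambda p: p.priority)


# Example data, invented for illustration
crm = SourcePolicy("crm", frozenset({"sales", "analyst"}), "hourly", 730, priority=1)
sheet = SourcePolicy("sales_spreadsheet", frozenset({"sales"}), "weekly", 365, priority=2)

print(can_access(sheet, "analyst"))         # prints False
print(resolve_conflict([sheet, crm]).name)  # prints crm
```

Writing the rules down in one place, in whatever form, is what lets new data sources be added under the same policy as they come online.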

Wrapping up

Data engineering is a complex topic, made more so by the countless ways organizations can tackle it.

For organizations that want more control over their data but lack experience in this area, a guide is invaluable.

Consider bringing in experts who can give an unbiased assessment of your current situation. We would be honored if you consider 3AG's Data Coach as your best option.