New Buzzword

New Buzzword! Data Warehouse Automation

A survey by IDC and DM Review gives us these scary statistics:

  • 17 months is the average implementation time for a BI project
  • 5 months needed on average to deploy the first usable BI artifact
  • $1.1m is the mean, annual expenditure on a BI project for companies with more than 1,000 employees
  • 31% of BI projects are declared successful
  • 36% confidence rate that the right data is available to the right people at the right time

Those numbers seem unrealistically horrible, yet as a company that builds and deploys BI environments for hundreds of companies each year, we see IT groups and business users facing them all the time. That is why Data Warehouse Automation (DWA) is such an exciting concept.

DWA uses technology to gain efficiencies and improve effectiveness in Data Warehousing processes. It is more than the automation of ETL development, it automates the entire Data Warehousing lifecycle; from planning, analysis, design, and development, even extending into operations, maintenance, and change management.[i]

The idea behind DWA is to automate each part of the Data Warehouse lifecycle that can be automated, so that the project team can focus on the parts that require more intellectual input than raw technological horsepower. DWA does this by minimizing manual code writing and automating the repetitive, labor intensive, time-consuming tasks normally associated with a Data Warehousing effort.

Proponents of DWA claim that using it will accomplish an initial BI implementation up to five times faster than traditional methods. This results in a higher success rate for BI projects, and most importantly, it allows for faster, better, and more cost effective delivery of BI output to the business.

One of the widely accepted pitfalls of traditional Data Warehousing methodology has been that building a Business Intelligence project is a long, tiring and expensive process. DWA offers a way to avoid those, and in fact, it has been doing so since 2006. The Data Warehousing Institute (TDWI), a think tank devoted to all things data related (and a great resource for education and training), started teaching DWA classes in 2014 and has included the technology as one of the emerging trends for 2015 and beyond.

Data warehouse automation will thrive: Data warehouse architecture and implementation has transitioned from an art to a science. When combined with the increasing demand for analytics and user frustration with delays, automation will drive the development of new and/or additional products to automate the data warehouse life cycle, including design, creation, operations, and change management.[ii]

What this trend implies is that we can change the way we think about building a Data Warehouse. The expected practice of extensive up-front analysis, design and modeling is left behind. The mindset changes from ‘get it right the first time’ to ‘develop fast and frequently.’

Enough of the theory! Let’s go back to the scary statistics and use them to discuss the practical applications of DWA. Their root causes can be broken down into three categories:

Technical issues – building a Data Warehouse and set of OLAP cubes is largely an effort in writing masses of SQL code. Code is what moves data from a source system (like your ERP) to a target system (like a Data Warehouse). Code is used to define metrics, create data structures like product hierarchies, and conduct calculations both simple and complex. Building a Data Warehouse with manual code is time-consuming, labor intensive and error prone, sometimes taking a skilled SQL programmer hours to move a single field from a source database to a target destination.

You can think of any DWA platform as a code generating machine. Instead of writing a block of SQL code, you use a mouse click. The substantial amount of code that comprises an SQL Server Integration Services Transfer Package to move a field from the transactional database (such as Dynamics AX) to a Data Warehouse is generated in the background. The same underlying tools and processes are used as if it was done by hand, but the difference is that it takes about a second to complete the operation instead of an hour of coding. The raw speed advantage is obvious, but since the process is programmatic(i.e., done by a machine the same way every time) human error is eliminated, testing time is greatly reduced and re-working the code to fix errors is not necessary. Time and money are saved, and results are delivered faster.

Data issues – data is messy.  Any large data set is guaranteed to have gaps, errors, and duplications, among other atrocities. These need to be discovered, understood and rectified before the data can get used. This exercise is often time consuming because there is usually a need to combine several data sets in order to get a full view of company operations. For example, combing CRM information (the date a sales opportunity was entered) with the ERP Financial module (the date a first invoice was posted) will give you sales cycle times and allow for financial projections. Now there are two messy data sets to deal with, and in the real world, we usually end up dealing with many more.

A typical multi-database scenario involves trying to get data from multiple source systems to match. One system records a specific type of revenue transaction, but for whatever legitimate business reason, this is not the system of record for the company. Turns out that the system of record has a different chart of accounts structure than the other system. In order for the revenue transactions to flow into the correct revenue accounts, a mapping exercise must take place. If you were using traditional methods, you would need to write and maintain SQL code to handle this mapping. This is a great example of a time-consuming, repetitive task that doesn’t require human creativity and is better accomplished through automation. DWA software would do this with a mouse click – generating the code in the background – in seconds. It is the same for other field transformation tasks, scheduling jobs and generating documentation– all are candidates for automation.

Organizational issues – this is often the hardest. A shared vision and understanding of the data and the numbers must be created in order to deliver a BI product that business people will use. It is vitally important to get everybody on the same page about what data is important and how it is to be used. Everyone must understand and agree on which individual pieces of data will come together to tell the story of the organization and how it is operating. For example, it doesn’t really matter if gross profit is defined as Sales minus COGS, or if it is defined as Sales minus COGS plus SGA. What matters is that everybody comes to an agreement on the method – and that can be hard.

Obviously a piece of software is not going to create a shared vision. But the ease, and speed, at which results can be delivered using a DWA platform can help. DWA allows you to combine disparate pieces of data quickly, take them through a complete BI development cycle, and deliver the result to the business to see if it works. This exercise would be unthinkable in the traditional model because it would take weeks, if not months, to do. The potential time waste and failure would be catastrophic to the project. With DWA, this can be completed in hours, and in essence, it allows you to fail fast. Results can be tested by business users immediately and changed quickly if they are incorrect. IT and the business typically find themselves going through a collaborative process to design, build and deliver BI products. That’s right, IT and the business working together on a shared vision about data and how to make it most useful.

Maybe that is what is most intriguing about the promise of DWA:  it is a technology that solves long-standing problems and has a ripple effect on non-technology issues.  It allows us to get to what is important from a BI project: the ability to respond quickly to changing business conditions. Business users are getting what they want, when they want it, as opposed to what they asked for six months ago. Rapid time-to-value is consistently realized since new BI and analytic functionality can be delivered in days instead of months.

Big Data. In-Memory. Self Service BI. Data Discovery. Cloud BI. Mobile BI – so many exciting things are happening on the BI landscape, and so many new buzzwords are popping up. We think Data Warehouse Automation is one of the most practical we have seen in a long time. If you are looking to avoid becoming a ‘scary statistic’ yourself, I invite you to take a closer look at DWA to see if it is right for you.


[i] TDWI – Dave Wells, April 15, 2014.  https://tdwi.org/Articles/2014/04/15/Data-Warehouse-Automation.aspx?Page=1

[ii] That Was the Year That Was: Major Data Warehousing Events of 2014 (and Predictions for 2015) – December 16, 2014.  https://tdwi.org/articles/2015/12/15/that-was-the-year-that-was.aspx