
Is Your Big Data Crying Wolf?

Trust: it means everything.

According to NZ Statistics, New Zealand is currently the 9th most trusting country in the world, and Australia the 12th. That is pretty darn good, right?

So now let’s try a quick experiment: did you instantly question the validity of the data that this fact was based on? Interesting, isn’t it? While this survey shows we trust each other, we have an inherent mistrust of data, especially when it goes against our gut feel.

It’s not surprising that people distrust the reports and analytics that businesses generate when you consider the complexity and volume of the data involved. Even relatively modest-sized businesses are struggling to house, understand, and leverage the data they are working with.

I am sure you have been there. Those times when you’re in a meeting reviewing new reports and you have the nagging feeling, that little voice in your head, telling you that something’s not quite right. That’s the ghost in the system, the bad data talking to you. And if you can feel it, it’s a good bet that everyone around you feels it too.

What do I mean by bad data? I mean wrong or outdated data, missing or duplicate data, non-conforming data (or worse: data that’s been shoehorned in). Despite decades of IT building and supporting systems and Data Warehouses, bad data continues to creep in.

We all know the parable of the Boy Who Cried Wolf. You can be forgiven for providing incorrect information once, but continue to do it and people will just stop listening.

Lack of trust creates a slew of upstream and downstream problems. When trust is lost, decisions become purely instinctive, or a cottage industry of shadow IT springs up. While this is an attempt to fix the problem, it only exacerbates it, with data starting to exist in different states across different parts of the business.

Poor data-driven decision making can have direct financial ramifications. A Harvard Business Review report estimated that bad data costs the US economy $3 trillion per year. That is a staggering number. Per the report:

The reason bad data costs so much is that decision makers, managers, knowledge workers, data scientists, and others must accommodate it in their everyday work. And doing so is both time-consuming and expensive.

So how do you reduce time and cost, silence that ghost in the system, restore confidence, and get back to effective data-driven decision making?

The challenge is that everything must be built on a solid foundation. However, with the ever-increasing volume and variety of data, and the concomitant ways of looking at it, the IT data services department can never keep up. The emergence of Data Lakes helps the propeller heads with their analytics wizardry, but even they can be disrupted by bad data.

What is required is something that can be the foundation layer of your data strategy, something agile enough (with both a big and a small ‘a’) to cope with the existing and emerging needs of a modern, data-savvy organization. Focusing on the foundational data infrastructure may not be the most glamorous solution, but it will be the most powerful.

Back in 1990, a data modeller named Dan Linstedt created a new methodology called Data Vault, which has evolved to address many of the issues that arise from poorly housed data. In his own words:

“What Data Vault 2.0 offers organisations is an infinitely scalable architecture that is not only quicker to implement, but ensures that the data is trustworthy and correct.”

Data Vault has been increasingly adopted as a best-practice standard for Data Warehousing (Inmon et al.), and the evolution to Data Vault 2.0 makes it fully compatible with Big Data deployments.

“Telstra, Audi, Lockheed, Vodafone and Tesco have already implemented automated data warehousing using Data Vaults, allowing them to answer business users’ questions in hours and days instead of weeks and months, while lowering risk and reducing total cost of ownership,” says Linstedt.

As well as being adopted for new data warehouse deployments, the Data Vault 2.0 methodology is also being used to replace aging Data Warehouses that have become bogged down, overly complex and unmaintainable. In addition, the unique traceability that DV 2.0 provides makes it especially well suited to merging or rationalising multiple data warehouses from disparate business units or organisations following a restructure, acquisition or merger.
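To make that traceability a little more concrete, here is a minimal, illustrative sketch (not drawn from Linstedt’s material; all table and field names are hypothetical) of the three core Data Vault constructs: hubs hold business keys, links hold relationships between hubs, and satellites hold descriptive history, with every row carrying a load date and a record source.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic hash key from one or more business keys,
    so that records for the same entity line up across source systems."""
    normalised = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def load_metadata(record_source: str) -> dict:
    """Load-time metadata carried on every row; this is what gives the
    model its traceability back to the originating system."""
    return {
        "load_date": datetime.now(timezone.utc).isoformat(),
        "record_source": record_source,
    }


# Hub: just the business key (e.g. a customer number) plus its hash key.
hub_customer = {
    "customer_hk": hash_key("CUST-00042"),
    "customer_number": "CUST-00042",
    **load_metadata("CRM"),
}

# Satellite: descriptive attributes, kept as history alongside the hub.
sat_customer_details = {
    "customer_hk": hub_customer["customer_hk"],
    "name": "Acme Ltd",
    "segment": "Enterprise",
    **load_metadata("CRM"),
}

# Link: a relationship between two hubs (this customer placed this order).
link_customer_order = {
    "customer_order_hk": hash_key("CUST-00042", "ORD-9001"),
    "customer_hk": hub_customer["customer_hk"],
    "order_hk": hash_key("ORD-9001"),
    **load_metadata("ORDERS"),
}

print(hub_customer["customer_hk"], link_customer_order["record_source"])
```

Because every row records where it came from and when it arrived, data from merging organisations or disparate systems can always be traced back to its source rather than being blended into an untraceable whole.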

When combined with data governance and stewardship processes, Data Vault 2.0 can effectively underpin your data strategy. Regardless of the process that you adopt, it is essential to understand that this problem needs to be addressed from the foundations upwards. 

Continuing to build solutions, layering tools and making decisions on bad data is just building a house of cards that will eventually come tumbling down.  

Working directly with Dan Linstedt, Certus Solutions has been named the exclusive training partner for Data Vault 2.0 in Australia and New Zealand. In addition, Certus can provide coaching and services to rapidly deliver your new Data Vault warehouse. If you are a Data Professional interested in training, or simply curious about how the DV 2.0 methodology could benefit your business, find out more here:

Data Vault 2.0 Training
