As a child I had a dread-fascination with horror films. They would terrify me to the core. Freddy Krueger, Jason, Pinhead and Chucky all lurked in the dark recesses of my room. They were under the bed, in the wardrobe, just out of sight, but never out of mind. Between them, those guys cost me a lot of sleep!
Now that I am a fully grown adult, you’d think that I would be past ‘things that go bump in the night’! Whilst the old crew are still capable of giving me the willies, what keeps me up at night now is the Swamp Thing, or the Data Swamp Thing to be more precise.
No one contests the exponential growth of data; the stats are mind-boggling. More data has been created in the past two years than in the entire previous history of the human race. And by the year 2020, about 1.7 megabytes of new information will be created every second, for every human being on the planet.
Big Data is not a fad; it is a simple reality of the digital age. At the moment less than 0.5% of all data is ever analysed or used. Just imagine the potential here. This has not slipped under the radar of most forward-looking businesses: 73% of organisations have already invested or plan to invest in big data by 2016.
For a lot of organisations, Data Lakes are either the next logical step in their information management journey or an easier and more flexible alternative to building a data warehouse. They are typically deployed on one of the many flavours of Hadoop or other Big Data platforms, often as an ungoverned data store. The advantages of data lakes are many, including the ability to load most types of data and to support a huge range of analytical needs.
The ability to avoid the cost and effort of defining data structures up front is a significant advantage. You can achieve huge time and cost savings by moving this task to the point where the data will actually be used, and to the people who know it best: the business people doing the analysis.
However, this is where you must beware the Swamp Thing. Failure to understand the need for governance, quality, context and security/access can quickly pollute your data lake, muddying the waters and turning it into a data swamp.
Unfortunately, data lakes are not a magical solution. Organisations must pay attention to data quality. The simple truth is that if you don’t want to be waking up in the middle of the night in fear of the Swamp Thing, then you need to focus on more than just the technology.
To create a successful data strategy, you must start with stewardship, ensuring that you address the organisational structures, processes and cultures that support basic information management principles and allow you to keep your lake clean. Even a minimum practice of classifying information up front, capturing basics like the source, owner and intended purpose, helps ensure that data loaded into your lake can be reused by others in the future, and supports the maintenance functions that help keep the waters clean.
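For the more hands-on reader, here is a minimal sketch of what that up-front classification could look like in practice. It is a toy, in-memory illustration rather than a real platform: the `Classification` and `DataLake` names, and the rule that all three fields must be filled in, are my own assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """Hypothetical minimum metadata captured before anything enters the lake."""
    source: str   # where the data came from
    owner: str    # who is accountable for it
    purpose: str  # why it was loaded


class DataLake:
    """Toy in-memory 'lake' that refuses data with incomplete classification."""

    def __init__(self):
        self._catalogue = {}

    def load(self, name, payload, classification):
        # Reject the load if any classification field is blank.
        missing = [field for field, value in vars(classification).items()
                   if not value.strip()]
        if missing:
            raise ValueError(f"cannot load {name!r}: missing {', '.join(missing)}")
        self._catalogue[name] = (classification, payload)

    def find_by_owner(self, owner):
        # The catalogue is what lets other people discover and reuse the data later.
        return sorted(name for name, (meta, _) in self._catalogue.items()
                      if meta.owner == owner)


lake = DataLake()
lake.load("sales_2015", b"raw bytes",
          Classification(source="CRM export", owner="sales-ops",
                         purpose="churn analysis"))
print(lake.find_by_owner("sales-ops"))
```

The point is not the code itself but the habit: nothing gets into the lake without someone saying where it came from, who owns it and what it is for.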
If you get this right, you can start to build effective data-driven strategies that will deliver both differentiation and competitive advantage for your organisation.
And just as importantly, you’ll avoid losing any more sleep to the Swamp Thing!