With so much hype about Big Data, it's hard for leaders to know how to exploit its potential. They need to cut through the confusion and discipline themselves to base their actions on known facts and business-driven outcomes. This is easier said than done because of the myths surrounding Big Data. Here are three myths that need to be debunked to avoid wasted resources, dead-end paths and missed opportunities.
Myth No. 1: Big Data (and not 42) is the answer to life, the universe and everything.
Analyzing data across a couple of terabytes in a structured way is not really a Big Data problem. Don’t treat every analytics need as a Big Data problem. Most people in your company don’t need Big Data. They need small data. But they need it in a way that is easy to use and gives them the information they need in terms they can understand.
You will begin to encounter a Big Data problem only when either:
- Your existing infrastructure cannot cost-effectively cope with the growth of the three data v's - volume, variety and velocity.
- Your business needs a broader range of data to achieve its objectives, and one of these new information assets complicates one of the existing v's you are already managing.
Taking a hard look at the business problem you are trying to solve, and mapping it to the data investments needed to solve it, will help you be more prudent about how far down the Big Data path you want to go.
Myth No. 2: With Big Data, we can go easy on data quality.
Many leaders believe that the huge volume of data that organizations now manage makes individual data quality flaws insignificant due to the "law of large numbers." Their view is that individual flaws don't influence the overall outcome when the data is analyzed, because each flaw is only a tiny part of the mass of data in their organization. They also reason that Big Data is not mission-critical, so why invest too much in data quality?
In reality, more data means more flaws. The number of flaws grows along with the volume, so the overall impact of poor-quality data on the whole dataset remains the same. The old concept of garbage in, garbage out (GIGO) still reigns. In addition, much of the data that organizations use in a Big Data context comes from outside or is of unknown structure and origin, which makes data quality issues even more likely than before. So data quality is actually more important in the world of Big Data. It is therefore important to apply data quality policies and procedures to how Big Data is acquired, maintained and disseminated, to improve the likelihood of deeper, more accurate insights.
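To make the point about flaws not washing out more concrete, here is a small, purely illustrative Python sketch. Everything in it is an assumption made up for the example: the function name `mean_with_flaws`, the 2% flaw rate, the true average of 100, and the idea that a flawed record is stored as zero. It simply shows that when flaws occur at a constant rate, adding more records does not pull the computed average back toward the true value.

```python
import random

def mean_with_flaws(n, flaw_rate=0.02, true_value=100.0, seed=0):
    """Average of n simulated records, where a fixed share is flawed.

    All parameters are hypothetical: a flawed record is modeled as a
    value wrongly stored as 0.0 (for example, a missing amount).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        value = rng.gauss(true_value, 10.0)  # a clean record
        if rng.random() < flaw_rate:
            value = 0.0                      # a flawed record
        total += value
    return total / n

# More data shrinks random noise, but not the bias caused by the flaws.
for n in (1_000, 10_000, 100_000):
    print(n, round(mean_with_flaws(n), 2))
```

Under these assumptions the expected average is 0.98 × 100 = 98, not 100, no matter how many records are analyzed. The law of large numbers makes the estimate more stable, but it stabilizes around the wrong number.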
Myth No. 3: With data lakes and Hadoop, the data warehouse is pointless now.
Due to the hype around Hadoop and data lakes, many leaders are under the impression that building a data warehouse is a pointless exercise, because advanced analytics use new types of data beyond the data warehouse. Moreover, vendors have started marketing data lakes as enterprise-wide data management platforms for analyzing disparate sources of data in their native formats.
In reality, a data lake's foundational technologies lack the maturity and breadth of features found in established data warehouse technologies, and they should not be positioned as replacements for data warehouses or as critical elements of customers' analytical infrastructure. Similarly, Hadoop is best used to extend the data warehouse to store and process newer, more complex data sources, not to supplant existing data warehouse investments. Data warehouses already have the capabilities to support a broad variety of users throughout an organization. Leaders must refine the new data types that are part of Big Data to make them suitable for analysis. They have to decide which data is relevant, how to aggregate it and the level of data quality necessary.
Data is now established as the new fuel that will drive organizations. Maximizing its value without succumbing to the hype and myths surrounding Big Data takes a whole lot of discipline and a new role. In my next post, I will share a few thoughts on why this new role, that of a chief data officer, is necessary to bust some of the myths mentioned above and shepherd organizations toward data nirvana.