You’ve heard the hype about Big Data and are now trying hard to separate hype from reality. Since a lot of confusion remains about what Big Data really means, I think it’s important to start by defining the term. The name itself is misleading, because it implies that size is the only challenge. (Size is one challenge, but not the only one.)
The term “Big Data” refers to any information that can’t be tapped into using traditional data warehousing processes and tools built for handling structured, batch-processed data. We can break the definition down into four V’s: the three commonly cited ones (volume, variety and velocity), plus a fourth of my own, veracity. I add this last V in light of the growing demand for new and accurate insights.
1. Volume: By one widely cited estimate, 90% of the world’s data has been generated in the past two years alone. Organizations today are overwhelmed with data, some of which needs to be organized, secured and analyzed. But as the amount of data available to an organization rises, the percentage of that data it can process, evaluate and understand is declining. Planning for this data explosion is fundamental to Big Data management.
2. Variety: 80% of the world’s data is unstructured or semi-structured. It comes from sources such as Web pages, weblog files, social-media forums, e-mails, documents, sensors and smart devices. To capitalize on Big Data, organizations must be able to store and analyze these types of data as well. Such data doesn’t lend itself to traditional technologies that support only structured storage and retrieval. Variety in data represents a fundamental shift in the way analysis must be done to support today’s decision-making and insight processes.
3. Velocity: The constant flow of data and the speed at which it moves are more than traditional databases can handle. Gaining a competitive edge means identifying a trend or opportunity within minutes or even seconds, before your competitors do. In addition, this data has a very short shelf life, which compels organizations to analyze it in near real time, while it is in motion rather than after it comes to rest. (Analyzing data only at rest is like steering by the rearview mirror while traveling well above the speed limit.)
4. Veracity: The truthfulness and quality of data are the most important frontier for fueling new insights and ideas. The focus on quality has historically been greater for traditional data warehouses, because they serve as systems of record for bookings and revenue data. However, the same focus needs to apply to Big Data in order to increase the signal-to-noise ratio and improve our ability to make the right decisions. Increasingly, Big Data will be characterized as much by its veracity as by the other three V’s.
So why should you care about Big Data? Well, you may want to jump on the Big Data train before you get run over, especially if you lie awake at night wrestling with questions like these:
- How do we better understand customer behavior, sentiment, satisfaction and preferences, so that we can tailor products and services precisely to them and improve engagement?
- How do we drive more innovation and identify new growth opportunities by using data from a variety of sources to shape the next generation of products?
- How can we become more agile and able to answer questions that were previously considered beyond reach?
- How do we increase operating margins and drive efficiency and productivity gains?
- How can we generate more precise forecasts and adjust business levers accordingly?
What are some of the considerations when investing in a Big Data solution? What technologies should I be looking at? How do I ensure that my Big Data solution is characterized above all by the ultimate V: the value it provides to my organization? More on that in my next post.