What Big Data Is and What It Isn't

Want to know what Big Data is? Consider this: from the beginning of time until 1993, the amount of digitized data humans created was about 5 exabytes (5 billion gigabytes). In 2010 we created 5 exabytes in just two days, and in 2013 we are creating that same amount every 10 minutes. Just thinking about what we can do with this information gives me goosebumps.
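To put those figures side by side, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable names are my own.

```python
# Back-of-the-envelope comparison of the data creation rates quoted above.
EXABYTES = 5                 # amount of data in each claim
MINUTES_PER_DAY = 24 * 60

minutes_2010 = 2 * MINUTES_PER_DAY   # 5 EB every two days (2010)
minutes_2013 = 10                    # 5 EB every ten minutes (2013)

# Convert both claims to exabytes per day so they can be compared directly.
rate_2010 = EXABYTES / minutes_2010 * MINUTES_PER_DAY
rate_2013 = EXABYTES / minutes_2013 * MINUTES_PER_DAY
speedup = minutes_2010 / minutes_2013

print(f"2010: {rate_2010:.1f} EB/day")
print(f"2013: {rate_2013:.1f} EB/day")
print(f"Speed-up: {speedup:.0f}x")
```

In other words, the quoted 2013 rate is 288 times the quoted 2010 rate.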


The difference between Big Data and the data we currently use lies in purpose and structure. Traditionally we have been trained to use structured data in our databases, data warehouses and data marts. While our reporting and analysis give us a picture-perfect snapshot of what happened last month and what is happening right now, they tell us nothing about the future; this is where Big Data shines.

Let's look at some real world examples:

Market sales data can give you a report on how your product fared among your customers, but nothing more than that. Real-time use of customer sentiment, on the other hand, could give you a chance to improve or make decisions while your product is still being actively marketed and sold. Big Data can help you improve your sales figures and minimize losses.

An insurance company can look at medical claims and learn that a given patient suffered a heart attack. Or the same insurance company could use medical and behavioral data to trigger an alert that a patient has an increasing risk of heart attack, and notify the doctor or its UM to proceed accordingly. Hospitals and physician offices could do the same. In healthcare, Big Data can help with better diagnoses and with detecting medical conditions before they surface in terminal forms.

Law enforcement agencies use their data to investigate and identify criminals after a crime has been committed. Big Data offers the potential to predict a location where a crime is likely to take place, or to flag an individual whose behavior foretells a coming crime.

So what is Big Data and why is it important?

Plainly put, “Big Data is magic”: it is about improving the outcome of an event by learning about it before it even happens. In other words, if used correctly, Big Data can predict a future event and give us time to make changes that improve its outcome.

Big Data is important because the variables that today's businesses depend on are very different from what they were 10 years ago. As we move towards the coming age of singularity, a simple report or reporting dashboard is not enough for CEOs to make informed decisions.

Take a look at what most of us currently do:

  1. Management requests a report from IS and provides requirements
  2. IS gathers the specifications and generates a report
  3. The generated report is sent to management
  4. The cycle continues as management requests additional reports

The traditional report cycle is driven primarily by questions we already know, requirements we already have and data that is very specific and well structured. By comparison, in the Big Data world you need objectives instead of questions, your data comes from multiple sources and your decisions are made in real time. This is why, if your IT department is not familiar with Big Data, your request will confuse them: it may not carry the traditional specifications they are used to.

Easy or not, Big Data is en route to becoming a mandatory part of how we use technology. Google Trends shows the ever-growing popularity of Big Data. A large share of the interest comes from countries like India, China, Israel and Russia, which are developing the majority of software and applications.

The ingredients

There are three main ingredients that combine to make Big Data, and all three must be present to unlock its true power. They are Velocity, Volume and Variety.

Velocity is the speed at which data becomes available. This means having real-time access to data that is generated in real time, and knowing what is and what isn't important. Velocity makes it possible to analyze the usable information quickly enough for predictive analysis. This is also where Hadoop comes in (more on that later).
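As a rough illustration of acting on data as it arrives, the sketch below keeps a small sliding window over a stream of readings and raises a flag as soon as the recent average crosses a threshold. Everything in it is hypothetical: the `RollingAlert` class, the window size, the threshold and the sample readings are made up for illustration, and a real system would sit on top of a streaming platform rather than a Python list.

```python
from collections import deque


class RollingAlert:
    """Flag a stream when the average of the last `window` readings exceeds `threshold`.

    Hypothetical helper, for illustration only.
    """

    def __init__(self, window: int, threshold: float) -> None:
        self.readings = deque(maxlen=window)  # old readings fall off automatically
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Add one reading and report whether the alert condition holds right now."""
        self.readings.append(value)
        window_full = len(self.readings) == self.readings.maxlen
        average = sum(self.readings) / len(self.readings)
        return window_full and average > self.threshold


# Made-up readings arriving one at a time (e.g. a score that drifts upward).
stream = [50, 52, 51, 55, 60, 66, 71, 75]
monitor = RollingAlert(window=3, threshold=60)

# Positions in the stream at which the alert condition was true.
alerts = [i for i, value in enumerate(stream) if monitor.update(value)]
print("Alert raised at positions:", alerts)
```

The decision is made on each new reading rather than in a report at the end of the month, which is the essence of the Velocity ingredient.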

Volume is the massive scale of data, much of it unstructured, that is thrown at an organization and can be stored for analytics.

Variety is the different types of data coming in from different sources. The challenge with Variety is being able to use both structured and unstructured data.

Here is why each of these is important:

  • Without Velocity the real-time element is gone, and Volume and Variety will only provide old data, which eliminates the ability to take appropriate action as events unfold.
  • Without Variety the resulting data will come from stable and static sources like a data warehouse, which means it will be everything that Big Data isn't. If all you need is a couple of reports written against your data warehouse, then you don't need Big Data. When Variety is introduced to the mix, it produces the big picture and leads to big insights.
  • Without Volume the chance of errors in calculations increases by a large margin. The smaller the data set, the more difficult it is to eliminate outliers and observe patterns.
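The point about Volume and outliers can be shown with a few lines of arithmetic. The sketch below is a toy example with made-up numbers: every normal reading is 10, and a single bad reading of 1,000 sneaks in. The function name `mean_with_outlier` is hypothetical.

```python
from statistics import mean


def mean_with_outlier(n_normal: int, normal: float = 10.0, outlier: float = 1000.0) -> float:
    """Average of `n_normal` identical readings plus a single outlier (toy data)."""
    return mean([normal] * n_normal + [outlier])


# The same single outlier, mixed into data sets of growing size.
for n in (9, 99, 9999):
    print(f"{n + 1:>6} readings -> average {mean_with_outlier(n):.2f}")
```

With 10 readings the single bad value drags the average from 10 up to 109; with 10,000 readings it barely moves it (about 10.10). This is only one simple way to see why small data sets make outliers harder to deal with.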

Traditional Approach | Big Data Approach
-------------------- | -----------------
Questions matter | Objectives matter
Business users determine what questions to ask and IT delivers data to answer those questions | Business users determine high-level objectives. IT delivers a platform to enable discovery and explore what questions could be asked based on the objectives.
Look at a specific information source | Look at all available information sources
Repeated operations and approach (reports) | Iterative, exploratory analysis
Highly structured data | Highly variable data
Stable sources and stable volume | Volatile sources and large volume
Well-defined requirements | Fuzzy questions with changing requirements
I have to know the question before going to my IT | My questions will come out of my iterative approach. Once I have found my questions, I can use them in my traditional approach.
Traditional approach does not work with Big Data | Big Data works together with the traditional approach in the sense that it starts as an exploration journey and then the traditional approach resumes.

Big Data is not something new

The concept of using data to understand how certain actions will affect the future is nothing new. In fact, there are several recent examples.

Yahoo President and CEO Marissa Mayer banned Yahoo employees from working from home. Her decision was not the result of a complaint from some disgruntled employee or manager, but of good old data. She analyzed the VPN logs and found that not enough employees were logging in, which suggested they were skipping work altogether. Once she had her results, the decision was made.

In 1998 Rob McEwen, the owner of Goldcorp, had purchased his very first gold mine in Canada and wanted to know how big a processing plant to build. Having tried everyone locally, he made his mine's geological data available online and offered prize money to anyone who could identify high-density gold areas. There were three winners, two from New Zealand and one from Russia; the kicker is that these teams never even visited his gold mine. They calculated the right numbers solely from the data made available to them.

Bottom line

All healthcare CIOs and CTOs should be looking at this as their primary initiative. The question about Big Data becoming mainstream is not if, but when.

Big Data presents huge opportunities for new entrepreneurs, and we will see many new billionaires come out of it. We are going through a technological shift very similar to the one businesses went through when they made the leap from hierarchical databases to relational ones. It is only a matter of time before someone gets to enjoy that aha moment in using Big Data for healthcare.

I hope my post has helped you understand Big Data. In my next post I will write about how to implement Big Data.

