Big Data Implementation

So you have a fairly decent idea of what Big Data is, but that's pretty much where the story stops. The learning curve is so steep that the most common challenge is figuring out how to go about the implementation. This post is meant to give you food for thought and get you out of that twilight zone.

The Implementation Steps

There are many paths one can take to achieve Big Data goals. In my case I have divided the implementation into four main steps.

  1. Select Business Objectives and Hypothesis
  2. Explore Data
  3. Conduct Scientific Analysis and Big Insights
  4. Take Meaningful Actions 

Business Objectives

High level business objectives are often very similar across the board; it's taken for granted that they exist in the business plans which is why they don't normally come up during pre-implementation phases. Nevertheless, lest we think objectives are not important, I will talk about them as being the first step.

The real value of growing data comes from knowing what you can do with it and how you can get value from it. You are not going to know what you need unless you have identified high-level objectives.

High Level Objective Examples:
  • 360 view of a customer
  • Increase sales
  • Decrease fraud, waste and abuse
  • Cut unnecessary workforce costs

When selecting objectives, think of items that are not specific questions but a gateway to them. From the examples mentioned above, the "360 view of a customer" is a better choice than a single question like "how many users are satisfied with your service". The reason is that your research on the "360 view of a customer" will also answer the satisfied customer question (among other things).

Data Exploration

If you recall the three Vs of Big Data, you will have to assemble data that covers volume, velocity and variety. Your data warehouse, social media channels, customer interaction system and enterprise system are great sources for actionable insight.

The important thing to keep in mind is that data silos and Big Data cannot co-exist. In order to function as intended, Big Data needs to connect and communicate with all data types. This is one challenge that some companies may face with their database administrators or IT staff, primarily because most DBAs think about data in fixed relational and schema terms. However, when we talk about Big Data, we mean data that can be variable, of different types and structures or be needed in real-time.

Points on starting with the data sources
Do not try to identify and connect with each and every data repository in your organization from day one. Assembling too much too quickly will create a chaotic environment and will bring lots of garbage with it. Work in continuous phases instead.

The reason I suggest working in phases is the veracity of the data. With an initiative as important as Big Data, you will want to be sure the veracity of your results is intact. Here are some suggested steps (by the way, this is also where Hadoop or MapReduce comes into play; I will write a separate post about that):

Step 1: Connectivity - Discover data repository and establish connection.

Step 2: Data Analytics, Indexing and Extraction - Identify which objectives that repository supports and who its users are. Format your data by creating views and relationships so that it is usable. This will also give you an idea of its value and utilization.

Step 3: Develop Framework - Leverage the data by building a funnel-like framework where the different types of data meet. This will allow you to see patterns, themes and profiles. You will connect streams of data that are already formatted or are real-time directly into this framework.

For example: Your data warehouse should connect directly at this level and does not need to travel through the second step.

Step 4: Applications – Develop applications by using the framework as a source and offer these to your departments and data scientists.

Developing the framework and the application is the most crucial and difficult task. You will either have to develop applications in-house or use tools from third parties like IBM and Oracle.

Important note about the Framework – Your framework should allow you to explore data across multiple platforms like enterprise systems, CRMs, SharePoint, Emails, Files and Documents, Databases and Data Warehouses without replicating any information. Many businesses feel compelled to bring everything to a single location because logically it feels like our analysis will be way easier that way. However, I want to warn you that the main purpose of the framework is to avoid this single location practice. The ideal scenario is to leave the data where it is and consult it at that location whenever you need it.

Here is why the single location implementation is not a good idea for Big Data:

  • By the time your data flows through the regular formatting and other channels, it may be too late. Re-routing real-time data defeats the purpose of actionable intelligence. 
  • Centralizing everything means you are probably replicating your data, which wastes space and buries your team under more information than it can handle.

Example: If I want to gather real-time customer sentiment from social channels (say Twitter), it would be wise to have my framework go out, analyze Twitter and bring back the relevant data rather than copy all the data I can find on Twitter into my primary data warehouse.
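To make the "leave the data where it is" idea concrete, here is a minimal sketch of what such a framework could look like. Everything in it is an assumption made for illustration: the `Framework` class, the `crm_source` and `social_source` functions and the in-memory lists are hypothetical stand-ins for real systems, not any vendor's API. The point is only that the framework holds connections to sources and consults them on demand instead of copying their contents.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Source = Callable[[int], Iterable[Record]]

# Hypothetical stand-ins for systems that stay where they are.
CRM: List[Record] = [
    {"customer_id": 1, "name": "Ada", "status": "active"},
    {"customer_id": 2, "name": "Grace", "status": "lapsed"},
]
SOCIAL_POSTS: List[Record] = [
    {"customer_id": 1, "text": "love the new service"},
    {"customer_id": 2, "text": "billing is broken again"},
    {"customer_id": 1, "text": "support was quick"},
]


def crm_source(customer_id: int) -> Iterable[Record]:
    """Query the CRM in place and return only the matching rows."""
    return (row for row in CRM if row["customer_id"] == customer_id)


def social_source(customer_id: int) -> Iterable[Record]:
    """Query the social feed in place and return only the matching posts."""
    return (post for post in SOCIAL_POSTS if post["customer_id"] == customer_id)


class Framework:
    """Keeps a registry of sources; it never stores the data itself."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def lookup(self, customer_id: int) -> Dict[str, List[Record]]:
        # Each source is consulted at request time, so results are current.
        return {
            name: list(source(customer_id))
            for name, source in self._sources.items()
        }


framework = Framework()
framework.register("crm", crm_source)
framework.register("social", social_source)
result = framework.lookup(1)
```

In a real deployment each source function would wrap a database query or an API call, but the shape stays the same: only the relevant records travel back to the framework.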

Scientific Analysis and Big Insights

The applications you develop using the framework will allow you to conduct scientific analysis, create profiles, recognize themes and establish patterns. You can use this information to fulfill your objectives that were defined in the first step.

Scientific analysis and big insights help the most with profiling, pattern detection, the 360 degree customer view and customer satisfaction.

Profiles and Patterns
Using the patterns and themes found in your data, you can create profiles. These profiles can be matched against your new or potential customers to calculate their risk score, purchase potential and needs. This is also known as predictive analysis.
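As a rough illustration of matching a customer against a profile, the sketch below averages a few known cases into a profile and scores newcomers by how close they sit to it. The feature names, the sample numbers and the `1 / (1 + distance)` scoring rule are all assumptions chosen for simplicity; a real predictive model would be trained and validated on your own data.

```python
import math
from typing import Dict, List

Features = Dict[str, float]


def build_profile(examples: List[Features]) -> Features:
    """Average the feature values of known cases into one profile."""
    keys = {key for example in examples for key in example}
    return {
        key: sum(example.get(key, 0.0) for example in examples) / len(examples)
        for key in keys
    }


def risk_score(candidate: Features, profile: Features) -> float:
    """Return a score in (0, 1]; 1.0 means identical to the profile.

    Assumes every feature is already scaled to a comparable range.
    """
    keys = set(candidate) | set(profile)
    distance = math.sqrt(
        sum((candidate.get(k, 0.0) - profile.get(k, 0.0)) ** 2 for k in keys)
    )
    return 1.0 / (1.0 + distance)


# Hypothetical features, each scaled 0-10, for two known abusers.
known_abusers = [
    {"claims_per_month": 9.0, "address_changes": 4.0, "avg_claim_size": 8.0},
    {"claims_per_month": 7.0, "address_changes": 6.0, "avg_claim_size": 9.0},
]
abuser_profile = build_profile(known_abusers)

applicant_a = {"claims_per_month": 7.5, "address_changes": 5.0, "avg_claim_size": 8.0}
applicant_b = {"claims_per_month": 1.0, "address_changes": 0.0, "avg_claim_size": 0.5}

score_a = risk_score(applicant_a, abuser_profile)
score_b = risk_score(applicant_b, abuser_profile)
```

Here `applicant_a` lands much closer to the abuser profile than `applicant_b`, so it would be the one flagged for a closer look.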

Healthcare can benefit from predictive analysis by identifying the probability of an illness and kick-starting preventative treatments. Fraud, waste and abuse can also be controlled by flagging the abusers of the service and taking appropriate actions.

Health Insurance Exchanges will benefit the most from predictive analysis by utilizing real-time risk scoring. This is huge because Obamacare no longer allows insurance plans to ask about pre-existing conditions, and Big Data can help fill that information gap.

360 customer view dashboard
The velocity of the information available matters the most in customer interaction scenarios. Whether it be through email, website, in-person or by phone, the speed of data will enable you to market at the right time, solve issues effectively and recognize any concerns immediately.

If you observe the average amount of time customer service logs per call, you will notice that most of it is spent finding information, switching between multiple applications or rerouting the call to someone who has the missing piece. This can be avoided using the 360 degree customer view dashboard.

Customer service representatives are not data scientists, which is why the dashboard should be a self-service, pre-computed system that displays all the relevant information to them. Without having to open multiple applications, they should be able to see the most current information, including recent email communication with the user, call logs, member status and subscription information.
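Here is a small sketch of what "pre-computed" could mean in practice: the view for each customer is assembled ahead of time from the separate systems, so the representative reads one record during the call. The `CustomerView` fields mirror the items listed above; the function name, the input shapes and the sample data are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CustomerView:
    """Everything a representative needs on one screen."""
    customer_id: int
    member_status: str = "unknown"
    subscription: str = "unknown"
    recent_emails: List[str] = field(default_factory=list)
    recent_calls: List[str] = field(default_factory=list)


def build_views(
    members: Dict[int, str],
    subscriptions: Dict[int, str],
    emails: List[Tuple[int, str]],
    calls: List[Tuple[int, str]],
    keep: int = 3,
) -> Dict[int, CustomerView]:
    """Merge the separate systems into one view per customer.

    Emails and calls are assumed to arrive oldest first; only the
    latest `keep` entries of each are retained.
    """
    views: Dict[int, CustomerView] = {}

    def view_for(customer_id: int) -> CustomerView:
        if customer_id not in views:
            views[customer_id] = CustomerView(customer_id)
        return views[customer_id]

    for customer_id, status in members.items():
        view_for(customer_id).member_status = status
    for customer_id, plan in subscriptions.items():
        view_for(customer_id).subscription = plan
    for customer_id, subject in emails:
        view_for(customer_id).recent_emails.append(subject)
    for customer_id, note in calls:
        view_for(customer_id).recent_calls.append(note)

    for view in views.values():
        view.recent_emails = view.recent_emails[-keep:]
        view.recent_calls = view.recent_calls[-keep:]
    return views


# Hypothetical sample data from four separate systems.
views = build_views(
    members={1: "gold", 2: "basic"},
    subscriptions={1: "annual"},
    emails=[
        (1, "Welcome"),
        (1, "Invoice for May"),
        (1, "Password reset"),
        (1, "Renewal reminder"),
        (2, "Welcome"),
    ],
    calls=[(1, "Asked about renewal")],
)
```

The representative's screen then only has to render `views[customer_id]`; no searching across applications happens while the customer is on the line.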

Customer service can also bring sales and marketing to a whole new level. During the call, the dashboard can show the representative whether there is a service that fits well with something the customer already owns and whether the caller is likely to be receptive to the offer. The result is suggesting the right products at the right time.
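One simple way to find "a service that fits well with something the customer already owns" is to look at what other customers with the same product also bought. The sketch below does exactly that with plain counting; the product names and the `suggest_product` helper are hypothetical, and a production recommender would weigh far more signals than co-ownership.

```python
from collections import Counter
from typing import List, Optional, Set


def suggest_product(owned: Set[str], portfolios: List[Set[str]]) -> Optional[str]:
    """Suggest the product most often owned alongside the caller's products.

    Returns None when no other customer shares a product with the caller.
    """
    counts: Counter = Counter()
    for portfolio in portfolios:
        if owned & portfolio:
            # Count only the products the caller does not have yet.
            counts.update(portfolio - owned)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical product holdings of five existing customers.
portfolios = [
    {"internet", "tv"},
    {"internet", "tv", "phone"},
    {"internet", "phone"},
    {"tv"},
    {"internet", "tv"},
]

suggestion = suggest_product({"internet"}, portfolios)
```

With this sample data, three of the four internet customers also own TV and two own phone, so TV is the offer the dashboard would surface.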

Customer satisfaction trending
Google's CEO Eric Schmidt once confessed that if Google wanted to, it could have gotten into predicting the stock market just for fun. Google decided not to; nevertheless Eric's statement shows the power of mining trends in data.

Your Big Data application should mine Twitter, Facebook, call logs and email communications for patterns that support real-time decision making. Think about it! The real-time movements in these areas can easily tell us:

  • If a certain type of complaint is picking up momentum (lots of people calling about the same thing or talking about the same thing)
  • If there is a danger of an outbreak (healthcare can do this by monitoring their claims, calls and emails)
  • Whether a certain product is doing well in the market or not (it can tell us what buyers are thinking - it's expensive, good, bad or just right)

Trending and real-time analytics work much like text analytics: you take apart a string, remove the noise and look for words that appear again and again. If analyzed properly, you should be able to tell where you need to divert your attention (e.g. a high percentage of phone calls coming in due to a broken fan in your product).
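The text analytics idea described above can be sketched in a few lines: split each message into words, drop the noise words and count what keeps coming back. The stop-word list and the sample messages are made up for this example, and real systems would use a proper language-processing library rather than a regular expression.

```python
import re
from collections import Counter
from typing import List, Tuple

# A tiny, hand-picked noise list; a real one would be much longer.
STOP_WORDS = {
    "a", "after", "again", "but", "is", "my", "please", "so", "the", "where",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split it into words and drop the noise words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def trending_terms(messages: List[str], top_n: int = 3) -> List[Tuple[str, int]]:
    """Count in how many messages each term appears and return the top ones."""
    counts: Counter = Counter()
    for message in messages:
        # set() so a word repeated inside one message counts only once.
        counts.update(set(tokenize(message)))
    return counts.most_common(top_n)


# Hypothetical call-log notes and social posts.
messages = [
    "The fan is broken again",
    "My fan stopped working after a week",
    "Broken fan, please help",
    "Love the product, but the fan is so loud",
    "Where is my invoice?",
]

top_terms = trending_terms(messages, top_n=2)
fan_share = top_terms[0][1] / len(messages)
```

In this sample, "fan" shows up in four of the five messages, which is exactly the kind of signal that tells you where to divert your attention.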


Meaningful Actions

The last and most crucial piece is to take meaningful actions that improve the outcome of an event. The kinds of actions a business will take should be planned early on; otherwise all you have is this Big Data staring at you with big results and nothing to act on.

Here are some actions with respect to the ideas presented throughout this post.

  • Customer service training should be modified to include a customer dashboard which will reduce the training time and the costs associated with it. 
  • Eliminate silos in the company by using Big Data as a means to have them collaborate and work as a team.
  • Increase revenue by improving marketing efforts through up-selling, cross-selling and suggesting services at the right time.
  • Reduce fraud, waste and abuse by flagging suspicious behaviors and by matching profiles of potential customers with the known abusers.
  • Increase customer satisfaction by taking appropriate steps depending on the trends found in the real-time data streams.
  • Brace for any outbreaks if a pattern is observed that is growing and contains a common disease variable.

Bottom Line

At the end of the day, we will arrive at a point where the source of data no longer matters. What will matter is the knowledge and ability to analyze the data effectively. For this reason you will have to pick your battles: whether to use third-party infrastructure, a cloud repository or manage it all in-house. The same goes for data processing capabilities; if your data is too large, you can use platforms like MapReduce and Hadoop.

Another challenge to Big Data is the notion, held by some, that it is something evil. While everything has pros and cons, it is vital for us to understand that Big Data is not evil. Many who bad-mouth this technology don't realize how they may already be benefiting from it. Anyone who has ever been unfairly denied a credit card or loan should understand that, had Big Data been in use, the outcome might have been different.

Thank you for reading. I hope I was able to give you some idea of how to approach a Big Data implementation.

