Big Data Analytics Strategy – State Of The Union

Technology options in the big data space often create confusion and concern.

The confusion arises because there are too many options, and no one or two monolithic platforms provide a solution to every possible problem.

The concern arises because the space is heavily influenced by open-source software and platforms and, equally, by third-party product vendors that don’t have a long track record in the space.

In essence, this is both a ‘problem of too many’ and a ‘problem of very few’!

Let’s understand these dynamics in detail.

Problem of too many
As you explore big data technology options, you’ll soon find that there is always more than one way to do things, and many technologies to do them with.

Example: Data integration from one or more data sources.
If you have an existing data integration tool/platform such as Informatica or Ab Initio, you could check whether it can connect to and process data from your data sources.

If it can, then the question is whether it can handle all the types of data sources in play.

If the answer to that is no, then you’d have to understand the data sources’ characteristics (data type, frequency of arrival, quality, treatment needed, etc.) and pick a tool accordingly (Kafka, Sqoop, Flume, Spark Streaming or something custom built).

In this example, we haven’t even talked about performance considerations yet; we are only trying to make a first-cut choice.
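A first-cut choice of this kind can be written down as a small rule table. The sketch below is only an illustration: the `SourceProfile` fields, the category names and the mapping itself are assumptions made for this example, not a recommendation from any vendor, and a real selection would also weigh data quality, volume and performance.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    """Hypothetical description of one data source."""
    name: str
    kind: str                       # assumed categories: "rdbms", "log_files", "event_stream"
    arrival: str                    # assumed values: "batch" or "continuous"
    needs_transformation: bool = False


def first_cut_tool(source: SourceProfile) -> str:
    """Return a first-cut ingestion tool for a source (illustrative rules only)."""
    # Bulk transfer from a relational database into Hadoop is what Sqoop is built for.
    if source.kind == "rdbms" and source.arrival == "batch":
        return "Sqoop"
    # Flume is commonly used to collect and move log data.
    if source.kind == "log_files":
        return "Flume"
    # Kafka carries event streams; pair it with Spark Streaming if the
    # events need processing on the way in.
    if source.kind == "event_stream":
        if source.needs_transformation:
            return "Kafka + Spark Streaming"
        return "Kafka"
    # Anything that does not match a rule falls back to a custom build.
    return "Custom built"


sources = [
    SourceProfile("orders_db", "rdbms", "batch"),
    SourceProfile("web_server_logs", "log_files", "continuous"),
    SourceProfile("clickstream", "event_stream", "continuous", needs_transformation=True),
    SourceProfile("partner_ftp_drop", "flat_files", "batch"),
]

for src in sources:
    print(f"{src.name}: {first_cut_tool(src)}")
```

Keeping the rules in one small function makes it easy to revisit them once performance and volume enter the picture.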

The number and variety of choices among big data platforms is, to some extent, comparable to that among database technologies.

Example: In the legacy world, one would look at database options such as Oracle, Microsoft SQL Server, IBM DB2 or MySQL.

Similarly, given their prominence and market adoption, you would come across names such as Cloudera, Hortonworks, IBM BigInsights, MapR, etc.

Each of these big data companies has its own product vision and roadmap, which it presents as choices to its customers, while working very closely with the underlying open-source communities. I will cover big data platforms separately.

Problem of very few

Platform picks and choices are made on a set of criteria such as:

  • History of the company
  • Product roadmap and development over years
  • Market adoption and general opinions
  • Periodic reviews and guidance from market, industry analysts
  • Financial stability and viability
  • Sales, Support and Services options
  • Licensing models and Price options

You would find that there are few big data companies that have a long history in the market and are publicly listed!

You’d find most of them to be private companies, in mature start-up mode, backed by prominent and stable venture capitalists.

However, you’d be amazed to see how widespread the market adoption of their products is, across industry segments and across the world!

This state puzzles decision makers on a number of levels.

What reference points should be considered when picking a technology vendor or two?

Should we use the same measures with which we picked our legacy tools and technologies, or should we frame new methods for the new world?

Should we plunk down a lot of cash on these newer products today, or should we wait and watch for those products, and in turn the companies supporting them, to mature?

How do such decisions play out on licensing and product rollout cost structures, and are there any other related hidden costs (tangible and otherwise)?

None of the points expressed above should scare one away; rather, they should help one realize that the Big Data ecosystem needs to be seen through a new type of lens!
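One possible way to frame such a "new lens" is a weighted scorecard over the selection criteria listed earlier, where the weights are deliberately adjusted for a market of young, privately held vendors. The sketch below is an assumption-laden illustration: the weights, the 1 to 5 rating scale and the vendor names are all made up for this example.

```python
from typing import Mapping

# Hypothetical weights for the criteria listed in this post (they sum to 1.0).
# Company history is weighted low here because few vendors have a long record.
CRITERIA_WEIGHTS = {
    "company_history": 0.10,
    "product_roadmap": 0.20,
    "market_adoption": 0.20,
    "analyst_guidance": 0.10,
    "financial_stability": 0.15,
    "sales_support_services": 0.15,
    "licensing_and_price": 0.10,
}


def weighted_score(ratings: Mapping[str, float],
                   weights: Mapping[str, float] = CRITERIA_WEIGHTS) -> float:
    """Combine per-criterion ratings (assumed scale: 1 to 5) into one score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"Missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Normalize by the total weight so the result stays on the rating scale.
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


# Made-up ratings for two hypothetical vendors.
vendor_ratings = {
    "vendor_a": {
        "company_history": 2, "product_roadmap": 4, "market_adoption": 5,
        "analyst_guidance": 4, "financial_stability": 3,
        "sales_support_services": 3, "licensing_and_price": 4,
    },
    "vendor_b": {
        "company_history": 5, "product_roadmap": 3, "market_adoption": 3,
        "analyst_guidance": 3, "financial_stability": 5,
        "sales_support_services": 4, "licensing_and_price": 2,
    },
}

for vendor, ratings in vendor_ratings.items():
    print(f"{vendor}: {weighted_score(ratings):.2f}")
```

The value of the exercise lies less in the final number than in forcing the team to agree on the weights up front.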

Big Data Analytics Strategy – Current State Assessment

Further to the preamble on big data analytics strategy, when you start, it is necessary to do an inventory check with a few questions.

  1. Do we have in-house experts who can work on identifying appropriate use-cases for big data analytics?
  2. Do we have any groups that have ‘experimented’ with, or have gone on to deploy, big data analytics programs?
  3. Do we have any groups that have run pilots, working with any big data vendors?
  4. Do we have expert groups, or part of the Data Stewards organization, with experience in coming up with or vetting use-cases for big data analytics?
  5. Do we have any restrictions or guidance on working with any product or services vendors in big data programs?

This should help you decide the next set of actions. Those actions could include:

  1. Create a focus group (a small team of 3-5 members) that would work on formulating the big data analytics plan: a roadmap of sorts, but not so long and large that it slows things down
  2. If knowledge of big data technologies, even at a very high level, is lacking, it’d be good to get guidance from market analysts (Gartner, IDC, Forrester), go through product briefings from Big Data Analytics product vendors, or take up basic/preliminary courses on Big Data Analytics technologies
  3. Collect whatever information is available to help formulate use-cases for big data analytics
  4. Also, create a focus group for creating, prioritizing and finalizing use-cases
  5. Create working models that bring representation from business and IT into these groups and joint exercises
  6. Probe existing product vendors (Database, Data Integration, Business Intelligence, Data Visualization, Data Security and Hardware), as part of the partnership, on their big data offerings, product maturity, adoption and deployment experiences
  7. Do an inventory check on the SI vendors who work with your group and organization, covering their big data stories and experiences
  8. Part of the focus groups’ research could also include competitive analysis of your peers, or of others in the same industry space, to find their big data story, success and failure scenarios, vendor partnerships, use-cases and the role of big data analytics within their organization
  9. Find information from your education and training partners on their big data analytics training offerings, to plan training for your workforce on newer technologies
  10. Create a task force (it may be the same focus group) to work on scoping and budgeting for the pilot big data analytics program

The output of these exercises could be a good assessment of the current state and a plan for the next steps.

Data Engineering Services – Introduction.

The world is flat, said Thomas L. Friedman.

Companies big and small are finding ways to do business in this globalized world. Innovation is on the move. Outsourcing, Remote-Development and Distributed-Development have become common words in an enterprise. The domain of Data is no stranger to this model either.

So, if you are running Data Warehousing or Business Intelligence programs, how does this model fit in a flat world?

If you are running a services company, what does it take for you to run such programs successfully for your end customers/clients?

I intend to explore these. Stay tuned.

BigData – Market consolidation and value.

I have a mixed reaction to Vertica being bought by HP.


To start with, Vertica is a good product. They have carved out a niche for themselves. The columnar database has come a long way, and I would say it is still emerging in terms of the benefits it delivers. Vertica has long been a proponent of columnar databases. It is a good company and product line to buy. But I am concerned that the good product could go to waste with HP, which has had questionable success in the data warehousing space.

But what stands out in this move by HP is the market for Big Data solutions.

Big Data solutions are hot commodities today. Companies big and small are trying to manage data at a new scale and to consume it for purposes that need a different architecture altogether. That is where Big Data solutions such as Vertica, Aster Data, ParAccel, Infobright and XtremeData, and frameworks such as MapReduce and Hadoop, come to help.

With the ever-growing need to collect more information and with storage costs crashing, companies find a lot of value in adopting Big Data solutions. Also, the type of data that companies big and small have to deal with today is going through a fundamental transformation.

Combing through the social world to mine ‘comments’ and ‘tweets’, which are unstructured, dealing with hierarchical data structures such as JSON, and relating them to geospatial information all need a shift in our approach towards data warehousing and the analytics you would expect out of these systems.
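To make the hierarchical-data point concrete, here is a minimal sketch of what it takes to turn one nested JSON record into a flat, warehouse-style row. The record layout (`user`, `geo`, `lat`, `lon`) is a hypothetical example and not the schema of any real social media API.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single-level dict with dotted keys.

    Lists and scalar values are kept as they are; only dicts are expanded.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# A made-up, tweet-like record with nested user and location details.
raw = """
{
  "id": 1,
  "text": "Loving the new store!",
  "user": {"name": "sam", "followers": 120},
  "geo": {"lat": 40.7128, "lon": -74.006}
}
"""

row = flatten(json.loads(raw))
for column, value in row.items():
    print(f"{column} = {value}")
```

Even this toy case shows the shift: the free text in `text` still needs mining, and the `geo.lat` / `geo.lon` columns only become useful once they are related to other location data.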

I will write more about these challenges and also about some of these Big Data solutions soon. Stay tuned.