Big Data Analytics Strategy – State Of The Union

Technology options in the big data space often create confusion and concern.

Confusion arises because there are too many options, and there is no single monolithic platform (or even two) that solves every possible problem.

Concern arises because the space is heavily influenced by open-source software and platforms, and equally by 3rd-party product vendors that don’t have a long track record in it.

In essence, this is both a ‘problem of too many’ and a ‘problem of very few’!

Let’s understand these dynamics in detail.

Problem of too many
As you explore big data technology options, you’ll soon find that there is always more than one way to do things, and many technologies to do them with.

Example: data integration from one or more data sources.
If you have an existing data integration tool/platform such as Informatica or Ab Initio, you could first check whether it can connect to and process data from your data sources.

If it can, the next question is whether it can handle all the types of data sources in play.

If the answer is no, you’ll have to understand the data sources’ characteristics (data type, frequency of arrival, quality, treatment needed, etc.) and pick a tool accordingly (Kafka, Sqoop, Flume, Spark Streaming, or something custom built).

In this example, we haven’t even talked about performance considerations yet; we are only trying to make a first-cut choice.
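
To make that first-cut choice concrete, here is a minimal sketch in Python. The `DataSource` fields and the rules in `first_cut_tool` are my own simplifying assumptions for illustration; they only reflect what each tool is commonly used for, and a real selection would also weigh the quality, treatment and performance points mentioned above.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical description of a data source (illustrative fields only)."""
    name: str
    kind: str                       # "rdbms", "log_files" or "event_stream"
    arrival: str                    # "batch" or "continuous"
    needs_processing: bool = False  # True if records need treatment in flight


def first_cut_tool(source: DataSource) -> str:
    """Suggest a first-cut candidate from the tools named in the text.

    The rules are assumptions made for this sketch, not a recommendation.
    """
    if source.kind == "rdbms" and source.arrival == "batch":
        return "Sqoop"  # bulk transfer between relational databases and Hadoop
    if source.kind == "log_files":
        return "Flume"  # collecting and moving log data
    if source.kind == "event_stream":
        # Kafka carries the events; Spark Streaming processes them if needed.
        return "Kafka + Spark Streaming" if source.needs_processing else "Kafka"
    return "Custom built"  # nothing off the shelf matched


# Hypothetical sources, named only for the example.
sources = [
    DataSource("orders_db", "rdbms", "batch"),
    DataSource("web_server_logs", "log_files", "continuous"),
    DataSource("clickstream", "event_stream", "continuous", needs_processing=True),
    DataSource("partner_ftp_drop", "flat_files", "batch"),
]

for s in sources:
    print(f"{s.name}: {first_cut_tool(s)}")
```

The point is not the specific rules but the habit: write down each source’s characteristics first, then let those drive the shortlist.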

The number and variety of choices among big data platforms is, to some extent, comparable to that among database technologies.

Example: in the legacy world, one would look at database options such as Oracle, Microsoft SQL Server, IBM DB2, or MySQL.

Similarly, given their prominence and market adoption, you will come across names such as Cloudera, Hortonworks, IBM BigInsights, MapR, etc.

Each of these big data companies has its own product vision and roadmap, which it presents as choices to its customers, while working very closely with the underlying open-source communities. I will cover big data platforms separately.

Problem of very few

Platform choices are made on a set of criteria such as:

  • History of the company
  • Product roadmap and development over years
  • Market adoption and general opinions
  • Periodic reviews and guidance from market and industry analysts
  • Financial stability and viability
  • Sales, Support and Services options
  • Licensing models and Price options

You will find that few big data companies have a long history and are publicly listed!
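
Going back to the criteria listed above, one way to compare vendors side by side is a simple weighted scorecard. The sketch below is only an illustration: the weights, the 1-to-5 ratings and the vendor names are all made up, and you would substitute your own.

```python
from typing import Dict

# Hypothetical weights for the criteria listed above (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "company_history": 0.10,
    "product_roadmap": 0.20,
    "market_adoption": 0.20,
    "analyst_guidance": 0.10,
    "financial_viability": 0.15,
    "sales_support_services": 0.15,
    "licensing_and_price": 0.10,
}


def weighted_score(ratings: Dict[str, float]) -> float:
    """Combine per-criterion ratings (say, 1 to 5) into one weighted score."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)


# Placeholder vendors with made-up ratings, purely for illustration.
vendors = {
    "Vendor A": {
        "company_history": 2, "product_roadmap": 4, "market_adoption": 5,
        "analyst_guidance": 4, "financial_viability": 3,
        "sales_support_services": 3, "licensing_and_price": 4,
    },
    "Vendor B": {
        "company_history": 5, "product_roadmap": 3, "market_adoption": 3,
        "analyst_guidance": 3, "financial_viability": 5,
        "sales_support_services": 4, "licensing_and_price": 2,
    },
}

# Rank vendors from highest to lowest weighted score.
ranking = sorted(vendors, key=lambda v: weighted_score(vendors[v]), reverse=True)
for name in ranking:
    print(f"{name}: {weighted_score(vendors[name]):.2f}")
```

Notice that a young vendor with a short history can still come out ahead if adoption and roadmap carry more weight, which is exactly the tension this section is about.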

You’ll find most of them to be private companies, in mature start-up mode, backed by prominent and stable venture capitalists.

However, you’ll be amazed to see how widespread the market adoption of their products is, across industry segments and across the world!

This state of affairs puzzles decision makers on a number of levels.

What reference points should be considered when picking a technology vendor or two?

Should we use the same measures with which we picked our legacy tools and technologies, or should we frame new methods for the new world?

Should we plunk down a lot of cash on these newer products today, or should we wait and watch for the products, and in turn the companies supporting them, to mature?

How do such decisions play out in licensing and product rollout cost structures, and are there any other related hidden costs (tangible and otherwise)?

None of the points above should scare anyone away; rather, they should make one realize that the Big Data ecosystem needs to be seen through a new type of lens!

Big Data Analytics Strategy – Current State Assessment

Further to the preamble on big data analytics strategy, when you start, it is necessary to do an inventory check with a few questions.

  1. Do we have in-house experts who can work on identifying appropriate use-cases for big data analytics?
  2. Do we have any groups that have ‘experimented’ with, or gone on to deploy, big data analytics programs?
  3. Do we have any groups that have run pilots working with any big data vendors?
  4. Do we have expert groups, perhaps part of the Data Stewards organization, that have experience in coming up with or vetting use-cases for big data analytics?
  5. Do we have any restrictions or guidance on working with any product or services vendors in big data programs?

This should help you decide the next set of actions. Those actions could include:

  1. Create a focus group (a small team of 3-5 members) to formulate the big data analytics plan: a roadmap of sorts, but not so long and large that it slows things down
  2. If knowledge of big data technologies, even at a very high level, is lacking, get guidance from market analysts (Gartner, IDC, Forrester), go through product briefings from Big Data Analytics product vendors, or take basic/preliminary courses on Big Data Analytics technologies
  3. Collect whatever information is available to help formulate use-cases for big data analytics
  4. Also create a focus group for creating, prioritizing and finalizing use-cases
  5. Create working models that bring representation from both business and IT into these groups and joint exercises
  6. Probe existing product vendors (Database, Data Integration, Business Intelligence, Data Visualization, Data Security and Hardware), as part of the partnership, on their big data offerings, product maturity, adoption and deployment experiences
  7. Do an inventory check of the SI vendors who work with your group and organization, covering their big data stories and experiences
  8. The focus groups’ research could also include a competitive analysis of your peers, or of others in the same industry space, to learn their big data story, success and failure scenarios, vendor partnerships, use-cases, and the role of big data analytics within their organization
  9. Find out what big data analytics training your education and training partners offer, so you can plan to train your workforce on newer technologies
  10. Create a task force (maybe the same focus group) to scope and budget for the pilot big data analytics program

The output of these exercises should be a good assessment of the current state and a basis for planning next steps.

Big Data Analytics Strategy – Preamble

Years have passed, and I no longer have to introduce what Big Data is or the promise it holds; instead, there is a lot of awareness of and interest in what Big Data can do and how to get the best out of its strengths.

However, the dilemma in adoption stems from the abundance of confusion in this space.

Big Data product vendors’ promises typically create the impression that Big Data technologies can solve practically every problem in an enterprise (from Business to IT) and every possible use-case out there, kick out existing processes, and end world hunger (OK, I made up that last one).

With that said:

How does one look at, evaluate, decide on and adopt Big Data?
What’s the strategy behind it?
Any gotchas?
Any cheat-sheets to help cross over?

I shall share my thoughts on these and more.