Data Management In The Big Data Era – Part 1

First Wave

Late 2010’s saw companies swept up by a wave of Big Data “buzz” and “promises” about how it could transform enterprises. Even so, most companies were in wait, read and watch mode.

This was mainly because they were struggling, and moving slowly, to identify use cases with which to “test” and “evaluate” the promises of Big Data in their own context. While there was a lot of news and many announcements ( success stories, the magic that could happen using Big Data ), the data platform and its ecosystem were still in their early days.

Second Wave

2012, in my personal experience, saw little movement and interest in the space.

Platform providers figured that unless they led the way by providing tangible, relatable, real-world success stories, by calling out the limitations and rough edges just as openly, and by providing enough industry reference points on use cases and what Big Data entails, they’d fade into history. They kept pushing the narrative.

However, from an adoption standpoint, the industry at large was tilted heavily toward Internet and E-Commerce companies, with the rest trailing.

Internet and E-Commerce companies were grappling with managing data, a humongous amount of it, all the while wanting to save money, and they thereby invested early in Data Platforms ( in spite of the platforms’ immaturity ).

The rest of the industry, with Banks and Financial Services leading the pack and Health Care and Pharmaceuticals following, was moving, but cautiously. They had to balance IT investments already made vs. meeting Regulatory and Compliance needs vs. venturing into a new space and investing in a new ecosystem.

Big Data was directly gunning to question, and change, their status quo !

The second wave continued with force through late 2014 ( approximately ). It saw a lot of flashy announcements, but only gingerly movement. All in all, enterprises were looking at each other to make the first move, and for success and failure stories to learn from before making their own. But they couldn’t escape the heavy influence of Big Data.

As with legacy technology, the platforms and innovations were largely industry agnostic, yet every industry segment had its own nuances, and the industry at large was struggling to picture its future with Big Data in the equation and ecosystem.

Fear Of Missing Out The “Unknown”

At the time, much of the focus was toward a “thorough” evaluation of use cases, leading to “smaller” investments in Big Data. Not to forget, the talent pool ( at the time ) was too scarce to justify making larger investments.

Such small investments did move the needle for the companies, but it was mostly a “trial” period of a “nascent revolution”. The pressure on enterprises was that if you didn’t have a program or a strategy on Big Data, you’d miss out on a revolution ( that everybody talked about, read about ), yet what you’d be missing out on stood “largely undefined”.

Blind Spots

Choices for the innovators ( platform providers, tool developers ) at the time were so enormous that much of the focus went into building or enhancing “distributed data platforms”, along with the tools and utilities that could “work with them” to keep the processes running and moving.

Fitment of such platforms or solutions ( at the time ) from an enterprise architecture standpoint was “slotted for future release”, with little thought given to it.

But that was the inevitable wave that was going to sweep through. Ultimately, the problems that enterprises were confronting with legacy, and the tools and processes that had been put in place ( for ages ) to tackle them, were not going to “automatically change and be usable” with the new tools and techniques that Big Data platforms brought in.

Yahoo !

Yahoo is sold out !

Steve Jobs said, “Innovation distinguishes between a leader and a follower.”

I guess it’s the difference between being a leader and becoming irrelevant and dying. What a great company, what a journey, what products, and what a permanent footprint on the internet !

Personally, I have lots of memories… colleagues became friends. Acquaintances from that time became an integral part of my life.

Yahoooooooo ! Thank you for everything !

Big Data Analytics Strategy – State Of The Union

Technology options in the big data space often create confusion and concern.

Confusion, because there are too many options, and there aren’t one or two monolithic platforms that provide a solution to every possible problem.

Concern, because there is a heavy influence of open-source software and platforms, and equally of 3rd-party product vendors who don’t have a long record in the space.

In essence, this is both a ‘problem of too many’ and ‘problem of very few’ !

Let’s understand these dynamics in detail.

Problem of too many
As you explore big data technology options, you’d soon find that there is always more than one way to do things and there are too many technology options to do so.

Example : Data integration from one or more data sources.
If you have an existing data integration tool/platform such as Informatica or Ab Initio, you could check whether it can talk to and process data from your data sources.

If it can, then the question is whether it can handle all the types of data sources in play.

If the answer to that is no, then you’d have to understand the data sources’ characteristics ( data type, frequency of arrival, quality, treatment needed etc. ) and accordingly pick a tool ( Kafka or Sqoop or Flume or Spark Streaming or Custom built ).

In this example, we haven’t even talked about performance considerations yet; we are only trying to make a first-cut choice.
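
To make the first-cut reasoning concrete, here is a minimal sketch of that decision in Python. The `DataSource` fields, the category labels and the mapping rules are my own illustrative assumptions, not a recommendation: they only reflect the common positioning of these tools ( Sqoop for bulk transfer from relational databases, Flume for log collection, Kafka for continuous event streams, Spark Streaming when the stream also needs processing ).

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical description of a source; field names are illustrative."""
    name: str
    kind: str      # assumed labels: "relational", "log", "event", "file"
    arrival: str   # assumed labels: "batch" or "continuous"
    needs_transformation: bool = False


def first_cut_tool(source: DataSource, existing_tool_supports: bool) -> str:
    """Suggest a first-cut tool. Performance is deliberately not considered."""
    # Step 1: reuse the existing integration tool if it can handle the source.
    if existing_tool_supports:
        return "existing integration tool"
    # Step 2: otherwise pick by source characteristics (rule-of-thumb mapping).
    if source.kind == "relational" and source.arrival == "batch":
        return "Sqoop"
    if source.kind == "log" and source.arrival == "continuous":
        return "Flume"
    if source.arrival == "continuous":
        return "Spark Streaming" if source.needs_transformation else "Kafka"
    # Step 3: nothing fits, so fall back to something custom built.
    return "custom built"


orders = DataSource("orders_db", kind="relational", arrival="batch")
clicks = DataSource("clickstream", kind="event", arrival="continuous",
                    needs_transformation=True)
print(first_cut_tool(orders, existing_tool_supports=False))
print(first_cut_tool(clicks, existing_tool_supports=False))
```

A real evaluation would add data quality, volume and performance to the rules; the point is only that the first cut is a short chain of questions.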

The number of choices among big data platforms is, to some extent, comparable to the number of choices among database technologies.

Example : In the legacy world, one would look at database options such as Oracle, Microsoft SQL Server, IBM DB2 or MySQL.

Similarly, given the prominence and market adoption, you would come across names such as Cloudera, Hortonworks, IBM BigInsights, MapR etc.

Each of these big data companies has its own product vision and roadmap, which it projects as choices to its customers, while working very closely with the underlying open-source communities. I will cover big data platforms separately.

Problem of very few

Platform choices are made on a set of criteria such as

  • History of the company
  • Product roadmap and development over years
  • Market adoption and general opinions
  • Periodic reviews and guidance from market and industry analysts
  • Financial stability and viability
  • Sales, Support and Services options
  • Licensing models and Price options

You would find that few big data companies have a long-standing presence or are publicly listed !

You’d find most of them to be private companies, in mature start-up mode, backed by prominent and stable venture capitalists.

However, you’d be amazed to see how widespread the market adoption of their products is, across industry segments and across the world !

This state puzzles decision makers on a number of levels.

What reference points should be considered to pick a technology vendor or two ?

Should we use the same measures with which we picked our legacy tools and technologies, or should we frame new methods for the new world ?

Should we plunk down a lot of cash on these newer products and make investments today, or should we wait and watch for those products, and in turn the companies supporting them, to mature ?

How do such decisions play out on licensing and product rollout related cost structures, and are there any other related hidden costs ( tangible and otherwise ) ?

None of the points expressed above should scare one away; rather, they should help one realize that the Big Data ecosystem needs to be seen through a new type of lens !
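
One way to frame “new methods for the new world” is to keep the same selection criteria listed earlier but weight them differently, for example giving market adoption more weight than company history. The sketch below is a plain weighted average in Python; the weights, the 1-5 rating scale and the sample ratings are all made-up assumptions for illustration, not an assessment of any real vendor.

```python
# Hypothetical weights: higher means the criterion matters more to you.
CRITERIA_WEIGHTS = {
    "company_history": 1,
    "product_roadmap": 2,
    "market_adoption": 3,
    "analyst_reviews": 1,
    "financial_stability": 2,
    "sales_support_services": 2,
    "licensing_and_price": 2,
}


def weighted_score(ratings, weights=CRITERIA_WEIGHTS):
    """Weighted average of ratings (assumed scale: 1 = poor, 5 = excellent).

    Raises KeyError if a criterion is missing from `ratings`.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * ratings[name] for name in weights)
    return weighted_sum / total_weight


# A made-up young vendor: short history, but strong on everything else.
young_vendor = {name: 4 for name in CRITERIA_WEIGHTS}
young_vendor["company_history"] = 1
print(round(weighted_score(young_vendor), 2))
```

With these assumed weights, a short company history costs the vendor only a little, which mirrors the situation described above: young, private companies with wide market adoption.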

Big Data Analytics Strategy – Current State Assessment

Further to the preamble on big data analytics strategy, when you start, it is necessary to do an inventory check with a few questions.

  1. Do we have in-house experts who can work on identifying appropriate use-cases for big data analytics ?
  2. Do we have any groups that have ‘experimented’ with, or gone on to deploy, big data analytics programs ?
  3. Do we have any groups that have run pilots, working with any big data vendors ?
  4. Do we have expert groups, or part of the Data Stewards organization, with experience in coming up with, or vetting, use-cases for big data analytics ?
  5. Do we have any restrictions or guidance on working with any product or services vendors in big data programs ?

This should help you decide the next set of actions. Those actions could include

  1. Create a focus group ( a small team of 3-5 members ) which would work on formulating the big data analytics plan – a roadmap of sorts, but not so long and large that it slows things down
  2. If knowledge of big data technologies – even at a very high level – is lacking, it’d be good to get guidance from market analysts ( Gartner, IDC, Forrester ), go through product briefings from Big Data Analytics product vendors, or take up basic/preliminary courses on Big Data Analytics technologies
  3. Collect whatever information is available to help formulate use-cases for big data analytics
  4. Also, create a focus group for creating, prioritizing and finalizing use-cases
  5. Create working models that bring representation from business and IT into these groups and joint exercises
  6. Probe existing product ( Database, Data Integration, Business Intelligence, Data Visualization, Data Security and Hardware ) vendors, as part of the partnership, on their big data offerings, product maturity, adoption and deployment experiences
  7. Do an inventory check on the SI vendors who work with your group and organization, on their big data stories and experiences
  8. The focus groups’ research could also include competitive analysis of your peers, or others in the same industry space, to find their big data story, success and failure scenarios, vendor partnerships, use-cases and the role of big data analytics within their organization
  9. Find information from your education and training partners on their big data analytics training offerings, to plan to train your workforce on newer technologies
  10. Create a task force ( maybe from the same focus group ) to work on scoping and budgeting for the pilot big data analytics program

The output of these exercises could be a good assessment of the current state, and a basis for planning next steps.

Big Data Analytics Strategy – Preamble

Years have passed, and I no longer have to introduce what Big Data is or the promise it holds. Instead, there is a lot of awareness of, and interest in, what Big Data can do and how to get the best out of its strengths.

However, the dilemma in adoption stems from the abundance of confusion in this space.

Big Data product vendors’ promises typically create the impression that Big Data technologies can solve practically every problem in an enterprise ( Business to IT ) and every possible use-case that’s out there, kick out existing processes, and end world hunger ( OK, I made up that last one ).

With that said –

How does one look, see, decide and adopt Big Data ?
What’s the strategy behind it ?
Any gotchas ?
Any cheat-sheets to help cross over ?

I shall share my thoughts on these and more.