Am alive

I am very much alive 🙂 My exams have been keeping me very busy. But there is a lot more to share from the last few weeks of conversations, reading and learning. I will do so shortly.

Stay tuned.

Data Engineering Services – Development Process

Further to my introduction, let us look at a typical Data Warehouse/Business Intelligence development process. This basic understanding is needed to decide whether any model for remote-development and/or distributed development could fit in.

Data Operations
Data Operations is an intermediate layer between the different groups within your company. It stores and analyzes information from every line of your business – Product, Sales, Inventory, Marketing, Finance etc.

It acts as an integrated unit with all business groups. Its members interact directly with stakeholders to understand business flows, logic and purpose. They also work closely with stakeholders to help them consume data – in other words, ‘helping find value for business using data’.

Data Team
The data team typically has business analysts who act as a channel between the business groups and the technical data teams. It has a project or program manager who manages projects and deliverables. It also has one or more data architects, who are the custodians of the entire data architecture and infrastructure. And it may have data engineers and reporting engineers, who develop programs to process data and build reports, respectively.

Stakeholders
On the business side, you have stakeholders interacting directly with the data team members. They could be Business Owners, Domain Specialists, Functional Business Analysts or facilitators to the Executive Management of the company.

As you can see, the data team and its operation crisscross most if not all of the different functions of your business.

Development Model
The development model of a data team starts with business requirements, compiled and driven by the business as a BRD (Business Requirement Document) or PRD (Product Requirement Document).

These are analyzed by the data team to prepare Functional and Technical Specifications. A project plan is then drawn up, which includes details of the development plan. These delivery plans are typically iterative and have short response/release cycles, so that the business gets incremental value.

With the basic understanding of different teams and their purpose, let us understand the Data warehouse Development Life Cycle more.

Comments : Disabled

I am a big believer in Social Networking, Collaboration, Brain Storming, Voicing ideas, thoughts and suggestions. However, the spammers are steadfast in their mission to just ruin any or all such experience.

I have been patiently and meticulously scanning comments and marking the bad ones as SPAM. However, I see the spam mutating. I am spending more time keeping the house clean than engaging in any conversation.

So, I have disabled comments altogether. I am sorry. However, I am game for any constructive discussion.

I have updated ‘About Shankar’ with ways to reach me or follow me.

Thank you for your understanding.

Data Engineering Services – Introduction.

The world is flat, said Thomas L. Friedman.

Companies big and small are finding ways to do business in this globalized world. Innovation is on the move. Outsourcing, Remote-Development, Distributed-Development have become common words in an enterprise. The domain of Data is no stranger to this model either.

So, if you are running Data Warehousing or Business Intelligence programs, how does this model fit in a flat world?

If you are running a services company, what does it take to run such programs successfully for your end customers/clients?

I intend to explore these. Stay tuned.

Steve Jobs

I am at a loss for words on Steve Jobs' passing. It is a very sad day of my life.

He was my virtual mentor, motivator and the greatest inspiration of my entire life. So many times in my life (professional and personal), whenever I felt down, whenever I wanted to motivate myself or cheer myself up, I looked up to this man. His life, his unparalleled work and his innovation were the greatest inspiration for me, every single time.

It was not just Apple and Steve Jobs' innovation that attracted me, like millions of others. It was his ‘passion’ to innovate.

This line from his phenomenal Stanford University Commencement speech says it all:

And the only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle. As with all matters of the heart, you’ll know when you find it. And, like any great relationship, it just gets better and better as the years roll on. So keep looking until you find it. Don’t settle.

RIP Steve Jobs.

I dearly miss you.

iCloud for Data

Apple's recent announcement brings up some interesting models for data processing and delivery.

Simply put, iCloud from Apple intends to be your music library in the cloud and to sync your apps (applications), documents etc. across all your Apple devices.

Data Strategy for Processing, Distribution and Availability
An effective data strategy’s foremost goal is to make data accessible. This means that all the groups, and the decision makers within those groups, should have access to all relevant data. They should not have to spend much time accessing it, and the processes in between should be robust enough to support that need.

On the other hand, the data teams whose main charter is to process data and make it accessible to those groups often face a lot of constraints, e.g. complex upstream architectures, volume constraints, long processing times and inefficient, slow processes, all of which delay data availability and delivery.

Distributed systems such as Hadoop and Cassandra try to solve these data processing problems by working on chunks of data in parallel.
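To make the idea of processing chunks in parallel concrete, here is a minimal single-machine sketch. It is not Hadoop's or Cassandra's API: the function names and the sales records are hypothetical, and a thread pool stands in for the cluster of machines that a real distributed system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Yield consecutive slices of at most chunk_size records."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def totals_for_chunk(chunk):
    """'Map' step: total sales per region within a single chunk."""
    totals = Counter()
    for region, amount in chunk:
        totals[region] += amount
    return totals


def parallel_totals(records, chunk_size=2, workers=4):
    """Process the chunks in parallel, then merge the partial results."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = split_into_chunks(records, chunk_size)
        for partial in pool.map(totals_for_chunk, chunks):
            merged.update(partial)  # 'reduce' step: add up partial totals
    return merged


# Hypothetical sample data: (region, sale amount)
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 8)]
totals = parallel_totals(sales)
```

Each chunk is summarized independently, so no worker needs to see the whole data set; only the small partial results are combined at the end.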

‘Data in the cloud’ models are economic models where the entire data environment essentially resides in the cloud, cutting IT costs.

Cloud Processing Models
A ‘Data in the cloud’ strategy is not just a novel and economical way to store humongous amounts of data. The cloud strategy should also include plans to process data in cycles and make it available rapidly.

Old data processing strategies revolved strictly around building stringent controls into your ETL processes: making sure source data is available, source-data transportation, process-on-target, roll-outs, availability SLAs etc.

Max Availability Models
We have moved away from those models, as they do not serve the need of the day. If you are told that you need to wait an hour or two to get your sales numbers because the system is in the ‘processing window’, you would term that wait an opportunity lost.

Data served late is data that is of little or no use!

Irrespective of the data volume, data has to be processed faster and made available as soon as possible.

Cloud Models
The ‘Data in the Cloud’ model is also not one that merely stores processed data in cloud storage, like an unbounded virtual hard disk. Instead, the architecture should include cloud models for all stages of data warehousing and business intelligence.

Cloud models are excellent, economical and scalable candidates for:

  • Source data checks and processing
  • Rapid Data Mapping
  • Business Rules Validations
  • Process-on-source processes
  • Distributed processing
  • Rapid Transport Models
  • Hosting Analytic Models
  • Data partitioning, compression and archiving (sunset models)

Mobile Intelligence

We are living in exciting times. Every form of technology is essentially converging towards one thing and one thing only – Mobility. Gone are the days when our mobile phones were meant only for phone calls and (a little later) SMS (Short Message Service). Computing was done almost entirely on laptops, and now we are excited about tablets (iPad) and smartphones (iPhone, BlackBerry, Android-powered phones).

If you run a business, think about the opportunity and the challenges (ahead) as well.

The challenge is that you should (already) have a strategy in place for this medium. Your customers (and potential customers) are going to work with your products and use your services more and more through this medium. As a company, you cannot take a ‘one strategy fits all’ approach.

The opportunity is in looking deep into your Mobile Content and Enrichment, Social Media Strategy, Contextual Profiles and Analytics.

Mobile Intelligence also demands a sound infrastructure setup. Mobile Intelligence is also about getting actionable intelligence in quicker intervals. Look deep into your cloud strategy, distributed databases, distributed processing, data-sources, data volume, data movement, ETL strategy, data retention policies, replication scenarios and last but not least analytical models.

Mobile Intelligence is your next frontier to compete and excel.

The CIEL Project and Skywriting.

“Unlike MapReduce or Dryad frameworks, it offers full general-purpose, dynamic task graph execution, which enables developers to implement algorithms using arbitrary task structures”

The CIEL Project and Skywriting look very promising and simple to implement. Task graph execution is an important feature, and it looks to me like CIEL would make it very flexible to implement.
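As a rough illustration of what ‘dynamic task graph execution’ means, here is a small single-process sketch in Python. It is not CIEL's API and not Skywriting syntax; the executor, the `spawn` callback and the range-sum task are all hypothetical. The point is only that a task can decide at run time, based on its input, to add more tasks to the graph.

```python
from collections import deque


def run_task_graph(root_task):
    """Run tasks from a queue; a task may enqueue more tasks while it runs."""
    results = []
    pending = deque([root_task])
    while pending:
        task = pending.popleft()
        # Every task receives `spawn`, so it can extend the graph at run time.
        value = task(spawn=pending.append)
        if value is not None:
            results.append(value)
    return results


def make_range_sum(lo, hi, threshold=4):
    """Hypothetical task: sum the integers in [lo, hi), splitting big ranges."""
    def task(spawn):
        if hi - lo > threshold:
            mid = (lo + hi) // 2
            # Data-dependent fan-out: the shape of the graph is decided here.
            spawn(make_range_sum(lo, mid, threshold))
            spawn(make_range_sum(mid, hi, threshold))
            return None
        return sum(range(lo, hi))
    return task


partial_sums = run_task_graph(make_range_sum(0, 16))
total = sum(partial_sums)
```

In a fixed map-then-reduce pipeline the number of stages is known up front; here the number of tasks depends on the data, which is the flexibility the quote above describes.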

Cannot wait to play with this. More on this to come.

BigData – Market consolidation and value.

I have mixed reactions to Vertica being bought by HP.


To start with, Vertica is a good product. They have carved out a niche for themselves. Columnar databases have come a long way, and I would say their benefits are still emerging. Vertica has long been a proponent of columnar databases. It is a good company and product line to buy. But I am concerned that a good product could go to waste at HP, which has had questionable success in the data warehousing space.

But what stands out in this move by HP is the market for Big Data solutions.

Big Data solutions are hot commodities today. Companies big and small are trying to manage data at a new scale and to consume it for purposes that need a different architecture altogether. That is where Big Data solutions such as Vertica, Aster Data, ParAccel, Infobright and XtremeData, and frameworks such as MapReduce and Hadoop, come in to help.

With the ever-growing need to collect more information, and with storage costs crashing, companies find a lot of value in adopting Big Data solutions. Also, the type of data that companies big and small have to deal with today is going through a fundamental transformation.

Combing through the social world to mine ‘comments’ and ‘tweets’, which are unstructured, dealing with hierarchical data structures such as JSON, and relating them to geospatial information all need a shift in our approach to data warehousing and to the analytics you would expect out of these systems.
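To show why hierarchical data does not drop straight into a warehouse table, here is a minimal Python sketch that flattens one nested JSON record into a single row. The record, its field names and the dotted-key convention are all made up for illustration; they are not taken from any real social media API.

```python
import json

# Hypothetical tweet-like record with nested user and geo information.
raw = """
{
  "id": 101,
  "text": "Loving the new store!",
  "user": {"name": "asha", "followers": 250},
  "geo": {"lat": 12.97, "lon": 77.59},
  "hashtags": ["retail", "launch"]
}
"""


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level with dotted keys; lists are kept as-is."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


row = flatten(json.loads(raw))
```

The nested `user` and `geo` objects become ordinary columns such as `user.name` and `geo.lat`, but the `hashtags` list still has no natural single column, which is the kind of modelling decision a traditional warehouse design never had to make.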

I will write more about these challenges and also about some of these Big Data solutions soon. Stay tuned.