Apple’s recent announcement brings up some interesting models for data processing and delivery.
Simply put, Apple’s iCloud is meant to be your music library in the cloud and to sync your apps, documents, etc. across all your Apple devices.
Data Strategy for Processing, Distribution and Availability
An effective data strategy’s foremost goal is to make data accessible. This means that all groups, and the decision makers within those groups, should have access to all relevant data. They should not have to spend much time getting to it, and the processes in between should be robust enough to support that need.
On the other hand, the data teams whose main charter is to process data and make it accessible to those groups often face many constraints, e.g. complex upstream architectures, large data volumes, long processing times, and inefficient, slow processes, all of which delay data availability and delivery.
Distributed data platforms such as Hadoop and Cassandra try to solve these processing problems by spreading data across many nodes and processing chunks of it in parallel.
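The chunk-and-combine idea can be sketched in a few lines of Python. This is a minimal, single-machine illustration, not how Hadoop or Cassandra are actually implemented: a thread pool stands in for a cluster of nodes, and the sales records and the `chunked`, `map_chunk` and `parallel_totals` names are hypothetical. Each chunk is summarized independently (the "map" step) and the partial results are then merged (the "reduce" step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Tuple

Sale = Tuple[str, int]  # (region, amount in cents); hypothetical record shape


def chunked(records: List[Sale], size: int) -> Iterator[List[Sale]]:
    """Split the input into fixed-size chunks, like blocks handed to workers."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def map_chunk(chunk: List[Sale]) -> Counter:
    """Map step: total each region within a single chunk."""
    totals: Counter = Counter()
    for region, amount in chunk:
        totals[region] += amount
    return totals


def parallel_totals(records: List[Sale], chunk_size: int = 1000, workers: int = 4) -> Dict[str, int]:
    """Process chunks concurrently, then reduce the partial totals into one result."""
    combined: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(map_chunk, chunked(records, chunk_size)):
            combined.update(partial)  # reduce step: merge per-chunk totals
    return dict(combined)


sales = [("east", 1000), ("west", 500), ("east", 700)] * 1000
print(parallel_totals(sales, chunk_size=250))  # {'east': 1700000, 'west': 500000}
```

Threads keep the sketch portable; for CPU-bound work in CPython, `ProcessPoolExecutor` from the same module is the usual swap, since the GIL prevents threads from executing Python bytecode in parallel.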
‘Data in the cloud’ models are economical: the entire data environment essentially resides in the cloud, which cuts IT costs.
Cloud Processing Models
A ‘data in the cloud’ strategy is not just a novel and economical way to store humongous amounts of data. The cloud strategy should also include plans to process data in cycles and make it available rapidly.
Older data processing strategies revolved strictly around building stringent controls into your ETL processes: making sure source data is available, transporting source data, processing on the target, roll-outs, availability SLAs, etc.
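To make the older, gated model concrete, here is a rough Python sketch of a serial batch run. The stage names mirror the steps listed above, but the stage functions, `run_batch` and the two-hour SLA are hypothetical placeholders. Each stage must succeed before the next one starts, and nothing is usable until the whole run finishes, which is what creates a processing window that users have to wait out.

```python
import time
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_batch(stages: List[Stage], sla_seconds: float) -> str:
    """Run stages strictly in order; any failure blocks everything downstream."""
    started = time.monotonic()
    for name, stage in stages:
        if not stage():
            return f"failed at {name}"
    elapsed = time.monotonic() - started
    return "on time" if elapsed <= sla_seconds else "missed SLA"


# Hypothetical stages; real ones would poll sources, copy files, run loads, etc.
pipeline: List[Stage] = [
    ("source available", lambda: True),
    ("transport source data", lambda: True),
    ("process on target", lambda: True),
    ("roll out", lambda: True),
]
print(run_batch(pipeline, sla_seconds=2 * 60 * 60))  # two-hour availability SLA
```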
Max Availability Models
We have moved away from those models because they do not serve today’s needs. If you were told that you had to wait an hour or two for your sales numbers because the system was in its ‘processing window’, you would call that wait a lost opportunity.
Data served late is data that is of little or no use!
Irrespective of data volume, data has to be processed faster and made available as soon as possible.
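One common way to avoid a processing window is to update results incrementally as each record arrives, so a current answer is always available. The Python sketch below is a hypothetical illustration (the `RunningSales` class and its methods are not taken from any particular product): totals are kept per region and can be read at any moment instead of after a batch run completes.

```python
from collections import defaultdict
from typing import DefaultDict, Dict


class RunningSales:
    """Keeps per-region sales totals current as each sale arrives (hypothetical example)."""

    def __init__(self) -> None:
        self._totals: DefaultDict[str, int] = defaultdict(int)
        self._count = 0

    def ingest(self, region: str, amount_cents: int) -> None:
        """Apply one sale immediately; there is no batch window to wait for."""
        self._totals[region] += amount_cents
        self._count += 1

    def snapshot(self) -> Dict[str, int]:
        """Return a copy of the totals as of right now."""
        return dict(self._totals)

    @property
    def sales_seen(self) -> int:
        return self._count


live = RunningSales()
live.ingest("east", 1200)
print(live.snapshot())  # {'east': 1200}
live.ingest("west", 300)
live.ingest("east", 800)
print(live.snapshot())  # {'east': 2000, 'west': 300}
```

The trade-off is a little extra work on every write; in exchange, a reader's totals always include every record ingested so far.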
Nor is the ‘Data in the Cloud’ model one that simply stores processed data in cloud storage, like an unbounded virtual hard disk. Instead, the architecture should include cloud models for all stages of data warehousing and business intelligence.
Cloud models are excellent, economical and scalable candidates for:
- Source data checks and processing
- Rapid Data Mapping
- Business Rules Validations
- Process-on-source processes
- Distributed processing
- Rapid Transport Models
- Hosting Analytic Models
- Data partitioning, compression and archiving (sunset models)
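Several of the stages listed above can be chained in a single pass. The following Python sketch is purely illustrative: the field names in `FIELD_MAP`, the positive-amount business rule and the month-based partition key are all hypothetical choices, and compression is left out. It runs a source data check, maps source fields to warehouse fields, validates a business rule, partitions accepted rows by month, and picks out old partitions as archiving candidates (a simple sunset rule). In a cloud deployment each step could run on separately scaled workers; here they share one loop.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from source field names to warehouse field names.
FIELD_MAP = {"cust": "customer_id", "amt": "amount_cents", "dt": "sale_date"}


def check_source(row: dict) -> bool:
    """Source data check: every expected source field is present and non-empty."""
    return all(row.get(field) not in (None, "") for field in FIELD_MAP)


def map_row(row: dict) -> dict:
    """Data mapping: rename fields and convert them to warehouse types."""
    mapped = {FIELD_MAP[key]: value for key, value in row.items() if key in FIELD_MAP}
    mapped["amount_cents"] = int(mapped["amount_cents"])
    mapped["sale_date"] = date.fromisoformat(mapped["sale_date"])
    return mapped


def passes_rules(row: dict) -> bool:
    """Business rule validation (hypothetical rule): sale amounts must be positive."""
    return row["amount_cents"] > 0


def process(rows: List[dict]) -> Tuple[Dict[str, List[dict]], List[dict]]:
    """Check, map and validate rows, then partition accepted rows by month."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    rejected: List[dict] = []
    for row in rows:
        if not check_source(row):
            rejected.append(row)
            continue
        try:
            mapped = map_row(row)
        except ValueError:  # unparseable amount or date
            rejected.append(row)
            continue
        if not passes_rules(mapped):
            rejected.append(row)
            continue
        partitions[mapped["sale_date"].strftime("%Y-%m")].append(mapped)
    return dict(partitions), rejected


def partitions_to_archive(keys: Iterable[str], cutoff: str) -> List[str]:
    """Sunset rule: 'YYYY-MM' partitions older than the cutoff month get archived."""
    return sorted(key for key in keys if key < cutoff)


rows = [
    {"cust": "c1", "amt": "1500", "dt": "2011-06-06"},
    {"cust": "c2", "amt": "-20", "dt": "2011-06-07"},  # fails the business rule
    {"cust": "", "amt": "100", "dt": "2011-07-01"},    # fails the source check
    {"cust": "c3", "amt": "250", "dt": "2011-07-02"},
]
accepted, rejected = process(rows)
print(sorted(accepted))  # ['2011-06', '2011-07']
print(len(rejected))  # 2
print(partitions_to_archive(accepted, cutoff="2011-07"))  # ['2011-06']
```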