How to Build a Next-Generation Analytic Platform with a Data Lake on AWS

The Value of Data

“By 2022, more than half of major new business systems will incorporate continuous intelligence that uses real-time context data to improve decisions.” Source: Gartner report, “Predicts 2019: Analytics and BI Strategy”

Did you know that Netflix’s personalized recommendation engine is worth $1 billion per year? According to Netflix, it collects data such as:

  • Viewers’ interactions with its service
  • The tastes and preferences of similar subscribers
  • Content information such as genre and actors
  • Subscriber behavior, such as the time of day a subscriber watches

Netflix feeds this data into its algorithm to generate recommendations for its subscribers. This personalized recommendation engine saves Netflix $1 billion every year by keeping subscribers’ attention on Netflix so that they do not cancel the service. Today, 80% of subscriber choices come from recommendations.

This is the value that data brings to business. In reality, however, not many companies understand how to extract business value from data, especially when it comes to big data. Companies are familiar with their normal business operations and with the value that ordinary transactional data can bring them. With the big data trend, they are also highly interested in exploiting the unseen business value in the data they already own, as well as in data available on the market. The term “big data” fuels a lot of fantasies among business owners, yet not many of them succeed with it.

Here are three challenges that business owners and IT teams commonly face:

Challenge 1: “The investment in a big data project is large, and I am not sure if there will be a positive ROI.”

We love the revolution brought by the cloud. In traditional IT, data analytics capability was available only to companies that could afford expensive hardware such as data warehouse appliances, along with analytical or business intelligence tools. These heavy on-premises tools usually demand a large up-front investment. The pay-as-you-go, on-demand model of the cloud requires a comparatively small investment, so any company can get a taste of data analytics. For example, on AWS, using AWS Glue for ETL costs only several dollars per job. In one use case we have seen, building the end-to-end analytical capability on the cloud cost 10% of what it would have cost with traditional on-premises data warehouses.

Challenge 2: “I don’t know what data would be useful to me, and I don’t know how to collect it.”

Building a data lake helps you centralize data that is scattered across application silos and makes it easily accessible for any future analytical or machine learning use. The point of a data lake is to have your data in place, well indexed, and well labeled, so that at any time and for any use case you can easily retrieve the data you need. On AWS, if you choose S3 to build your data lake, the cost of storage is very low. So when you do not know what to collect, you can start by building a data lake and ingesting data into it, enriching it phase by phase and project by project.
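
As a rough illustration of keeping data in the lake “indexed and labeled well”, the sketch below builds a labeled, date-partitioned object key for a raw file landing in an S3-based data lake. The raw/ prefix, the source= and dataset= labels, and the function name build_object_key are assumptions made for this example, not a layout prescribed by AWS or by eCloudvalley. The code only constructs the key string and does not upload anything.

```python
from datetime import date


def build_object_key(source: str, dataset: str, event_date: date, filename: str) -> str:
    """Return a labeled, date-partitioned key for one raw file (hypothetical layout)."""
    return (
        f"raw/source={source}/dataset={dataset}/"
        f"year={event_date:%Y}/month={event_date:%m}/day={event_date:%d}/"
        f"{filename}"
    )


# Example: a sales export from a point-of-sale system (illustrative names only).
key = build_object_key("pos", "sales", date(2021, 1, 20), "store_001.csv")
print(key)
```

With a naming scheme like this, a later project can locate one source, one dataset, or one day of data by prefix alone, without having to know in advance which use case will need it.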

Challenge 3: “I cannot find the talent who can translate data from an IT perspective into business value.”

According to Gartner, by 2020, the number of data and analytics experts in business units will grow at three times the rate of experts in IT departments, which will force companies to rethink their organizational models and skill sets. Line-of-business users often know what they want to find out, but they cannot read it from raw data. What matters for IT, therefore, is to build a self-service analytical platform that makes the raw data less raw by organizing it into measures and metrics. Business users can then derive the insights they want from these measures and metrics.
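
To make “turning the raw data less raw” concrete, here is a minimal sketch that aggregates raw transaction rows into per-store measures (revenue, units, orders) and one derived metric (average order value). The sample rows, the field names, and the function summarize_by_store are hypothetical and chosen only for illustration. A real platform would typically do this in a data warehouse or BI layer rather than in a script.

```python
from collections import defaultdict

# Hypothetical raw transaction rows, as they might arrive from a point-of-sale system.
RAW_SALES = [
    {"store": "S001", "order_id": "A1", "amount": 12.50, "units": 2},
    {"store": "S001", "order_id": "A2", "amount": 7.50, "units": 1},
    {"store": "S002", "order_id": "B1", "amount": 30.00, "units": 5},
]


def summarize_by_store(rows):
    """Aggregate raw rows into per-store measures and one derived metric."""
    totals = defaultdict(lambda: {"revenue": 0.0, "units": 0, "orders": 0})
    for row in rows:
        store = totals[row["store"]]
        store["revenue"] += row["amount"]
        store["units"] += row["units"]
        store["orders"] += 1
    # Derived metric: average order value = revenue / number of orders.
    for store in totals.values():
        store["avg_order_value"] = store["revenue"] / store["orders"]
    return dict(totals)


summary = summarize_by_store(RAW_SALES)
for store_id, measures in sorted(summary.items()):
    print(store_id, measures)
```

A business user can work with “revenue per store” or “average order value” directly, whereas the individual order rows only make sense to someone who knows the source system.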

Next-Generation Analytic Platform on AWS

The diagram shows the conceptual end-to-end data journey on the cloud (Figure 1). From left to right, it shows how raw data from various data sources gradually becomes meaningful, useful, and applicable to different business scenarios. These business scenarios usually fall into five areas:

  • Report automation
  • Dynamic dashboarding
  • Self-service analytics
  • AI & machine learning
  • Business applications

Each of these scenarios draws cleansed data from the data lake and takes it through a different journey to make it valuable.

Figure 1: The conceptual end-to-end data journey on Cloud


eCloudvalley offers a modular approach to help companies build their data analytical platform on AWS (Figure 2). The modular approach includes:

  • Data platform design and implementation
  • Data modeling discovery and development
  • Reports and dashboard design
  • API development
  • User enablement

Figure 2: eCloudvalley’s modular approach to building a data analytical platform on AWS

Successful Case Study: Consumer Goods Chain Stores for Report Automation & Interactive Analysis

About the Customer
The customer is a leading pan-Asian retailer involved in the processing and wholesaling of food and personal products in diverse regions around the world. With 200,000+ employees and $10+ billion in revenue, the company operates 5,000+ stores in 10+ countries in Asia.

Customer’s Challenges
The customer previously ran a traditional on-premises data processing journey for report generation, specifically for sales-related reports sent to senior management in PDF format. However, senior management wanted to:

  • Perform advanced retail analytics
  • Get more timely sales reports

These requirements could not easily be met by the previous on-premises data journey.

Solutions from eCloudvalley
To resolve the two challenges, eCloudvalley used AWS as the new automated report generation platform to:

  • Add retail analytic capability.
  • Shorten the data processing time by leveraging Redshift’s massively parallel processing power, together with Tableau, so that senior management no longer needs to wait for PDFs and can access sales and retail analytic dashboards at any time.

The Benefits

  • ETL time reduced from more than 24 hours to 4 hours
  • Access to timely reports and dashboards anytime, anywhere
2021-04-01T21:42:04+00:00 2021/01/20 |Insights|