Data Science on AWS

How strongly do I recommend Data Science on AWS?
6 / 10

Review of Data Science on AWS

Data Science on AWS is full of examples of how to leverage and integrate existing AWS data science tools to improve your product. Many of these data engineering and data science tools were developed by Amazon for its own marketplace, then generalized for use within AWS.

This book lives in a highly practical space. You will see plenty of Python and SQL snippets for use cases common to many applications, such as language translation, product reviews and ratings, and product recommendations.

Read Data Science on AWS if you’re looking for a basic understanding of the capabilities of AWS data science tools but don’t want to wade through tons of online documentation.

Top Ideas in This Book

Start by leveraging built-in algorithms and pre-trained models
Recommendations should avoid the popularity trap of only displaying already popular items
Tight integration between Amazon services makes development easier and faster
A little SQL and Python knowledge gets you pretty far
Amazon QuickSight discovers new data sources in your AWS account and enables you to build dashboards
Lazy feature engineering can perpetuate racial, gender, and age biases

Start by leveraging built-in algorithms and pre-trained models

AWS now ships with an abundance of built-in algorithms and pre-trained models that let software engineers without a data science background become productive quickly. Leave customizing algorithms and models to experienced specialists.

Recommendations should avoid the popularity trap of only displaying already popular items

Recommendation systems often make the mistake of biasing results toward items that are already popular while deprioritizing less popular but highly relevant ones. You can address this with algorithms such as multi-armed bandits.
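A minimal sketch of one simple multi-armed bandit, epsilon-greedy, in Python. This is an illustrative choice, not necessarily the approach the book uses. Each candidate item is an "arm": most of the time the recommender shows the item with the best observed click-through rate, but with probability `epsilon` it shows a random item, so less popular items still get impressions. The class name, the click-based reward, and the default `epsilon=0.1` are assumptions for this example, not an AWS API.

```python
import random


class EpsilonGreedyRecommender:
    """Hypothetical epsilon-greedy bandit; each item that could be recommended is an arm."""

    def __init__(self, items, epsilon=0.1, seed=None):
        self.items = list(items)
        self.epsilon = epsilon              # assumed exploration rate
        self.rng = random.Random(seed)
        self.shows = {item: 0 for item in self.items}   # times each item was recommended
        self.clicks = {item: 0 for item in self.items}  # times each item was clicked

    def estimated_ctr(self, item):
        # Untried items get an optimistic estimate so each one is shown at least once.
        if self.shows[item] == 0:
            return float("inf")
        return self.clicks[item] / self.shows[item]

    def recommend(self):
        # Explore: with probability epsilon, pick any item regardless of its track record.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        # Exploit: otherwise pick the item with the best observed click-through rate.
        return max(self.items, key=self.estimated_ctr)

    def record(self, item, clicked):
        # Update the counts after observing whether the user clicked.
        self.shows[item] += 1
        if clicked:
            self.clicks[item] += 1
```

If a niche item has a higher true click rate than a popular one, the exploration step is what lets the niche item collect enough impressions for its estimate to overtake the popular item. Lowering `epsilon` exploits known winners more; raising it explores more.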

Tight integration between Amazon services makes development easier and faster

The more you lean into AWS, the more value you can derive from its tight integrations. Most services can talk directly to one another, which reduces the data transfer and external tooling you need and simplifies your AWS data pipelines.

The good news is that AWS costs tend to decrease over time because Amazon itself is built on three foundational beliefs about customers:

Customers will always want lower prices
Customers will always want more selection
Customers will always want faster delivery

A little SQL and Python knowledge gets you pretty far

Data Science on AWS is full of SQL and Python examples. For budding data scientists, data analysts, or software engineers who just want to wrap their heads around the basics of data science, learning SQL and Python is a must.

In my opinion, SQL is the language most likely to be around 30 years from now. Every developer should know their way around SQL.
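As a small illustration of how far a little SQL and Python go together, here is a sketch that uses Python's built-in `sqlite3` module as a local stand-in for a cloud SQL engine and ranks products by average star rating. The `reviews` table, its `product_id` and `star_rating` columns, and the `top_rated_products` helper are hypothetical; the SQL itself (`GROUP BY`, `HAVING`, `ORDER BY`) is standard.

```python
import sqlite3


def top_rated_products(reviews, min_reviews=2):
    """Hypothetical helper: return (product_id, avg_rating, review_count), best-rated first."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE reviews (product_id TEXT, star_rating INTEGER)")
        conn.executemany(
            "INSERT INTO reviews (product_id, star_rating) VALUES (?, ?)", reviews
        )
        # SQL does the aggregation; Python only loads the data and reads the rows back.
        query = """
            SELECT product_id,
                   ROUND(AVG(star_rating), 2) AS avg_rating,
                   COUNT(*) AS review_count
            FROM reviews
            GROUP BY product_id
            HAVING COUNT(*) >= ?
            ORDER BY avg_rating DESC, review_count DESC
        """
        return conn.execute(query, (min_reviews,)).fetchall()
    finally:
        conn.close()
```

For example, `top_rated_products([("A", 5), ("A", 4), ("B", 3), ("B", 2), ("B", 4), ("C", 5)])` returns `[('A', 4.5, 2), ('B', 3.0, 3)]`; product C is filtered out by `HAVING` because it has only one review.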

Amazon QuickSight discovers new data sources in your AWS account and enables you to build dashboards

QuickSight is a technology I’d like to explore further because it automatically identifies new data sources and offers a no-code/low-code editor for creating business intelligence dashboards.

Lazy feature engineering can perpetuate racial, gender, and age biases

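Feature selection identifies the attributes in the data that best represent the dataset. A model then learns correlations among those attributes to produce results, for instance determining which images share characteristics and therefore represent a similar concept. When models are trained on biased data, those biases are baked into the results. Choosing features carelessly makes this more likely, because attributes that correlate with race, gender, or age can carry that bias into the model even when the sensitive attribute itself is excluded.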
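One inexpensive guard against lazy feature engineering is to check how strongly each candidate feature correlates with a sensitive attribute before keeping it. The sketch below computes Pearson correlation in plain Python and flags likely proxy features. The function names and the 0.5 threshold are illustrative assumptions, and a low correlation does not prove a feature is fair: Pearson only measures linear relationships between two variables.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def flag_proxy_features(features, protected, threshold=0.5):
    """Hypothetical helper: map feature name -> r for features with |r| >= threshold.

    `features` maps feature names to value lists; `protected` is the numerically
    encoded sensitive attribute. The default threshold is an assumption.
    """
    flagged = {}
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = round(r, 3)
    return flagged
```

With a sensitive attribute encoded as `[0, 0, 0, 1, 1, 1]`, a feature that mirrors it exactly is flagged with a correlation of 1.0, while an alternating feature `[1, 0, 1, 0, 1, 0]` (correlation about -0.33) is not.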

Implementing End-to-End, Continuous AI and Machine Learning Pipelines