How strongly do I recommend Data Science on AWS?
6 / 10
Data Science on AWS is full of examples of how to leverage and integrate existing AWS data science tools to improve your product. Many of these data engineering and data science tools were developed by Amazon for its own marketplace, then generalized for use within AWS.
This book lives in a highly practical space. You will see plenty of code snippets in Python and SQL for use cases that are common to many applications, for instance language translations, product reviews and ratings, and product recommendations.
Read Data Science on AWS if you’re trying to get a basic understanding of the capabilities of AWS data science tools but don’t want to wade through tons of online documentation.
Top Ideas in This Book
AWS now ships with an abundance of built-in algorithms and pre-trained models that empower software engineers without a background in data science to become productive quickly. Leave customization of your algorithms and models to more skilled experts.
Recommendation systems often make the mistake of biasing results toward items that are already popular while deprioritizing unpopular but highly relevant ones. You can address this with algorithms like multi-armed bandits, which keep testing less-shown items instead of only repeating past winners.
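To make the bandit idea concrete, here is a minimal sketch of an epsilon-greedy multi-armed bandit in plain Python. It is not taken from the book and does not use any AWS service; the class name, the `simulate` helper, and the click rates are all hypothetical values chosen for illustration. Each "arm" stands for an item you could recommend, and a reward of 1 stands for a click.

```python
import random


class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy bandit; names and defaults are assumptions."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # how often each arm has been shown
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore: with probability epsilon, show a random arm so that
        # rarely shown items still get a chance to prove themselves.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Exploit: otherwise show the arm with the best observed mean reward.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incrementally update the running mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(true_click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Run the bandit against made-up click rates (hypothetical data)."""
    bandit = EpsilonGreedyBandit(len(true_click_rates), epsilon, seed)
    env = random.Random(seed + 1)  # separate RNG for the simulated users
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = 1.0 if env.random() < true_click_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    # Arm 2 starts with no history but has the highest true click rate.
    result = simulate([0.05, 0.04, 0.30])
    print(result.counts)
```

The small exploration budget is the design choice that matters here: without it, the item that happened to be shown first would keep winning simply because it is the only one with any history.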
The more you lean into AWS, the more value you can derive from their tight integrations. Most services can speak directly to other services, reducing the amount of data transfer and external tooling you need, simplifying your AWS data pipelines as a result.
The good news is that AWS costs tend to decrease over time because Amazon itself is built on three foundational beliefs about customers:
Data Science on AWS is full of SQL and Python examples. For budding data scientists, data analysts, or just software engineers who want to wrap their heads around the basics of data science, learning SQL and Python is a must.
In my opinion, SQL is the language most likely to be around 30 years from now. Every developer should know their way around SQL.
QuickSight is a technology I’m interested in exploring further because it automatically identifies new data sources and provides a no-code/low-code editor for creating business intelligence dashboards.
Feature selection identifies the attributes in data that best represent the dataset. Correlations are identified and used to produce results, for instance determining which images have similar characteristics and therefore represent a similar concept. When models are trained on biased data, the results will have those biases baked in.
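One simple way to picture correlation-based feature selection is to rank each candidate feature by how strongly it correlates with the value you want to predict, then keep the top few. The sketch below does that with Pearson correlation in plain Python. It is an assumption-laden illustration, not the book's method or an AWS API: the function names, the column names, and the toy review data are all made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, k=2):
    """Return the k column names most correlated (in absolute value) with target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: five product reviews and their star ratings.
star_rating = [1, 2, 3, 4, 5]
columns = {
    "positive_words": [0, 1, 2, 4, 5],
    "review_length": [120, 80, 200, 90, 150],
    "constant_flag": [1, 1, 1, 1, 1],
}

selected, scores = select_features(columns, star_rating, k=2)
print(selected)  # -> ['positive_words', 'review_length']
```

Note that a ranking like this inherits whatever bias is in the data: if the ratings themselves are skewed, the features that best predict them will reflect that skew.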