Building Modern Data Analytics Solutions on AWS

Amazon Redshift is used as a cloud-based data warehouse for storing massive amounts of structured and semi-structured data from operational databases, data warehouses, and data lakes for complex analysis. Amazon Redshift is an OLAP-type database.

Amazon S3 is used as object storage for holding high volumes of data with high performance at low cost.

AWS Lake Formation is used for building, securing, and managing a data lake that holds massive volumes of data. The data itself is stored in Amazon S3.

Amazon Relational Database Service (Amazon RDS) is used as a cloud-based relational database. Amazon RDS is an OLTP-type database.
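
The difference between transactional (OLTP) and analytical (OLAP) workloads can be sketched with a small example. This is only an illustration: an in-memory SQLite database stands in for both kinds of system, and the `orders` table and its columns are hypothetical. The first query is the keyed, single-row access that an OLTP database such as Amazon RDS is typically tuned for; the second is the scan-and-aggregate access that an OLAP warehouse such as Amazon Redshift is typically tuned for.

```python
import sqlite3

# In-memory SQLite is a stand-in here; it is neither Amazon RDS nor Amazon Redshift.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
    [(1, "east", 20.0), (2, "west", 35.0), (3, "east", 15.0)],
)

# OLTP-style access: update and read one row by its key in a short transaction.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (25.0, 1))
single_order = conn.execute(
    "SELECT region, amount FROM orders WHERE id = ?", (1,)
).fetchone()

# OLAP-style access: scan many rows and aggregate them for analysis.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)

print(single_order)
print(totals_by_region)
```

At real scale, the two access patterns favor different storage layouts, which is why the notes list separate services for them.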

Amazon OpenSearch Service is used for deploying AWS managed OpenSearch clusters for search, visualization, and analytics of text and unstructured data. Amazon OpenSearch Service is also used for interactive log analytics.

Amazon EMR is used as a scalable, managed big data platform that processes and analyzes large volumes of data at varying velocities. Amazon EMR requires more experience and technical knowledge than AWS Glue, so an organization should use it for ETL when prioritizing flexibility over convenience (AWS Glue).

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is used as a fully managed, scalable Apache Kafka service for processing high-velocity data streams. It helps organizations address data velocity challenges: businesses can ingest, process, and analyze data in real time.

Amazon Kinesis Data Streams is used to ingest, process, and analyze large volumes of streaming data in real time.

AWS Lambda is used as a serverless compute service for real-time and event-driven data processing.
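
As a rough sketch of event-driven processing, the function below shows what a Python Lambda handler for a Kinesis Data Streams event might look like. It relies on the event shape that Lambda passes for Kinesis sources, where each record carries a base64-encoded payload under `kinesis.data`. The handler name, the JSON payload format, and the `amount` field are assumptions made for illustration, and the sample event is built locally so the sketch runs without an AWS account.

```python
import base64
import json


def handler(event, context):
    """Decode Kinesis records and sum a hypothetical 'amount' field."""
    total = 0.0
    processed = 0
    for record in event.get("Records", []):
        # Each Kinesis payload arrives base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)  # assumes producers send JSON
        total += float(message.get("amount", 0))
        processed += 1
    return {"processed": processed, "total": total}


def make_record(message):
    """Build a minimal, locally faked Kinesis record for testing."""
    data = base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


# Local usage: invoke the handler with a hand-built event instead of a real stream.
sample_event = {"Records": [make_record({"amount": 10}), make_record({"amount": 2.5})]}
result = handler(sample_event, None)
print(result)
```

In a deployed function, Lambda would invoke `handler` with batches of records from the stream, so the local `make_record` helper would not be needed.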

AWS Glue is used as a serverless, managed ETL service, and also for managing data quality with AWS Glue Data Quality. An organization should use AWS Glue for ETL when prioritizing convenience over flexibility (Amazon EMR). AWS Glue can also be used as a metastore for your transformed data: it includes the Data Catalog, which is a central metadata repository.

AWS Glue DataBrew is used as a visual data preparation tool to clean and normalize data for analytics and ML.

Amazon DataZone is used as a data management service to catalog, govern, share, and analyze your data.

Amazon QuickSight is used as a business analytics service for building visualizations, performing ad hoc analysis, and getting business insights from your data.

Amazon SageMaker is used as a fully managed service that provides tools to build, train, and deploy machine learning models quickly.