Building Modern Data Analytics Solutions on AWS
Data Analytics AWS Reference
AWS services to solve data volume challenges.
- Amazon Redshift is used as a cloud-based data warehouse for storing massive amounts of structured and semi-structured data from operational databases, data warehouses, and data lakes for complex analysis. Amazon Redshift is an OLAP-type database.
- Amazon S3 is used for storing high volumes of data with high performance at low cost.
- AWS Lake Formation is used for building, securing, and managing a data lake that holds massive volumes of data.
AWS services to solve data variety challenges.
- Amazon Relational Database Service (Amazon RDS) is used as a cloud-based relational database. Amazon RDS is an OLTP-type database.
- Amazon Redshift (reference)
- Amazon DynamoDB is used as a cloud-based NoSQL database.
- Amazon OpenSearch Service is used for deploying AWS managed OpenSearch clusters for search, visualization, and analytics of text and unstructured data, including interactive log analytics.
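The difference between transactional (OLTP) and analytical (OLAP) access patterns that this reference mentions can be illustrated with a small sketch. It uses an in-memory SQLite database from the Python standard library as a stand-in for both kinds of workload; the `orders` table, its columns, and both function names are hypothetical and not tied to any AWS service.

```python
import sqlite3

# In-memory SQLite stands in for both workloads; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "eu", 20.0), (2, "us", 35.5), (3, "eu", 10.0), (4, "us", 4.5)],
)


def update_order_amount(order_id, amount):
    """OLTP-style access: change a single row by key in a short transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE orders SET amount = ? WHERE order_id = ?", (amount, order_id)
        )


def revenue_by_region():
    """OLAP-style access: scan many rows and aggregate them."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


update_order_amount(3, 15.0)
totals = revenue_by_region()
```

The first function reads and writes one row at a time, which is the typical shape of an operational workload; the second aggregates across the whole table, which is the typical shape of a warehouse query.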
AWS services to solve data velocity challenges.
- Amazon EMR is used as a scalable, managed big data platform that processes and analyzes large volumes of data at varying velocities. Amazon EMR requires more experience and technical knowledge than AWS Glue, so an organization should use it for ETL when prioritizing flexibility over convenience.
- Amazon Managed Streaming for Apache Kafka (Amazon MSK) is used as a fully managed, scalable Apache Kafka service for processing high-velocity data streams. Businesses can ingest, process, and analyze data in real time.
- Amazon Kinesis Data Streams is used to ingest, process, and analyze large volumes of streaming data in real time.
- AWS Lambda is used as a serverless compute service for real-time and event-driven data processing.
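As a minimal sketch of event-driven stream processing, the handler below shows roughly how an AWS Lambda function might consume a batch of records from a Kinesis data stream. It relies on the documented event shape in which each record carries its payload base64-encoded under `kinesis.data`. The JSON message format, the `amount` field, and the `make_test_event` helper are assumptions made for illustration only.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and sum a numeric field."""
    total = 0.0
    count = 0
    for record in event.get("Records", []):
        # Lambda delivers the Kinesis payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        # "amount" is a hypothetical field in the assumed JSON messages.
        total += float(message.get("amount", 0))
        count += 1
    return {"records": count, "total": total}


def make_test_event(messages):
    """Build a local event with the same shape, for testing without AWS."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(m).encode()).decode()}}
            for m in messages
        ]
    }
```

Because the handler is a plain function, it can be exercised locally with `handler(make_test_event([...]), None)` before it is deployed.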
AWS services to solve data veracity challenges.
- AWS Glue is used as a serverless, managed ETL service, and also for managing data quality with AWS Glue Data Quality. An organization should use AWS Glue for ETL when prioritizing convenience over flexibility (Amazon EMR). AWS Glue can be used as a metastore for your transformed data: it includes the Data Catalog, which is a central metadata repository.
- Amazon EMR is used as a scalable, managed big data platform that processes and analyzes large volumes of data at varying velocities. Amazon EMR requires more experience and technical knowledge than AWS Glue, so an organization should use it for ETL when prioritizing flexibility over convenience.
- AWS Glue DataBrew is used as a visual data preparation tool to clean and normalize data to prepare it for analytics and ML.
- Amazon DataZone is used as a data management service to catalog, govern, share, and analyze your data.
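To make the idea of a data quality check concrete, here is a small sketch in plain Python of two common kinds of rule: completeness of a column and uniqueness of a key. This is not the AWS Glue Data Quality API; the function names, the sample rows, and the thresholds are all hypothetical and only illustrate the kind of rule such a service evaluates.

```python
def completeness(rows, column):
    """Fraction of rows where the column is present and not None or empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def is_unique(rows, column):
    """True when no non-null value of the column appears more than once."""
    values = [row.get(column) for row in rows if row.get(column) is not None]
    return len(values) == len(set(values))


def evaluate(rows, min_completeness):
    """Map each column to pass/fail against its minimum completeness threshold."""
    return {
        column: completeness(rows, column) >= threshold
        for column, threshold in min_completeness.items()
    }


# Hypothetical sample data with one missing email, one empty email,
# and a duplicated order_id.
sample_rows = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": ""},
    {"order_id": 3, "email": None},
    {"order_id": 3, "email": "d@example.com"},
]
report = evaluate(sample_rows, {"order_id": 1.0, "email": 0.9})
```

Here `report` flags the `email` column as failing its threshold, and `is_unique(sample_rows, "order_id")` detects the duplicated key.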
AWS services to solve data value challenges.
- Amazon QuickSight is used as a business analytics service for building visualizations, performing ad hoc analysis, and getting business insights from your data.
- Amazon SageMaker is used as a fully managed service that provides tools to build, train, and deploy machine learning models quickly.
- Amazon Athena is used as an interactive query service that allows you to analyze data in Amazon S3 using standard SQL.
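Submitting a query to Amazon Athena from Python might look roughly like the sketch below, which uses the boto3 `start_query_execution` call. The database name, the `sales` table, and the S3 output location are placeholders, and `run_query` assumes boto3 is installed and AWS credentials are configured, so it is defined here but not called.

```python
def build_athena_request(sql, database, output_location):
    """Assemble the keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so building a request needs no AWS SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# Placeholder names: replace with a real database, table, and results bucket.
request = build_athena_request(
    "SELECT region, COUNT(*) AS orders FROM sales GROUP BY region",
    database="analytics_db",
    output_location="s3://example-bucket/athena-results/",
)
```

Athena runs queries asynchronously, so a caller would normally poll the returned execution ID until the query finishes before reading the results from the output location.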