[ Athena Overview ]

Athena is not a database itself; it provides a query engine on top of S3

- Fully serverless query service with SQL capabilities

- Used to query data in S3

- Pay per query

- Output results back to S3

- Secured through IAM

※ Use Case : one time SQL queries, serverless queries on S3, log analytics
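The points above (query data in S3, write results back to S3) can be sketched as the parameters of a single Athena query request. This is a minimal sketch, not a definitive implementation: the database name, table name, and bucket are hypothetical, and the helper only builds the keyword arguments so it runs without AWS credentials.

```python
def build_athena_query_request(sql, database, output_location):
    """Build keyword arguments for Athena's StartQueryExecution call."""
    # Athena always writes query results to S3, so insist on an s3:// URI.
    if not output_location.startswith("s3://"):
        raise ValueError("expected an s3:// URI for the result location")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names: logs_db, access_logs, example-athena-results.
request = build_athena_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="logs_db",
    output_location="s3://example-athena-results/reports/",
)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("athena").start_query_execution(**request)
```

Access to both the query and the result bucket would still be governed by IAM, as noted above.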

 

[ Athena for Solutions Architect ]

Operations : no operations needed, serverless

Security : IAM + S3 security

Reliability : managed service, uses Presto engine, highly available

Performance : queries scale based on data size

Cost : pay per query / per TB of data scanned, serverless
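Since the cost is driven by the amount of data scanned, a rough estimate is simple arithmetic. The sketch below assumes a price of 5 USD per TB and a 10 MB minimum charge per query; both values are assumptions that are not in these notes and should be checked against the current Athena pricing page.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def estimate_athena_cost(bytes_scanned, usd_per_tb=5.0, min_bytes=10 * MIB):
    """Estimate the cost of one query from the bytes it scans.

    usd_per_tb and min_bytes are assumed defaults, not values from the notes.
    """
    billed_bytes = max(bytes_scanned, min_bytes)
    return billed_bytes / TIB * usd_per_tb


full_scan = estimate_athena_cost(1 * TIB)      # scanning a whole 1 TB dataset
partial_scan = estimate_athena_cost(TIB // 20)  # scanning only 5% of it
```

The comparison shows why scanning less data per query directly lowers the bill.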

 

 

 


[ Databases in AWS : S3 ]

- S3 is a key / value store for objects

- Great for big objects, not so great for small objects

- Serverless, scales infinitely, max object size is 5 TB

- Strong consistency

- Tiers : S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups

- Features : Versioning, Encryption, Cross Region Replication, etc...

- Security : IAM, Bucket Policies, ACL (Access Control List)

- Encryption : SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit

※ Use Case : static files, key value store for big files, website hosting
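The server-side encryption options listed above map onto request parameters when an object is uploaded. The following is a minimal sketch that only builds the arguments for a `put_object` call; the bucket, key, and KMS key alias are hypothetical, and SSE-C and client-side encryption are left out to keep it short.

```python
def build_put_object_request(bucket, key, body, encryption="SSE-S3", kms_key_id=None):
    """Build keyword arguments for an S3 PutObject call with server-side encryption."""
    request = {"Bucket": bucket, "Key": key, "Body": body}
    if encryption == "SSE-S3":
        # Keys are managed entirely by S3.
        request["ServerSideEncryption"] = "AES256"
    elif encryption == "SSE-KMS":
        # Keys are managed in KMS; a specific key is optional.
        request["ServerSideEncryption"] = "aws:kms"
        if kms_key_id is not None:
            request["SSEKMSKeyId"] = kms_key_id
    else:
        raise ValueError(f"unsupported encryption mode in this sketch: {encryption}")
    return request


# Hypothetical bucket, key, and KMS alias.
s3_managed = build_put_object_request("example-bucket", "site/index.html", b"<html></html>")
kms_managed = build_put_object_request(
    "example-bucket", "reports/q1.csv", b"a,b\n1,2\n",
    encryption="SSE-KMS", kms_key_id="alias/example-key",
)

# With boto3 installed: boto3.client("s3").put_object(**kms_managed)
```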

 

[ S3 for Solutions Architect ]

Operations : no operations needed

Security : IAM , Bucket Policies, ACL, Encryption (Server/Client), SSL

Reliability : 99.999999999% (11 nines) durability / 99.99% availability (S3 Standard), Multi AZ, CRR (Cross Region Replication)

Performance : scales to thousands of reads/writes per second, transfer acceleration / multi-part upload for big files

Cost : pay per storage usage, network transfer, and number of requests
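To make the multi-part option for big files concrete, here is a small sketch that plans how an upload would be split. The 5 MiB minimum part size and the 10,000 part limit come from the S3 documentation rather than from these notes, and the 8 MiB default part size is an arbitrary assumption.

```python
import math

MIB = 1024 ** 2
TIB = 1024 ** 4
MIN_PART_SIZE = 5 * MIB   # S3 minimum part size (last part may be smaller)
MAX_PARTS = 10_000        # S3 maximum number of parts per upload
MAX_OBJECT_SIZE = 5 * TIB  # max object size, as stated in the notes


def plan_multipart(object_size, part_size=8 * MIB):
    """Return (part_size, part_count) for a multi-part upload."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TB")
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size until the upload fits within the part limit.
    while math.ceil(object_size / part_size) > MAX_PARTS:
        part_size *= 2
    return part_size, math.ceil(object_size / part_size)


small_plan = plan_multipart(100 * MIB)
largest_plan = plan_multipart(MAX_OBJECT_SIZE)
```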


[ DynamoDB ]

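A NoSQL database built on AWS proprietary technology; stores data as key/value pairs

Multi AZ, reads and writes are decoupled, DAX is used as a read cache

Secured through IAM

Integrates with AWS Lambda through DynamoDB Streams (the stream detects data changes and invokes AWS Lambda)

Backup and restore available; Global Tables supported

Monitoring through CloudWatch

No SQL queries; lookups are only possible by key and indexes

Transactions supported (since Nov 2018)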

- AWS proprietary technology, managed NoSQL database

- Serverless, provisioned capacity, auto scaling, on demand capacity (Nov 2018)

- Can replace ElastiCache as a key/value store (storing session data for example)

- Highly Available, Multi AZ by default, Read and Writes are decoupled, DAX for read cache

- Reads can be eventually consistent or strongly consistent

- Security, authentication and authorization is done through IAM

- DynamoDB Streams to integrate with AWS Lambda

- Backup / Restore feature, Global Table feature

- Monitoring through CloudWatch

- Can only query on primary key, sort key, or indexes

※ Use Case : Serverless applications development (small documents, 100s of KB), distributed serverless cache. No SQL query language available; transactions capability since Nov 2018
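Because lookups go through the primary key (or an index) rather than SQL, a read is expressed as a key condition. The sketch below builds the arguments for a low-level `query` call; the table and attribute names are hypothetical, and the consistency flag reflects the eventually consistent versus strongly consistent choice mentioned above.

```python
def build_query_request(table, pk_name, pk_value, consistent_read=False):
    """Build keyword arguments for a DynamoDB Query on the partition key."""
    return {
        "TableName": table,
        # Placeholders avoid clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": pk_name},
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        # False = eventually consistent read, True = strongly consistent read.
        "ConsistentRead": consistent_read,
    }


# Hypothetical table and key names for session storage.
session_lookup = build_query_request(
    "Sessions", "session_id", "abc-123", consistent_read=True
)

# With boto3 installed: boto3.client("dynamodb").query(**session_lookup)
```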

 

[ DynamoDB for Solutions Architect ]

Operations : no operations needed, auto scaling capability, serverless

Security : full security through IAM policies, KMS encryption, SSL in flight

Reliability : Multi AZ, Backups

Performance : single digit millisecond performance, DAX for caching reads, performance doesn't degrade if your application scales

Cost : Pay per provisioned capacity and storage usage (no need to guess in advance any capacity - can use auto scaling)
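Provisioned capacity is expressed in read and write capacity units, so the cost line above can be turned into a quick calculation. The unit sizes used here (4 KB per read unit, 1 KB per write unit, half price for eventually consistent reads) are taken from the DynamoDB documentation, not from these notes.

```python
import math


def read_capacity_units(reads_per_sec, item_kb, strongly_consistent=True):
    """RCUs needed: one unit covers one strongly consistent read of up to 4 KB per second."""
    units = reads_per_sec * math.ceil(item_kb / 4)
    # Eventually consistent reads cost half as much.
    return units if strongly_consistent else math.ceil(units / 2)


def write_capacity_units(writes_per_sec, item_kb):
    """WCUs needed: one unit covers one write of up to 1 KB per second."""
    return writes_per_sec * math.ceil(item_kb)


strong_rcu = read_capacity_units(10, 6)
eventual_rcu = read_capacity_units(10, 6, strongly_consistent=False)
wcu = write_capacity_units(10, 2.5)
```

With auto scaling or on-demand capacity the numbers do not have to be guessed in advance, but the same units drive the bill.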

 


[ Databases in AWS : ElastiCache ]

1. Managed Redis/Memcached (similar offering as RDS, but for caches)

2. In-memory data store, sub-millisecond latency

3. Must provision an EC2 instance type

4. Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)

5. Security through IAM, Security Groups, KMS, Redis Auth

6. Backup / Snapshot / Point in time restore feature

7. Managed and Scheduled maintenance

8. Monitoring through CloudWatch

※ Use Case : Key/Value store, frequent reads with fewer writes, caching results of DB queries, storing session data for websites. Cannot use SQL.
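The "cache results for DB queries" use case is usually implemented as a cache-aside lookup. The sketch below uses an in-process dictionary with a TTL as a stand-in for Redis or Memcached, so the class and function names are purely illustrative.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a Redis/Memcached client."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def get_user(user_id, cache, load_from_db):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value)
    return value
```

Repeated reads of the same user hit the cache until the TTL expires, which is what takes load off the database.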

 

[ ElastiCache for Solutions Architect ]

Operations : same as RDS

Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL

Reliability : Clustering, Multi AZ

Performance : Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option

Cost : Pay per hour based on EC2 and storage usage

 


[ Aurora ]

Supports OLTP transaction processing

PostgreSQL/MySQL compatible

Auto scaling

1. Compatible API for PostgreSQL/MySQL (OLTP)

2. Data is held in 6 replicas, across 3 AZ

3. Auto healing capability

4. Multi AZ, Auto Scaling Read Replicas

5. Read Replicas can be Global

6. Aurora database can be Global for DR or latency purposes

7. Auto scaling of storage from 10GB to 128TB

8. Define EC2 instance type for aurora instances

9. Same security / monitoring / maintenance features as RDS

10. Aurora Serverless - for unpredictable / intermittent workloads

11. Aurora Multi-Master - for continuous writes failover

※ Use case : same as RDS, but with less maintenance/more flexibility/more performance
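The "6 replicas across 3 AZ" point can be illustrated with a small quorum check. The quorum sizes used below (4 of 6 copies for writes, 3 of 6 for reads) come from AWS's Aurora documentation rather than from these notes, so treat this as a sketch under that assumption.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
WRITE_QUORUM = 4  # assumed from AWS documentation
READ_QUORUM = 3   # assumed from AWS documentation


def availability(failed_copies):
    """Report whether reads and writes remain possible after some copies fail."""
    alive = COPIES_PER_AZ * AZ_COUNT - failed_copies
    return {"can_write": alive >= WRITE_QUORUM, "can_read": alive >= READ_QUORUM}


one_az_down = availability(COPIES_PER_AZ)           # a whole AZ is lost
az_plus_one_down = availability(COPIES_PER_AZ + 1)  # an AZ plus one more copy
```

Losing an entire AZ leaves both reads and writes available, which is the practical meaning of the replication layout.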

 

[ Aurora for Solutions Architect ]

Operations : less operations, auto scaling storage

Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL

Reliability : Multi AZ, highly available, possibly more than RDS, Aurora Serverless option, Aurora Multi-Master option

Performance : 5x performance (according to AWS) due to architectural optimizations. Up to 15 Read Replicas (only 5 for RDS)

Cost : Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise grade databases such as Oracle

 


[ Databases in AWS : RDS(Relational Database Service) ]

1. Managed PostgreSQL / MySQL / Oracle / SQL Server

2. Must provision an EC2 instance & EBS Volume type and size

3. Support for Read Replicas and Multi AZ

4. Security through IAM, Security Groups, KMS, SSL in transit

5. Backup / Snapshot / Point in time restore feature

6. Managed and Scheduled maintenance

7. Monitoring through CloudWatch

8. Use case : Store relational datasets (RDBMS/OLTP), perform SQL queries, transactional inserts/updates/deletes

※ OLTP : Online Transaction Processing
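A transactional insert/update/delete means a group of statements either all succeed or all roll back. The sketch below demonstrates this with Python's built-in `sqlite3` as a local stand-in for an RDS engine; the `accounts` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # This statement violates the CHECK constraint on overdraft,
            # which rolls back the credit above as well.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
rejected = transfer(conn, 1, 2, 500)  # fails and leaves balances untouched
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```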

 

[ RDS for Solutions Architect ]

1. Operations : small downtime during failover and maintenance; scaling read replicas or the EC2 instance, or restoring EBS, requires manual intervention and application changes

2. Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL

3. Reliability : Multi AZ feature, failover in case of failures

4. Performance : depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Storage auto-scaling & manual scaling of instances

5. Cost : Pay per hour based on provisioned EC2 and EBS

 

 


[ Choosing the right database ]

- We have a lot of managed databases on AWS to choose from

- Questions to choose the right database based on your architecture :

1) Read-heavy, write-heavy, or balanced workload? Throughput needs? Will it change, does it need to scale or fluctuate during the day?

2) How much data to store and for how long? Will it grow? Average object size? How are they accessed?

3) Data durability? Source of truth for the data?

4) Latency requirements ? Concurrent users?

5) Data model? How will you query the data? joins? Structured? Semi-Structured?

6) Strong schema? More flexibility? Reporting? Search? RDBMS/NoSQL?

7) License costs? Switch to Cloud Native DB such as Aurora?

 

[ 1. Database Types ]

1. RDBMS(=SQL/OLTP) : RDS, Aurora - great for joins

2. NoSQL database : DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune(graphs) - no joins, no SQL

3. Object Store : S3 (for big objects) / Glacier (for backups / archives)

4. Data Warehouse (=SQL Analytics/BI) : Redshift (OLAP), Athena

5. Search : ElasticSearch(JSON) - free text, unstructured searches

6. Graphs : Neptune - displays relationships between data
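The categories above can be read as a simple decision table. The sketch below encodes that table as a lookup; the rule order and the function name are illustrative assumptions, and a real choice would weigh all the questions listed earlier.

```python
SERVICES = {
    "relational": ["RDS", "Aurora"],
    "nosql": ["DynamoDB", "ElastiCache"],
    "object": ["S3", "Glacier"],
    "warehouse": ["Redshift", "Athena"],
    "search": ["ElasticSearch"],
    "graph": ["Neptune"],
}


def candidate_services(needs_joins=False, analytics=False, free_text_search=False,
                       big_objects=False, graph_relationships=False):
    """Return candidate services for a workload, checking the most specific needs first."""
    if free_text_search:
        return SERVICES["search"]
    if graph_relationships:
        return SERVICES["graph"]
    if big_objects:
        return SERVICES["object"]
    if analytics:
        return SERVICES["warehouse"]
    if needs_joins:
        return SERVICES["relational"]
    # No joins and no SQL needed: key/value or document access.
    return SERVICES["nosql"]
```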

 

 


[ Big Data Ingestion Pipeline : Todo List ] 

1. We want the ingestion pipeline to be fully serverless

2. We want to collect data in real time

3. We want to transform the data

4. We want to query the transformed data using SQL

5. The reports created using the queries should be in S3

6. We want to load that data into a warehouse and create dashboards

 

[ Big Data Ingestion Pipeline ]

- IoT Core allows you to harvest data from IoT devices

- Kinesis is great for real-time data collection

- Firehose helps with data delivery to S3 in near real-time (1 min)

- Lambda can help Firehose with data transformations

- Amazon S3 can trigger notifications to SQS

- Lambda can subscribe to SQS (we could have connected S3 to Lambda directly)

- Athena is a serverless SQL service and results are stored in S3

- The reporting bucket contains analyzed data and can be used by reporting tools such as AWS QuickSight, Redshift, etc...
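The "Lambda can help Firehose with data transformations" step follows a fixed contract: Firehose passes base64-encoded records and expects each one back with the same `recordId` and a result status. Below is a minimal sketch of such a handler; the `temperature_c` field and the conversion are hypothetical examples of a transformation.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return it with a per-record status."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: add a Fahrenheit reading.
            payload["temperature_f"] = payload["temperature_c"] * 9 / 5 + 32
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": encoded})
        except (ValueError, KeyError, TypeError):
            # Hand the original data back and let Firehose treat it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Appending a newline to each record keeps the objects delivered to S3 line-delimited, which is convenient for querying them later with Athena.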


- We have an application running on EC2, that distributes software updates once in a while

- When a new software update is out, we get a lot of requests and the content is distributed en masse over the network. It's very costly

- We don't want to change our application, but want to optimize our cost and CPU, how can we do it?

 

[ Our application current state ]

ELB + ASG , running on multi AZ

 

[ Easy way to fix things : Using Amazon CloudFront ]

Why CloudFront?

- No changes to architecture

- Will cache software update files at the edge

- Software update files are not dynamic, they're static (never changing)

- Our EC2 instances aren't serverless

- But CloudFront is, and will scale for us

- Our ASG will not scale as much, and we'll save tremendously in EC2

- We'll also save in availability, network bandwidth cost, etc

- Easy way to make an existing application more scalable and cheaper
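The effect of caching static update files at the edge can be shown with a toy simulation. The classes below are illustrative stand-ins, not CloudFront APIs: each edge location keeps its own cache, and the origin (the EC2 fleet) is contacted only on a miss.

```python
class Origin:
    """Stand-in for the EC2 application; counts how often it is hit."""

    def __init__(self):
        self.requests = 0

    def fetch(self, path):
        self.requests += 1
        return f"contents of {path}".encode("utf-8")


class EdgeCache:
    """Stand-in for one edge location caching static files."""

    def __init__(self, origin):
        self._origin = origin
        self._store = {}

    def get(self, path):
        if path not in self._store:
            self._store[path] = self._origin.fetch(path)  # cache miss
        return self._store[path]


origin = Origin()
edges = [EdgeCache(origin) for _ in range(3)]  # three hypothetical edge locations

# 10,000 download requests spread across the edge locations.
for i in range(10_000):
    edges[i % len(edges)].get("/updates/v2.0.1.bin")
```

After the loop the origin has served the file only once per edge location, which is why the ASG no longer needs to scale with the download spike.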


[ Distributing paid content ]

1. We sell videos online and users have to pay to buy videos

2. Each video can be bought by many different customers

3. We only want to distribute videos to users who are premium users

4. We have a database of premium users

5. Links we send to premium users should be short lived

6. Our application is global

7. We want to be fully serverless

 

[ Start simple, premium user service ]

Authenticate users with Cognito (authentication)

Look up the database to check whether the user is a premium user (authorization)

 

[ Add Videos Storage Secure, Distribute Globally and Secure, Distribute Content only to premium users ]

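1) The user requests the video URL

2) The user is authenticated through Cognito

3) Lambda checks whether the user is a premium user (authorization)

4) For a premium user, a CloudFront Signed URL with a fixed expiration time is generated (CloudFront is configured to be accessible only through Signed URLs)

5) The Signed URL is returned

6) The user accesses CloudFront through the Signed URL and views the video stored in S3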
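Step 4 can be sketched as building a CloudFront canned policy with an expiry time and attaching its signature to the URL. This is a sketch under assumptions: the distribution domain and key pair ID are hypothetical, and the RSA signing itself is passed in as a `sign` callable because it needs the CloudFront private key and a crypto library that is not part of the standard library.

```python
import base64
import json


def canned_policy(resource_url, expires_epoch):
    """Build the canned policy that limits access until expires_epoch."""
    policy = {
        "Statement": [{
            "Resource": resource_url,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
        }]
    }
    # Compact separators: the policy must not contain extra whitespace.
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(raw):
    """Base64-encode bytes and swap the characters that are not URL-safe."""
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def signed_url(resource_url, expires_epoch, key_pair_id, sign):
    """Assemble a signed URL; `sign` is a caller-supplied signing function."""
    signature = sign(canned_policy(resource_url, expires_epoch).encode("utf-8"))
    separator = "&" if "?" in resource_url else "?"
    return (
        f"{resource_url}{separator}Expires={expires_epoch}"
        f"&Signature={cloudfront_b64(signature)}&Key-Pair-Id={key_pair_id}"
    )
```

A short expiry keeps the link short lived, so a leaked URL stops working soon after it was issued.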

 

[ Premium User Video service ]

We have implemented a fully serverless solution :

1. Cognito for authentication

2. DynamoDB for storing users that are premium

3. 2 serverless applications

4. Content is stored in S3 (serverless and scalable)

5. Integrated with CloudFront with OAI for security (users can't bypass)

6. CloudFront can only be used using Signed URLs to prevent unauthorized users

※ What about S3 Signed URLs? They're not efficient for global access

 
