[ Databases in AWS : S3 ]

- S3 is a key / value store for objects

- Great for big objects, not so great for small objects

- Serverless, scales infinitely, max object size is 5 TB

- Strong consistency

- Tiers : S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups

- Features : Versioning, Encryption, Cross Region Replication, etc...

- Security : IAM, Bucket Policies, ACL (Access Control List)

- Encryption : SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit

※ Use Case : static files, key value store for big files, website hosting


[ S3 for Solutions Architect ]

Operations : no operations needed

Security : IAM, Bucket Policies, ACL, Encryption (Server/Client), SSL

Reliability : 99.999999999% (11 nines) durability / 99.99% availability (S3 Standard), Multi AZ, CRR (Cross Region Replication)

Performance : scales to thousands of read/writes per second, transfer acceleration/multi-part for big files

Cost : pay per storage usage, network cost, requests number
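
To make the "multi-part for big files" point concrete, here is a minimal sketch of how a part size could be picked for a multipart upload. It relies on the documented S3 limits (at most 10,000 parts, parts between 5 MiB and 5 GiB, objects up to 5 TB, treated here as 5 TiB). The function name `choose_part_size` and the 8 MiB default are assumptions made for illustration, not part of any AWS SDK.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART = 5 * MIB     # S3 minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB     # S3 maximum part size
MAX_PARTS = 10_000     # S3 maximum number of parts per upload
MAX_OBJECT = 5 * TIB   # S3 maximum object size (documented as 5 TB)


def choose_part_size(object_size: int, preferred: int = 8 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) for a multipart upload of object_size bytes."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Smallest part size that keeps the upload within 10,000 parts.
    needed = math.ceil(object_size / MAX_PARTS)
    part_size = min(max(MIN_PART, preferred, needed), MAX_PART)
    part_count = math.ceil(object_size / part_size)
    return part_size, part_count
```

For a 100 MiB file the sketch keeps the preferred 8 MiB part size; for very large objects it grows the part size so the part count stays within the limit.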


[ DynamoDB ]

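AWS proprietary NoSQL database that stores data as key/value pairs

Multi AZ, reads and writes are decoupled, DAX is used as a read cache

Security through IAM

Integrates with AWS Lambda through DynamoDB Streams (the stream captures data changes and invokes AWS Lambda)

Backup and restore are available, Global Tables are supported

Monitoring through CloudWatch

SQL queries are not possible. Lookups work only by key and index

Transactions supported (since Nov 2018)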

- AWS proprietary technology, managed NoSQL database

- Serverless, provisioned capacity, auto scaling, on demand capacity (Nov 2018)

- Can replace ElastiCache as a key/value store (storing session data for example)

- Highly Available, Multi AZ by default, Read and Writes are decoupled, DAX for read cache

- Reads can be eventually consistent or strongly consistent

- Security, authentication and authorization is done through IAM

- DynamoDB Streams to integrate with AWS Lambda

- Backup / Restore feature, Global Table feature

- Monitoring through CloudWatch

- Can only query on primary key, sort key, or indexes

※ Use Case : Serverless application development (small documents, 100s of KB), distributed serverless cache. Doesn't have a SQL query language available, has transaction capability since Nov 2018
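
Because DynamoDB can only be queried on the partition key, sort key, or an index, a query always starts from a key condition. The sketch below builds the keyword arguments for the Query API in the low-level client format. The table name `Sessions` and the attributes `user_id` and `created_at` are hypothetical and chosen only to mirror the session-data example above.

```python
def build_session_query(user_id: str, since_iso: str, strong: bool = False) -> dict:
    """Build keyword arguments for a DynamoDB Query call (low-level client format)."""
    return {
        "TableName": "Sessions",  # hypothetical table name
        # The key condition may only reference the partition key and the sort key.
        "KeyConditionExpression": "user_id = :uid AND created_at >= :since",
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":since": {"S": since_iso},
        },
        # False = eventually consistent read, True = strongly consistent read.
        "ConsistentRead": strong,
    }
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("dynamodb").query(**build_session_query("u1", "2021-09-01T00:00:00Z"))`.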


[ DynamoDB for Solutions Architect ]

Operations : no operations needed, auto scaling capability, serverless

Security : full security through IAM policies, KMS encryption, SSL in flight

Reliability : Multi AZ, Backups

Performance : single digit millisecond performance, DAX for caching reads, performance doesn't degrade if your application scales

Cost : Pay per provisioned capacity and storage usage (no need to guess in advance any capacity - can use auto scaling)
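
As a rough illustration of what "provisioned capacity" means, the sketch below estimates read and write capacity units. It assumes the standard DynamoDB sizing rules: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one write capacity unit covers one write per second of an item up to 1 KB. The function names are made up for this example.

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate the RCUs needed for a steady read workload."""
    units_per_read = math.ceil(item_kb / 4)  # reads are billed in 4 KB steps
    total = units_per_read * reads_per_sec
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)


def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    """Estimate the WCUs needed for a steady write workload."""
    return math.ceil(item_kb) * writes_per_sec  # writes are billed in 1 KB steps
```

For example, reading 6 KB items 10 times per second needs 20 RCUs with strong consistency and 10 RCUs with eventual consistency.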




[ Databases in AWS : ElastiCache ]

1. Managed Redis/Memcached (similar offering as RDS, but for caches)

2. In-memory data store, sub-millisecond latency

3. Must provision an EC2 instance type

4. Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)

5. Security through IAM, Security Groups, KMS, Redis Auth

6. Backup / Snapshot / Point in time restore feature

7. Managed and Scheduled maintenance

8. Monitoring through CloudWatch

※ Use Case : Key/Value store, frequent reads with fewer writes, caching results of DB queries, storing session data for websites. SQL cannot be used.
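
The "cache results for DB queries" use case is usually implemented as a cache-aside (lazy loading) pattern. The sketch below shows the idea with an in-memory stand-in instead of a real Redis or Memcached client. `InMemoryCache`, `get_user_name`, and the key format are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Optional


class InMemoryCache:
    """Stand-in for a Redis/Memcached client (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_user_name(user_id: str, cache: InMemoryCache,
                  query_db: Callable[[str], str], ttl_seconds: float = 60.0) -> str:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}:name"
    cached = cache.get(key)
    if cached is not None:
        return cached                    # cache hit: no database call
    value = query_db(user_id)            # cache miss: query the database
    cache.set(key, value, ttl_seconds)   # populate the cache for later reads
    return value
```

The TTL bounds how stale a cached value can get, which matters because the database stays the source of truth.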


[ ElastiCache for Solutions Architect ]

Operations : same as RDS

Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL

Reliability : Clustering, Multi AZ

Performance : Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option

Cost : Pay per hour based on EC2 and storage usage




[ Aurora ]

Supports OLTP transaction processing

PostgreSQL/MySQL compatible


1. Compatible API for PostgreSQL/MySQL (OLTP)

2. Data is held in 6 replicas, across 3 AZ

3. Auto healing capability

4. Multi AZ, Auto Scaling Read Replicas

5. Read Replicas can be Global

6. Aurora database can be Global for DR or latency purposes

7. Auto scaling of storage from 10GB to 128TB

8. Define EC2 instance type for aurora instances

9. Same security / monitoring / maintenance features as RDS

10. Aurora Serverless - for unpredictable / intermittent workloads

11. Aurora Multi-Master - for continuous writes failover

※ Use case : same as RDS, but with less maintenance/more flexibility/more performance
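
To see why "6 replicas across 3 AZ" gives Aurora its fault tolerance, the sketch below models the storage quorum. It assumes the quorum sizes described in AWS's Aurora documentation (4 of 6 copies needed for writes, 3 of 6 for reads), which are not stated in these notes. The function name `availability` is made up for this example.

```python
COPIES_PER_AZ = 2   # 6 copies spread over 3 AZs
AZ_COUNT = 3
WRITE_QUORUM = 4    # assumption taken from AWS documentation
READ_QUORUM = 3     # assumption taken from AWS documentation


def availability(failed_azs: int = 0, extra_failed_copies: int = 0) -> dict:
    """Report whether reads and writes still reach quorum after failures."""
    total = COPIES_PER_AZ * AZ_COUNT
    lost = failed_azs * COPIES_PER_AZ + extra_failed_copies
    healthy = max(total - lost, 0)
    return {
        "healthy_copies": healthy,
        "can_write": healthy >= WRITE_QUORUM,
        "can_read": healthy >= READ_QUORUM,
    }
```

Under these assumptions, losing a whole AZ still leaves writes available, and losing an AZ plus one more copy still leaves reads available.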


[ Aurora for Solutions Architect ]

Operations : less operations, auto scaling storage

Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL

Reliability : Multi AZ, highly available, possibly more than RDS, Aurora Serverless option, Aurora Multi-Master option

Performance : 5x performance (according to AWS) due to architectural optimizations. Up to 15 Read Replicas (only 5 for RDS)

Cost : Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise grade databases such as Oracle



[ Databases in AWS : RDS(Relational Database Service) ]

1. Managed PostgreSQL / MySQL / Oracle / SQL Server

2. Must provision an EC2 instance & EBS Volume type and size

3. Support for Read Replicas and Multi AZ

4. Security through IAM, Security Groups, KMS, SSL in transit

5. Backup / Snapshot / Point in time restore feature

6. Managed and Scheduled maintenance

7. Monitoring through CloudWatch

8. Use case : Store relational datasets (RDBMS/OLTP), perform SQL queries, transactional inserts/updates/deletes

※ OLTP : Online Transaction Processing


[ RDS for Solutions Architect ]

1. Operations : small downtime during failover and maintenance. Scaling read replicas or the EC2 instance, or restoring EBS, requires manual intervention and application changes

2. Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL

3. Reliability : Multi AZ feature, failover in case of failures

4. Performance : depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Storage auto-scaling & manual scaling of instances

5. Cost : Pay per hour based on provisioned EC2 and EBS




[ Choosing the right database ]

- We have a lot of managed databases on AWS to choose from

- Questions to choose the right database based on your architecture :

1) Read-heavy, write-heavy, or balanced workload? Throughput needs? Will it change, does it need to scale or fluctuate during the day?

2) How much data to store and for how long? Will it grow? Average object size? How are they accessed?

3) Data durability? Source of truth for the data?

4) Latency requirements ? Concurrent users?

5) Data model? How will you query the data? joins? Structured? Semi-Structured?

6) Strong schema? More flexibility? Reporting? Search? RDBMS/NoSQL?

7) License costs? Switch to Cloud Native DB such as Aurora?


[ 1. Database Types ]

1. RDBMS(=SQL/OLTP) : RDS, Aurora - great for joins

2. NoSQL database : DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune(graphs) - no joins, no SQL

3. Object Store : S3 (for big objects) / Glacier (for backups / archives)

4. Data Warehouse (=SQL Analytics/BI) : Redshift (OLAP), Athena

5. Search : ElasticSearch(JSON) - free text, unstructured searches

6. Graphs : Neptune - displays relationships between data




[ Big Data Ingestion Pipeline : Todo List ] 

1. We want the ingestion pipeline to be fully serverless

2. We want to collect data in real time

3. We want to transform the data

4. We want to query the transformed data using SQL

5. The reports created using the queries should be in S3

6. We want to load that data into a warehouse and create dashboards


[ Big Data Ingestion Pipeline ]

- IoT Core allows you to harvest data from IoT devices

- Kinesis is great for real-time data collection

- Firehose helps with data delivery to S3 in near real-time (1 min)

- Lambda can help Firehose with data transformations

- Amazon S3 can trigger notifications to SQS

- Lambda can subscribe to SQS (we could have connected S3 to Lambda directly)

- Athena is a serverless SQL service and results are stored in S3

- The reporting bucket contains analyzed data and can be used by reporting tool such as AWS QuickSight, Redshift, etc...
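
The "Lambda can help Firehose with data transformations" step can be sketched as a transformation function. Firehose hands the function a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` status, and the re-encoded `data`. The payload fields `temperature_c` and `temperature_f` are hypothetical IoT readings invented for this example.

```python
import base64
import json


def lambda_handler(event: dict, context: object) -> dict:
    """Transform each Firehose record and return it in the expected response shape."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: add a Fahrenheit reading.
            payload["temperature_f"] = payload["temperature_c"] * 9 / 5 + 32
            line = json.dumps(payload) + "\n"  # newline-delimited JSON for S3
            data = base64.b64encode(line.encode("utf-8")).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, KeyError, TypeError):
            # Return the untouched record marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Writing one JSON object per line is a design choice that tends to make the delivered files easier to query from Athena later in the pipeline.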


- We have an application running on EC2, that distributes software updates once in a while

- When a new software update is out, we get a lot of requests and the content is distributed en masse over the network. It's very costly

- We don't want to change our application, but want to optimize our cost and CPU, how can we do it?


[ Our application current state ]

ELB + ASG , running on multi AZ


[ Easy way to fix things : Using Amazon CloudFront ]

Why CloudFront?

- No changes to architecture

- Will cache software update files at the edge

- Software update files are not dynamic, they're static (never changing)

- Our EC2 instances aren't serverless

- But CloudFront is, and will scale for us

- Our ASG will not scale as much, and we'll save tremendously in EC2

- We'll also gain in availability and save on network bandwidth cost, etc

- Easy way to make an existing application more scalable and cheaper


[ Distributing paid content ]

1. We sell videos online and users have to pay to buy videos

2. Each video can be bought by many different customers

3. We only want to distribute videos to users who are premium users

4. We have a database of premium users

5. Links we send to premium users should be short lived

6. Our application is global

7. We want to be fully serverless


[ Start simple, premium user service ]

Use Cognito for authentication

Check whether the user is a premium user through a DB lookup (authorization)


[ Add Videos Storage Secure, Distribute Globally and Secure, Distribute Content only to premium users ]

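1) The user requests a video URL

2) The user is authenticated through Cognito

3) Lambda checks whether the user is a premium user (authorization)

4) For a premium user, a CloudFront Signed URL with a fixed validity period is generated (CloudFront is configured to be accessible only through Signed URLs)

5) The Signed URL is returned

6) The user accesses CloudFront through the Signed URL and views the video stored in S3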
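
Steps 3 to 5 above can be sketched as a small authorization function. The premium-user set stands in for the database lookup, and the `sign` callable is a hypothetical placeholder for whatever actually produces the CloudFront signature (for example, botocore's `CloudFrontSigner`). The policy layout follows CloudFront's canned policy format. All names and the example domain are assumptions.

```python
import json
import time
from typing import Callable, Optional

# Stand-in for the premium-user lookup (a database in the real flow).
PREMIUM_USERS = {"user-123"}


def canned_policy(resource_url: str, expires_epoch: int) -> str:
    """Build a CloudFront canned policy that expires at expires_epoch."""
    policy = {
        "Statement": [{
            "Resource": resource_url,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
        }]
    }
    return json.dumps(policy, separators=(",", ":"))


def issue_video_url(user_id: str, video_url: str,
                    sign: Callable[[str, str, int], str],
                    ttl_seconds: int = 300,
                    now: Optional[float] = None) -> Optional[str]:
    """Return a short-lived signed URL for premium users, or None otherwise."""
    if user_id not in PREMIUM_USERS:
        return None  # authorization failed: no link is issued
    expires = int(now if now is not None else time.time()) + ttl_seconds
    # `sign` is hypothetical: it receives the URL, the policy, and the expiry.
    return sign(video_url, canned_policy(video_url, expires), expires)
```

Keeping the signer injectable means the authorization logic can be exercised without a private key, and the short TTL matches the requirement that links be short lived.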


[ Premium User Video service ]

We have implemented a fully serverless solution :

1. Cognito for authentication

2. DynamoDB for storing users that are premium

3. 2 serverless applications

4. Content is stored in S3 (serverless and scalable)

5. Integrated with CloudFront with OAI for security (users can't bypass)

6. CloudFront can only be used using Signed URLs to prevent unauthorized users

※ What about S3 Signed URLs? They're not efficient for global access



[ Micro Service Architecture ]

- We want to switch to a micro service architecture

- Many services interact with each other directly using a REST API

- Each architecture for each micro service may vary in form and shape

- We want a micro-service architecture so we can have a leaner development lifecycle for each service


[ Micro Services Environment ]

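A DNS query to Route 53 resolves the domain, and the client then reaches that domain's server

Each domain is its own small, split-out service (micro service) and can internally call other services to do its work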


[ Discussions on Micro Services ]

1) Free to design each micro-service the way we want

2) Synchronous patterns : API Gateway, Load Balancers

3) Asynchronous patterns : SQS, Kinesis, SNS, Lambda triggers (S3)

4) Challenges with micro-services:

 - repeated overhead for creating each new microservice

 - issues with optimizing server density/utilization

 - complexity of running multiple versions of multiple microservices simultaneously

 - proliferation of client-side code requirements to integrate with many separate services


Some of the challenges are solved by Serverless patterns :

- API Gateway, Lambda scale automatically and you pay per usage

- You can easily clone API, reproduce environments

- Generated client SDK through Swagger integration for the API Gateway
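
The difference between the synchronous and asynchronous patterns listed in the discussion above can be illustrated with a local queue. In this sketch `queue.Queue` stands in for a service such as SQS, and `place_order` and `drain` are hypothetical names. The producer returns as soon as the message is queued, and a separate consumer processes it later.

```python
import queue
from typing import Callable


def place_order(order_id: str, outbox: queue.Queue) -> str:
    """Producer: enqueue an event and return without waiting for consumers."""
    outbox.put({"order_id": order_id, "event": "order_placed"})
    return "accepted"


def drain(outbox: queue.Queue, handler: Callable[[dict], None]) -> int:
    """Consumer: process every queued message and return how many were handled."""
    processed = 0
    while True:
        try:
            message = outbox.get_nowait()
        except queue.Empty:
            return processed
        handler(message)
        outbox.task_done()
        processed += 1
```

Because the producer never calls the consumer directly, either side can be scaled or redeployed without the other noticing, at the cost of the caller not getting the processing result immediately.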


