[ 1. Amazon MQ ]

- Amazon MQ = managed Apache ActiveMQ

- SQS and SNS are cloud-native services that use proprietary AWS protocols

- Traditional applications running on-premise may use open protocols such as : MQTT, AMQP, STOMP, Openwire, WSS

- when migrating to the cloud, instead of re-engineering the application to use SQS and SNS, we can use Amazon MQ

- Amazon MQ doesn't "scale" as much as SQS/SNS

- Amazon MQ runs on a dedicated machine and can run in HA (High Availability) with failover

- Amazon MQ has both queue features (~SQS) and topic features (~SNS)

- active/standby structure (failover)

 

 

[ 2. SQS vs SNS vs Kinesis ]


[ Kinesis ]

A managed alternative to Apache Kafka, well suited to real-time big data workloads

- Kinesis is a managed alternative to Apache Kafka

- Great for application logs, metrics, IoT, clickstreams

- Great for "real-time" big data

- Great for stream-processing frameworks (Spark, NiFi, etc...)

- Data is automatically replicated to 3 AZs

Kinesis Streams : low latency streaming ingest at scale

Kinesis Analytics : perform real-time analytics on streams using SQL

Kinesis Firehose : load stream into S3, Redshift, ElasticSearch..

 

[ 1. Kinesis Streams ]

Streams are divided into shards; data is kept for one day by default, can be reprocessed/replayed, and cannot be deleted once inserted into Kinesis

- Streams are divided into ordered Shards/Partitions

- Data retention is 1 day by default, can go up to 365 days

- Ability to reprocess/replay data

- Multiple applications can consume the same stream

- Real-time processing with scale of throughput

- Once data is inserted in Kinesis, it can't be deleted (immutability)

 

[ Kinesis Streams Shards ]

1 MB/s write and 2 MB/s read per shard

Billing is per shard; the number of shards can be increased (split) or decreased (merge)

- One stream is made of many different shards

- 1 MB/s or 1000 messages/s at write per SHARD

- 2 MB/s at read per SHARD

- Billing is per shard provisioned, can have as many shards as you want

- Batching available or per message calls

- The number of shards can evolve over time (reshard/merge)

- Records are ordered per shard

 

[ AWS Kinesis API - Put records ]

With a partition key, the same key is always sent to the same partition; to increase throughput, use PutRecords with batching

- PutRecord API + Partition key that gets hashed

- The same key goes to the same partition (helps with ordering for a specific key)

- Messages sent get a "sequence number"

- Choose a partition key that is highly distributed (helps prevent "hot partition")

   user_id if many users

   Not country_id if 90% of the users are in one country

- Use Batching with PutRecords to reduce costs and increase throughput

- ProvisionedThroughputExceeded Exception occurs if we go over the limits
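The key-to-shard mapping above can be sketched in a few lines. This is a minimal illustrative model, not the real service behavior: Kinesis actually maps an MD5 hash of the partition key into 128-bit hash-key ranges, and the modulo below approximates that idea.

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Simplified model of Kinesis partition-key routing:
    hash the key, then bucket the hash into one of the shards."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards

# The same key always lands on the same shard, so records for one key
# stay ordered; well-distributed keys spread load across shards.
assert shard_for_key("user-42", 5) == shard_for_key("user-42", 5)
```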

 

[ AWS Kinesis API - Exceptions ]

ProvisionedThroughputExceeded Exceptions

- Happens when sending more data (exceeding MB/s or TPS for any shard)

- Make sure you don't have a hot shard (such as your partition key is bad and too much data goes to that partition)

* Solution : Retries with backoff / Increase shards (scaling)
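The retry-with-backoff solution can be sketched as follows. `put_with_backoff`, `flaky_put`, and the exception class are stand-ins for illustration, not boto3 APIs; in boto3 the throttling surfaces as a client error with the `ProvisionedThroughputExceededException` error code.

```python
import random
import time

class ProvisionedThroughputExceeded(Exception):
    """Stand-in for the error Kinesis returns when a shard limit is exceeded."""

def put_with_backoff(put_fn, record, max_attempts=5):
    """Retry a put with exponential backoff; `put_fn` stands in for PutRecord."""
    for attempt in range(max_attempts):
        try:
            return put_fn(record)
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # 0.1s, 0.2s, 0.4s... with jitter, capped so the demo stays fast
            time.sleep(min(0.1 * 2 ** attempt, 1.0) * random.random())

# Demo: a producer that is throttled twice, then succeeds
calls = {"n": 0}
def flaky_put(record):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ProvisionedThroughputExceeded()
    return {"ok": True}

result = put_with_backoff(flaky_put, {"Data": b"hello"})
```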

 

[ AWS Kinesis API - Consumers ]

- Can use a normal consumer : CLI, SDK, etc...

- Can use Kinesis Client Library (in Java, Node, Python, Ruby, .NET)

  : KCL uses DynamoDB to checkpoint offsets

  : KCL uses DynamoDB to track other workers and share the work amongst shards

 

[ Kinesis Security ]

Control access / authorization using IAM policies

Encryption in flight using HTTPS endpoints

Encryption at rest using KMS

Possibility to encrypt/decrypt data client side(harder)

VPC Endpoints available for Kinesis to access within VPC

 

 

[ 2. Kinesis Data Firehose ]

Serverless and auto scaling; no administration needed

Near real time (not real time)

- Fully Managed Service, no administration, automatic scaling, serverless

- Load data into Redshift/Amazon S3/ElasticSearch/Splunk

- Near Real Time

  60 seconds latency minimum for non-full batches

  Or a minimum of 32 MB of data at a time

- Supports many data formats, conversions, transformations, compression

- Pay for the amount of data going through Firehose
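The "near real time" buffering rule above (deliver on size or time, whichever comes first) can be expressed as a tiny predicate. The 32 MB / 60 s values mirror the minimums quoted above; real Firehose buffer hints are configurable.

```python
def should_flush(buffered_mb: float, seconds_since_flush: float,
                 size_hint_mb: int = 32, interval_s: int = 60) -> bool:
    """Firehose-style buffering: deliver when either the buffer size hint
    or the buffer interval is reached, whichever comes first."""
    return buffered_mb >= size_hint_mb or seconds_since_flush >= interval_s
```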

 

[ Kinesis Data Streams vs Firehose ]

# Streams

- Going to write custom code (producer/consumer)

- Real time (~200 ms)

- Must manage scaling (shard splitting/merging)

- Data storage for 1 to 365 days, replay capability, multiple consumers

# Firehose

- Fully managed, send to S3, Splunk, Redshift, ElasticSearch

- Serverless data transformations with Lambda

- Near real time (lowest buffer time is 1 minute)

- Automated Scaling

- No data storage

 

[ Kinesis Data Analytics ]

- Perform real-time analytics on Kinesis Streams using SQL

- Kinesis Data Analytics :

  Auto Scaling

  Managed : no servers to provision

  Continuous : real time

- Pay for actual consumption rate

- Can create streams out of the real-time queries

 

 

[ Data ordering for Kinesis vs SQS FIFO ]

To consume each object's data in order, use a per-object partition key; a key is always sent to the same shard

to consume the data in order for each object, send using a "partition key" value of the "object_id"

the same key will always go to the same shard

 

[ Ordering data into SQS ]

SQS Standard does not process messages in order. With FIFO and multiple consumers, a Group ID can be used to group related messages (similar to a partition key in Kinesis)

# Standard Case

  - For SQS standard, there is no ordering

  - For SQS FIFO, if you don't use a Group ID, messages are consumed in the order they are sent, with only one consumer

# When to use Group ID

  - You want to scale the number of consumers, but you want messages to be "grouped" when they are related to each other

  - Then you use a Group ID (similar to Partition key in Kinesis)

 

[ # Kinesis vs SQS ordering ]

Let's assume 100 trucks, 5 Kinesis shards, 1 SQS FIFO queue

# Kinesis Data Streams :

  - On average you will have 20 trucks per shard

  - Trucks will have their data ordered within each shard

  - The maximum amount of consumers in parallel we can have is 5

  - Can receive up to 5 MB/s of data

# SQS FIFO :

  - you only have one SQS FIFO queue

  - you will have 100 Group IDs

  - You can have up to 100 consumers (due to the 100 Group IDs)

  - You have up to 300 messages per second (or 3000 if using batching)
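A quick simulation of the Kinesis side of this comparison, using the same simplified hashing idea as before (MD5 modulo, not the real 128-bit hash-key ranges):

```python
import hashlib
from collections import Counter

def shard_for(partition_key: str, num_shards: int = 5) -> int:
    # Simplified stand-in for Kinesis partition-key hashing.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest, "big") % num_shards

# 100 trucks spread over 5 shards: each truck always lands on the same
# shard (per-truck ordering), and the load averages ~20 trucks per shard.
per_shard = Counter(shard_for(f"truck-{i}") for i in range(100))
```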

 

 

 


[ Amazon SNS ]

send one message to many receivers

- The "event producer" only sends the message to one SNS topic

- As many "event receivers(subscriptions)" as we want to listen to the SNS topic notifications

- Each subscriber to the topic will get all the messages (note: new feature to filter messages)

- Up to 10,000,000 subscriptions per topic

- 100,000 topics limit

- Subscribers can be

 1) SQS

 2) HTTP/HTTPS (with configurable delivery retries)

 3) Lambda

 4) Emails

 5) SMS messages

 6) Mobile Notifications

 

1. SNS integrated with a lot of AWS services

- Many AWS services can send data directly to SNS for notifications

- CloudWatch (for alarms)

- Auto Scaling Groups notifications

- Amazon S3 (on bucket events)

- CloudFormation (upon state changes => failed to build, etc)

 

2. How to publish

- Topic publish (using the SDK)

 1) Create a topic

 2) Create a subscription

 3) Publish to the topic

- Direct Publish (for mobile apps SDK)

 1) Create a platform application

 2) Create a platform endpoint

 3) Publish to the platform endpoint

 4) Works with Google GCM, Apple APNS, Amazon ADM

 

3. Security

- Encryption :

  In-flight encryption using HTTPS API

  At-rest encryption using KMS keys

  Client-side encryption if the client wants to perform encryption/decryption itself

- Access Controls : IAM policies to regulate access to the SNS API

- SNS Access Policies (similar to S3 bucket policies)

  Useful for cross-account access to SNS topics

  Useful for allowing other services (S3..) to write to an SNS topic

 

 

[ SNS + SQS : Fan Out ]

After a message is pushed once to SNS, multiple SQS queues subscribed to the topic receive it

- Push once in SNS, receive in all SQS queues that are subscribers

- Fully decoupled, no data loss

- SQS allows for : data persistence, delayed processing and retries of work

- Ability to add more SQS subscribers over time

- Make sure your SQS queue access policy allows for SNS to write

* SNS cannot send messages to SQS FIFO queues (AWS limitation)

 

# S3 Events to multiple queues

- For the same combination of event type (eg: object created) and prefix (eg: images/) you can only have one S3 event rule

- If you want to send the same S3 event to many SQS queues, use fan-out
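The fan-out pattern can be modeled in a few lines of plain Python: one publish, one copy per subscribed queue. This is a toy in-memory model, the queue names are made up, and it ignores delivery and access policies entirely.

```python
from collections import defaultdict

class FanOutTopic:
    """Toy in-memory model of the SNS + SQS fan-out pattern: publish once,
    and every subscribed queue receives its own copy of the message."""
    def __init__(self):
        self.queues = defaultdict(list)   # queue name -> list of messages

    def subscribe(self, queue_name):
        self.queues[queue_name]           # register the queue (touch the key)

    def publish(self, message):
        for q in self.queues.values():    # fan out: one copy per subscriber
            q.append(message)

topic = FanOutTopic()
topic.subscribe("thumbnail-queue")
topic.subscribe("analytics-queue")
topic.publish("images/cat.jpg created")   # e.g. forwarded from an S3 event
```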


- When we start deploying multiple applications, they will inevitably need to communicate with one another

- There are two patterns of application communication

1. Synchronous communications (application to application)

2. Asynchronous / Event based (application to queue to application)

 

Synchronous communication between applications can be problematic if there are sudden spikes of traffic

What if you need to suddenly encode 1000 videos but usually it's 10?

In that case, it's better to decouple your applications

1) using SQS: queue model

2) using SNS: pub/sub model

3) using Kinesis: real-time streaming model

These services can scale independently from our application

 

[ 1. SQS (Simple Queue Service) ]

- Oldest offering (over 10 years old)

- Fully managed service, used to decouple applications

- Attributes :

  1) unlimited throughput, unlimited number of messages in queue

  2) Short-lived : default retention of messages in the queue is 4 days, maximum of 14 days

  3) Low latency (<10ms on publish and receive)

  4) Limitation of 256KB per message sent

- Can have duplicate messages (at-least-once delivery, occasionally)

- Can have out of order messages (best effort ordering)

 

1) SQS : Producing Messages 

- Produced to SQS using the SDK (SendMessage API)

- The message is persisted in SQS until a consumer deletes it

- Message retention: default 4 days, up to 14 days

- unlimited throughput

 

2) SQS : Consuming Messages 

- Consumers (running on EC2 instances, servers, or AWS Lambda)

- Poll SQS for messages (receive up to 10 messages at a time)

- Process the messages (ex: insert the message into an RDS database)

- Delete the messages using the DeleteMessage API

 

3) Multiple EC2 Instances Consumers 

- Consumers receive and process messages in parallel

- At least once delivery

- Best-effort message ordering (does its best, but with no guarantee)

- Consumers delete messages after processing them

- We can scale consumers horizontally to improve throughput of processing

 

4) SQS with Auto Scaling Group (ASG)

If the queue length goes over a certain level, a CloudWatch Alarm fires and increases the capacity of the Auto Scaling Group
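One way to express that scaling rule as code; `msgs_per_instance` (the backlog one instance can absorb) is an assumed tuning value, not an AWS default.

```python
def desired_capacity(queue_length: int, msgs_per_instance: int = 100,
                     min_size: int = 1, max_size: int = 10) -> int:
    """Derive an ASG size from the SQS backlog (ApproximateNumberOfMessages),
    clamped between the group's min and max size."""
    needed = -(-queue_length // msgs_per_instance)  # ceiling division
    return max(min_size, min(max_size, needed))
```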

 

5) SQS to decouple between application tiers

Split a long-running job into two processes and decouple them with an SQS queue

 

6) SQS - Security

- Encryption 

  1) In-flight encryption using HTTPS API

  2) At-rest encryption using KMS keys

  3) Client-side encryption if the client wants to perform encryption/decryption itself

- Access Controls : IAM policies to regulate access to the SQS API

- SQS Access Policies (similar to S3 bucket policies)

  Useful for cross-account access to SQS queues

  Useful for allowing other services (SNS, S3...) to write to an SQS queue

 

7) Message Visibility Timeout **

Once a consumer polls a message, other consumers cannot access it for the duration of the visibility timeout (30 seconds by default)

- After a message is polled by a consumer, it becomes invisible to other consumers

- By default, the message visibility timeout is 30 seconds

- That means the message has 30 seconds to be processed

- After the visibility timeout is over, the message becomes "visible" again in SQS

- If a message is not processed within the visibility timeout, it may be processed twice

- A consumer could call the ChangeMessageVisibility API to get more time

- If visibility timeout is high (hours), and consumer crashes, re-processing will take time

- If visibility timeout is too short, we may get duplicates
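The visibility-timeout lifecycle above can be simulated with a toy queue; `now` is passed explicitly so the timing is easy to follow.

```python
class VisibilityQueue:
    """Toy model of the SQS visibility timeout: a received message is hidden
    from other consumers until the timeout elapses, then it becomes
    visible (and deliverable) again unless it was deleted."""
    def __init__(self, visibility_timeout: float = 30.0):
        self.timeout = visibility_timeout
        self.messages = {}          # body -> time it becomes visible again

    def send(self, body, now=0.0):
        self.messages[body] = now   # visible immediately

    def receive(self, now):
        for body, visible_at in self.messages.items():
            if now >= visible_at:
                self.messages[body] = now + self.timeout  # hide it
                return body
        return None

    def delete(self, body):
        self.messages.pop(body, None)

q = VisibilityQueue(visibility_timeout=30.0)
q.send("order-1")
assert q.receive(now=0.0) == "order-1"    # first consumer gets it
assert q.receive(now=10.0) is None        # hidden during the timeout
assert q.receive(now=31.0) == "order-1"   # not deleted in time -> redelivered
```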

 

8) Dead Letter Queues

- If a consumer fails to process a message within the Visibility Timeout, the message goes back to the queue

- We can set a threshold of how many times a message can go back to the queue

- After the MaximumReceives threshold is exceeded, the message goes into a dead letter queue(DLQ)

* Useful for debugging

* make sure to process the messages in the DLQ before they expire: Good to set a retention of 14 days in the DLQ
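A toy redrive model of the MaximumReceives rule, assuming a threshold of 3 receives:

```python
class QueueWithDLQ:
    """Toy redrive model: a message received more than MaximumReceives
    times without being deleted is moved to the dead letter queue."""
    def __init__(self, max_receives=3):
        self.max_receives = max_receives
        self.main = []   # entries of [body, receive_count]
        self.dlq = []

    def send(self, body):
        self.main.append([body, 0])

    def receive(self):
        if not self.main:
            return None
        entry = self.main[0]
        entry[1] += 1
        if entry[1] > self.max_receives:      # threshold exceeded
            self.main.pop(0)
            self.dlq.append(entry[0])         # poison message -> DLQ
            return self.receive()
        return entry[0]

dlq_demo = QueueWithDLQ(max_receives=3)
dlq_demo.send("poison-message")
for _ in range(3):
    dlq_demo.receive()   # processing keeps failing: message never deleted
```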

 

9) Delay Queue

- Delay a message (consumers don't see it immediately) up to 15 mins

- Default is 0 seconds (message is available right away)

- Can set a default at queue level

- Can override the default on send using the DelaySeconds parameter

 

10) SQS - FIFO Queue

A first-in, first-out queue; unlike standard SQS, which has unlimited throughput, its throughput is limited

- First In First Out (ordering of messages in the queue)

- Limited throughput : 300 msg/s without batching, 3000 msg/s with

- Exactly-once send capability (by removing duplicates)

- Messages are processed in order by the consumer
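A minimal sketch of the two FIFO guarantees above: deduplication by deduplication id and ordering within each message group. The real dedup window is 5 minutes; this toy remembers ids forever.

```python
class FifoQueue:
    """Toy SQS FIFO model: duplicate sends (same dedup id) are dropped,
    and ordering is preserved within each message group."""
    def __init__(self):
        self.seen_dedup_ids = set()
        self.groups = {}   # group id -> bodies, in send order

    def send(self, body, group_id, dedup_id):
        if dedup_id in self.seen_dedup_ids:
            return False   # exactly-once send: duplicate dropped
        self.seen_dedup_ids.add(dedup_id)
        self.groups.setdefault(group_id, []).append(body)
        return True

q = FifoQueue()
q.send("step-1", group_id="order-9", dedup_id="a")
q.send("step-2", group_id="order-9", dedup_id="b")
q.send("step-1", group_id="order-9", dedup_id="a")   # duplicate, ignored
```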

 


[ Hybrid Cloud for Storage ]

A way to use AWS S3 together with on-premise storage; AWS Storage Gateway connects on-premise storage with AWS storage

- AWS is pushing for "hybrid cloud"

  Part of your infrastructure is on the cloud

  Part of your infrastructure is on-premise

 

- This can be due to 

  1) Long cloud migrations

  2) Security requirements

  3) Compliance requirements

  4) IT strategy

 

- S3 is a proprietary storage technology (unlike EFS/NFS), so how do you expose the S3 data on-premise?

 : AWS Storage Gateway

 

[ AWS Storage Gateway ]

Bridge between on-premise data and cloud data in S3

ex) DR, backup & restore, tiered storage

 

1. File Gateway

- Configured S3 buckets are accessible using the NFS and SMB protocol

- Supports S3 standard, S3 IA, S3 One Zone IA

- Bucket access using IAM roles for each File Gateway

- Most recently used data is cached in the file gateway

- can be mounted on many servers

- backed by S3

1-2. File Gateway - Hardware appliance

- Using a file gateway means you need virtualization capability

  Otherwise, you can use a File Gateway Hardware Appliance

- You can buy it on amazon.com

- helpful for daily NFS backups in small data centers

 

2. Volume Gateway

- Block storage using iSCSI protocol backed by S3

- Backed by EBS snapshots which can help restore on-premise volumes

  Cached volumes: low latency access to most recent data

  Stored volumes: entire dataset is on premise, scheduled backups to S3

- backed by S3 with EBS snapshots

 

3. Tape Gateway

- Some companies have backup processes using physical tapes

- with tape gateway, companies use the same processes but in the cloud

- Virtual Tape Library (VTL) backed by Amazon S3 and Glacier

- Back up data using existing tape-based processes (and iSCSI interface)

- Works with leading backup software vendors

- backed by S3 and Glacier

 

 

[ Amazon FSx for Windows ]

An EFS-like service for Windows, filling the gap left by EFS, which is Linux-only

- EFS is a shared POSIX system for Linux systems

- FSx for Windows is a fully managed Windows file system share drive

- Supports SMB protocol & Windows NTFS

- Microsoft Active Directory integration, ACLs, user quotas

- Built on SSD, scale up to 10s of GB/s, millions of IOPS, 100s PB of data

- Can be accessed from your on-premise infrastructure

- Can be configured to be Multi-AZ

- Data is backed-up daily to S3

 

[ Amazon FSx for Lustre ]

A clustered, distributed Linux file system for high-performance workloads such as machine learning

- The name Lustre is derived from "Linux" and "cluster"

- Lustre is a type of parallel distributed file system, for large-scale computing

- Machine Learning, High Performance Computing (HPC)

- Video Processing, Financial Modeling, Electronic Design Automation

- Scales up to 100s GB/s, millions of IOPS, sub-ms latencies

- Seamless integration with S3

   Can "read S3" as a file system (through FSx)

   Can write the output of the computations back to S3 (through FSx)

- Can be used from on-premise servers

 

 

# Storage Comparison

- S3 : Object Storage

- Glacier : Object Archival

- EFS : Network File System for many Linux instances, POSIX filesystem

- EBS Volumes : Network storage for one EC2 instance at a time

- FSx for Windows : Network File System for Windows servers

- FSx for Lustre : High Performance Computing Linux file system

- Instance Storage : Physical storage for your EC2 instance (high IOPS)

- Storage Gateway : File Gateway, Volume Gateway (cache & stored), Tape Gateway

- Snowball / Snowmobile : to move large amount of data to the cloud, physically

- Database : for specific workloads, usually with indexing and querying

 


[ Snowball ]

- Physical data transport solution that helps moving TBs or PBs of data in or out of AWS

- Alternative to moving data over the network (and paying network fees)

- Secure, tamper resistant, uses KMS 256 bit encryption

- Tracking using SNS and text messages, E-ink shipping label

- Pay per data transfer job

ex) large data cloud migrations, DC decommission, DR

     If it takes more than a week to transfer over the network, use Snowball devices
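The "more than a week" rule of thumb can be checked with a little arithmetic, assuming a fully dedicated link and decimal units (1 TB = 10^12 bytes):

```python
def transfer_days(data_tb: float, bandwidth_mbps: float) -> float:
    """Days needed to push `data_tb` terabytes over a `bandwidth_mbps` link,
    assuming the link is fully dedicated to the transfer."""
    bits = data_tb * 8 * 10**12              # TB -> bits
    seconds = bits / (bandwidth_mbps * 10**6)
    return seconds / 86400

# ~9.3 days for 100 TB over a dedicated 1 Gbps link -> ship a Snowball
```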

 

[ Snowball : process ]

1. Request snowball devices from the AWS console for delivery

2. Install the snowball client on your servers

3. Connect the snowball to your servers and copy files using the client

4. Ship back the device when you're done (goes to the right AWS facility)

5. Data will be loaded into an S3 bucket

6. Snowball is completely wiped

7. Tracking is done using SNS, text messages and the AWS console

 

[ Snowball Edge ]

- Snowball Edges add computational capability to the device

- 100TB capacity with either :

  1) Storage optimized - 24vCPU

  2) Compute optimized - 52 vCPU & optional GPU

- Supports a custom EC2 AMI so you can perform processing on the go

- Supports custom Lambda functions

- Very useful to pre-process the data while moving

ex) data migration, image collation, IoT capture machine learning

 

[ Snowmobile ]

- Transfer exabytes of data (1 EB = 1000 PB = 1000000 TBs)

- Each Snowmobile has 100 PB of capacity (use multiple in parallel)

- Better than Snowball if you transfer more than 10 PB

 

[ Snowball into Glacier ]

Snowball data cannot be moved directly into Glacier; upload it to S3 first and let a lifecycle policy transition it to Glacier

- Snowball cannot import to Glacier directly

- You have to use Amazon S3 first, and an S3 lifecycle policy

 

 


[ CloudFront Signed URL / Signed Cookies ]

- You want to distribute paid shared content to premium users over the world

- We can use CloudFront Signed URL/Cookie. We attach a policy with :

   1) includes URL expiration

   2) includes IP ranges to access the data from

   3) trusted signers (which AWS accounts can create signed URLs)

- How long should the URL be valid for?

  -- Shared content (movie, music) : make it short (a few minutes)

  -- Private content (private to the user) : you can make it last for years

 

* Signed URL = access to individual files (one signed URL per file)

* Signed Cookies = access to multiple files (one signed cookie for many files)

 

CloudFront Signed URL Diagram

1. The client authenticates with the application

2. The app generates a Signed URL using the AWS SDK and returns it to the client

3. The client uses the Signed URL to access the S3 object through CloudFront

 

[ CloudFront Signed URL vs S3 Pre-Signed URL ]

A CloudFront Signed URL reaches S3 through a CloudFront edge

An S3 Pre-Signed URL accesses S3 directly (using IAM)

1. CloudFront Signed URL

- Allow access to a path, no matter the origin

- Account wide key-pair, only the root can manage it

- Can filter by IP, path, date, expiration

- Can leverage caching features

 

2. S3 Pre-Signed URL

- Issue a request as the person who pre-signed the URL

- Uses the IAM key of the signing IAM principal

- Limited lifetime

 

 

 

[ AWS Global Accelerator ]

[ Global users for our application ]

Clients of a global service who connect over the public internet traverse many hops before reaching the app, which adds latency

- You have deployed an application and have global users who want to access it directly

- They go over the public internet, which can add a lot of latency due to many hops

- We wish to go as fast as possible through AWS network to minimize latency

 

# Unicast IP vs AnyCast IP

With Anycast IP, all servers share the same IP address and each client is routed to the nearest one

Unicast IP : one server holds one IP address

Anycast IP : all servers hold the same IP address and the client is routed to the nearest one

 

[ AWS Global Accelerator ]

Clients reach the app through an edge location and the AWS internal network instead of the public internet

- Leverage the AWS internal network to route to your application

- 2 Anycast IPs are created for your application

- The Anycast IPs send traffic directly to Edge Locations

- The Edge locations send the traffic to your application

- Works with Elastic IP, EC2 instances, ALB, NLB, public or private

- Consistent Performance

  1) Intelligent routing to lowest latency and fast regional failover

  2) No issue with client cache (because the IP doesn't change)

  3) Internal AWS network

- Health Checks

  1) Global Accelerator performs a health check of your applications

  2) Helps make your application global (failover in less than 1 minute for unhealthy endpoints)

  3) Great for DR

- Security

  1) only 2 external IPs need to be whitelisted

  2) DDoS protection thanks to AWS Shield

 

[ AWS Global Accelerator vs CloudFront ]

Both :

1) use the AWS global network and its edge locations around the world

2) integrate with AWS Shield for DDoS protection

Differences : 

CloudFront

- Improves performance for both cacheable content (ex: images and videos)

- Dynamic content (ex: API acceleration and dynamic site delivery)

- Content is served at the edge

Global Accelerator

- Improves performance for a wide range of applications over TCP or UDP

- Proxying packets at the edge to applications running in one or more AWS Regions

- Good fit for non-HTTP use cases, such as gaming (UDP), IoT (MQTT), or Voice over IP

- Good for HTTP use cases that require static IP addresses

- Good for HTTP use cases that require deterministic, fast regional failover

 

# Hands-On : Global Accelerator

1. Create multiple instances to use as endpoints

2. Create the Global Accelerator

1) Specify endpoint groups - choose the regions

2) For each region, attach the instances created in step 1

 

 

 

 


[ AWS CloudFront ]

When a user in Korea requests content from an S3 bucket in Australia, the content is served from data cached at a nearby edge (eg. Tokyo)

- Content Delivery Network (CDN)

- Improves read performance, content is cached at the edge

- 216 Point of Presence globally (edge locations)

- DDoS protection, integration with AWS Shield and AWS WAF (Web Application Firewall)

- can expose external HTTPS and can talk to internal HTTPS backends

 

[ CloudFront - Origins ]

Security is improved by configuring the S3 bucket / custom origin to allow access only from CloudFront (OAI)

1. S3 bucket 

- For distributing files and caching them at the edge

- Enhanced security with CloudFront Origin Access Identity (OAI)

- CloudFront can be used as an ingress (to upload files to S3)

2. Custom Origin (HTTP)

- Application Load Balancer

- EC2 instance

- S3 website (must first enable the bucket as a static S3 website)

- Any HTTP backend you want

 

# CloudFront at a high level

 

# CloudFront - S3 as an Origin

 

# CloudFront - ALB or EC2 as an origin

 

[ CloudFront Geo Restriction ]

- You can restrict who can access your distribution

- can use Whitelist/Blacklist

- The country is determined using a 3rd party Geo-IP database

  ex. Copyright Laws to control access to content

 

[ CloudFront vs S3 Cross Region Replication ]

1) CloudFront :

- Global Edge network

- Files are cached for a TTL (maybe a day)

- Great for static content that must be available everywhere

2) S3 Cross Region Replication :

- Must be setup for each region you want replication to happen

- Files are updated in near real-time

- Read only

- Great for dynamic content that needs to be available at low-latency in few regions

 

 

 


[ S3 Performance ]

Scales automatically; requests are limited per prefix, so performance can be improved by spreading objects across more prefixes

- Amazon S3 automatically scales to high request rates, latency 100-200ms

- Your application can achieve at least 3500 PUT/COPY/POST/DELETE and 5500 GET/HEAD requests per second per prefix in a bucket

- There are no limits to the number of prefixes in a bucket

- prefix Example (object path -> prefix) :

   1) bucket/folder1/sub1/file -> prefix : /folder1/sub1/

   2) bucket/folder1/sub2/file -> prefix : /folder1/sub2/

   3) bucket/1/file -> prefix : /1/

   4) bucket/2/file -> prefix : /2/

 

* If you spread reads across all four prefixes evenly, you can achieve 22000 requests per second for GET and HEAD
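The per-prefix limits apply to the key's prefix, i.e. everything up to and including the last slash (the bucket name is not part of the key). A small helper makes the examples above concrete:

```python
def s3_prefix(object_key: str) -> str:
    """Return the prefix of an S3 object key: everything in the key up to
    (and including) the last '/'."""
    head, sep, _filename = object_key.rpartition("/")
    return "/" + head + "/" if sep else "/"

assert s3_prefix("folder1/sub1/file") == "/folder1/sub1/"
assert s3_prefix("1/file") == "/1/"
```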

 

[ S3 KMS Limitation ]

With SSE-KMS encryption, the KMS encrypt/decrypt calls can become a performance bottleneck

- If you use SSE-KMS, you may be impacted by the KMS limits

- When you upload, it calls the GenerateDataKey KMS API

- When you download, it calls the Decrypt KMS API

- Count towards the KMS quota per second (5500, 10000, 30000 req/s based on region)

- As of today, you cannot request a quota increase for KMS

 

[ S3 Performance ]

1 UPLOAD

1) Multi-part upload

- recommended for files > 100MB

- must use for files > 5GB

- Can help parallelize uploads (speed up transfers)

 

2) S3 Transfer Acceleration (upload only)

- Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region

- Compatible with multi-part upload

2 DOWNLOAD :

1) S3 Byte-Range Fetches

- Parallelize GETs by requesting specific byte ranges

- Better resilience in case of failures

- Can be used to speed up downloads

- Can be used to retrieve only partial data (for example the head of a file)
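The byte ranges for a parallel download can be computed with a short helper that emits standard HTTP `Range` header values:

```python
def byte_ranges(object_size: int, part_size: int):
    """HTTP Range header values for parallel S3 byte-range fetches."""
    return [
        f"bytes={start}-{min(start + part_size, object_size) - 1}"
        for start in range(0, object_size, part_size)
    ]

# e.g. a 100-byte object fetched in 4 parallel 25-byte GETs
assert byte_ranges(100, 25) == [
    "bytes=0-24", "bytes=25-49", "bytes=50-74", "bytes=75-99",
]
```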

 

[ S3 Select & Glacier Select ]

Server-side filtering in S3 gives better performance

- Retrieve less data using SQL by performing server side filtering

- Can filter by rows & columns (simple SQL statements)

- Less network transfer, less CPU cost client-side

 

[ S3 Event Notifications ]

When an S3 event occurs, a notification can be sent to SNS, SQS, a Lambda function, etc.

Enable bucket versioning to make sure every event is delivered

- ObjectCreated, ObjectRemoved, ObjectRestore, Replication...

- Object name filtering possible (ex: *.jpg)

  ex: generate thumbnails of images uploaded to S3

- Can create as many "S3 events" as desired

- can email/notification, add message into a queue, call Lambda Functions to generate some custom code

 

- S3 event notifications typically deliver events in seconds but can sometimes take a minute or longer

- If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent

- If you want to ensure that an event notification is sent for every successful write, you should enable versioning on your bucket

 

[ AWS Athena ]

Files can be queried/analyzed with SQL directly in the S3 bucket

Serverless service to perform analytics directly against S3 files

- Uses SQL language to query the files

- Has a JDBC/ODBC driver

- Charged per query and amount of data scanned

- Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)

  Use cases: Business intelligence/analytics/reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails...

* to Analyze data directly on S3, use Athena

 

[ S3 Object Lock & Glacier Vault Lock ]

1) S3 Object Lock : lock for a specified amount of time

Adopt a WORM (Write Once Read Many) model

Block an object version deletion for a specified amount of time

2) Glacier Vault Lock : once set, files can never be modified or deleted

Adopt a WORM model

Lock the policy for future edits (can no longer be changed)

Helpful for compliance and data retention

 

 

 

 


[ S3 Storage Classes ]

1. Amazon S3 Standard (General Purpose)

 - High durability of objects across multiple AZs

 - Sustain 2 concurrent facility failures

 - eg. Big Data analytics, mobile&gaming applications, content distribution

 

2. Amazon S3 Intelligent Tiering

Low latency and high throughput; a small monitoring fee is charged

 - High durability of objects across multiple AZs

 - Same low latency and high throughput performance of S3 Standard

 - Small monthly monitoring and auto-tiering fee

 - Automatically moves objects between two access tiers based on changing access patterns

 - Resilient against events that impact an entire AZ

 

3. Amazon S3 Standard-IA (Infrequent Access)

Suited to infrequently accessed data; high performance, lower cost

 - High durability of objects across multiple AZs

 - Suitable for data that is less frequently accessed, but requires rapid access when needed

 - Low cost compared to Amazon S3 Standard

 - Sustain 2 concurrent facility failures

 - eg. As a data store for DR, backups

 

4. Amazon S3 One Zone-IA (Infrequent Access)

Cheaper than Standard-IA, but stored in a single AZ, so not suitable for DR

 - Same as IA but data is stored in a single AZ

 - data lost when AZ is destroyed

 - Low latency and high throughput performance

 - Supports SSL for data in transit and encryption at rest

 - Low cost compared to IA (20% lower)

 - eg. Storing secondary backup copies of on-premise data, or storing data you can recreate (thumbnails)

 

5. Amazon Glacier

Data cannot be accessed immediately after storage; cheap and well suited to long-term retention; an alternative to magnetic tape

 - Low cost object storage meant for archiving/backup

 - Data is retained for the longer term

 - Alternative to on-premise magnetic tape storage

 - Cost per storage per month + retrieval cost

 - Each item in Glacier is called "Archive" (not object)

 - Archives are stored in "Vaults" (not bucket)

 - 3 retrieval options :

   1) Expedited (1 to 5 minutes)

   2) Standard (3 to 5 hours)

   3) Bulk (5 to 12 hours)

   * Minimum storage duration of 90 days

 

6. Amazon Glacier Deep Archive

Cheaper than Amazon Glacier; retrieval takes even longer after storage

 - Amazon Glacier Deep Archive - for long term storage - cheaper :

  1) Standard (12hours)

  2) Bulk (48hours)

  * Minimum storage duration of 180 days

 

7. Amazon S3 Reduced Redundancy Storage (deprecated/omitted)

 

 

# Moving between storage classes

- You can transition objects between storage classes

- For infrequently accessed object, move them to STANDARD_IA

- For archive objects you don't need in real-time, GLACIER or DEEP_ARCHIVE

- Moving objects can be automated using a lifecycle configuration

 

[ S3 Lifecycle Rules ]

- Transition actions : It defines when objects are transitioned to another storage class

  1) Move objects to Standard IA class 60 days after creation

  2) Move to Glacier for archiving after 6 months

- Expiration actions: configure objects to expire (delete) after some time

  1) Access log files can be set to delete after 365 days

  2) Can be used to delete old versions of files (if versioning is enabled)

  3) Can be used to delete incomplete multi-part uploads

- Rules can be created for a certain prefix (ex: s3://mybucket/mp3/*)

- Rules can be created for certain object tags (ex: Department: Finance)
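Such lifecycle rules can be declared as a PutBucketLifecycleConfiguration payload. The bucket name, rule ID, prefix, and day counts below are hypothetical examples built from the rules above, not values from the source.

```python
# Shape follows the S3 PutBucketLifecycleConfiguration API; the values
# (rule ID, prefix "mp3/", 60/180/365 days) are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-mp3",
            "Filter": {"Prefix": "mp3/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},   # transition action
                {"Days": 180, "StorageClass": "GLACIER"},      # archive later
            ],
            "Expiration": {"Days": 365},                       # expiration action
        }
    ]
}

# With boto3 this would be applied as (hypothetical bucket name):
# s3.put_bucket_lifecycle_configuration(
#     Bucket="mybucket", LifecycleConfiguration=lifecycle_configuration)
```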

 

 
