[ Databases in AWS : Redshift ]

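Based on PostgreSQL, but does not support OLTP (online transaction processing)

Stores data by column rather than by row

Uses MPP (massively parallel processing) for queries, which gives it much higher performance than other databases

Integrates with BI (Business Intelligence) tools such as AWS QuickSight and Tableau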

- Redshift is based on PostgreSQL, but it is not used for OLTP (Online Transaction Processing)

- It is an OLAP (Online Analytical Processing) service, built for analytics and data warehousing

- 10x better performance than other data warehouses, scales to PBs of data

- Columnar storage of data (instead of row based)

- Massively Parallel Query Execution (MPP), which is the reason for its high performance

- Pay as you go based on the instances provisioned

- Has a SQL interface for performing the queries

- BI (Business Intelligence) tools such as AWS QuickSight or Tableau integrate with it

- Data is loaded from S3, DynamoDB, DMS, other DBs

- From 1 node to 128 nodes, up to 128 TB of space per node

   -- Leader node : for query planning, results aggregation

   -- Compute node : for performing the queries, send results to leader

- Redshift Spectrum : perform queries directly against S3 (no need to load)

- Backup & Restore, Security VPC / IAM / KMS, Monitoring

- Redshift Enhanced VPC Routing : COPY / UNLOAD goes through VPC
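
The columnar-storage and leader/compute-node points above can be pictured with a toy model. This is a minimal sketch in plain Python, not how Redshift is actually implemented: the sample rows and the helper names `to_columnar` and `mpp_sum` are hypothetical and exist only for illustration.

```python
# Toy model only: the rows, column names, and helper names are hypothetical.
rows = [
    {"order_id": 1, "region": "KR", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "KR", "amount": 200},
    {"order_id": 4, "region": "JP", "amount": 50},
]


def to_columnar(rows):
    """Pivot row dicts into one list per column (column-based layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def mpp_sum(values, compute_nodes=2):
    """Split a column across compute nodes, then aggregate like a leader node."""
    # Each "compute node" sums its own slice of the column.
    slices = [values[i::compute_nodes] for i in range(compute_nodes)]
    partial_sums = [sum(node_slice) for node_slice in slices]
    # The "leader node" only combines the partial results.
    return sum(partial_sums), partial_sums


columns = to_columnar(rows)
# An analytic query such as SUM(amount) reads one column, not whole rows.
total, partials = mpp_sum(columns["amount"], compute_nodes=2)
print(total, partials)
```

The point of the layout is that an aggregate over `amount` never touches `order_id` or `region`, and the work on that one column can be divided among nodes before the leader merges the partial results.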

 

[ Redshift - Snapshots & DR ]

- Redshift has no "Multi-AZ" mode

- Snapshots are point-in-time backups of a cluster, stored internally in S3

- Snapshots are incremental (only what has changed is saved)

- You can restore a snapshot into a new cluster

  -- Automated : every 8 hours, every 5 GB, or on a schedule; you set the retention period

  -- Manual : snapshot is retained until you delete it

- You can configure Amazon Redshift to automatically copy snapshots (automated or manual) of a cluster to another AWS Region

DR (Disaster Recovery) plan : enable automated snapshots, then configure the Redshift cluster to automatically copy those snapshots to another AWS Region
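
As a rough sketch of the cross-region copy setup, the snippet below builds the request for the Redshift `EnableSnapshotCopy` API and passes it to boto3's `enable_snapshot_copy`. The cluster name, the Regions, and the 7-day retention are assumptions chosen for illustration, and the second function only works with boto3 installed and AWS credentials configured.

```python
def snapshot_copy_params(cluster_id, destination_region, retention_days=7):
    """Build the keyword arguments for Redshift's EnableSnapshotCopy API."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": destination_region,
        "RetentionPeriod": retention_days,
    }


def enable_cross_region_copy(cluster_id, source_region, destination_region):
    """Turn on automatic snapshot copy; needs boto3 and AWS credentials."""
    import boto3  # third-party SDK, imported lazily so the helper above runs without it

    client = boto3.client("redshift", region_name=source_region)
    return client.enable_snapshot_copy(
        **snapshot_copy_params(cluster_id, destination_region)
    )


# Hypothetical cluster name and Region, used only to show the request shape.
print(snapshot_copy_params("demo-cluster", "us-west-2"))
```

Keeping the parameter building separate from the API call makes the request easy to inspect without touching a real cluster.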

 

[ Loading data into Redshift ]

[ Redshift Spectrum ]

A feature that lets you run queries against data in S3 without loading it into Redshift tables

A Redshift cluster must be up and running to use it

- Query data that is already in S3 without loading it

- Must have a Redshift cluster available to start the query

- The query is then submitted to thousands of Redshift Spectrum nodes

 

[ Redshift for Solutions Architect ]

Operations : like RDS

Security : IAM, VPC, KMS, SSL (like RDS)

Reliability : auto healing features, cross-region snapshot copy

Performance : 10x performance vs other data warehousing, compression

Cost : pay per node provisioned, 1/10th of the cost vs other warehouses

vs Athena : faster queries / joins / aggregations thanks to indexes

※ Redshift = Analytics / BI / Data Warehouse

 

 


[ Athena Overview ]

Not a database itself, but it provides a query engine on top of S3

- Fully Serverless database with SQL capabilities

- Used to query data in S3

- Pay per query

- Output results back to S3

- Secured through IAM

※ Use Case : one time SQL queries, serverless queries on S3, log analytics

 

[ Athena for Solutions Architect ]

Operations : no operations needed, serverless

Security : IAM + S3 security

Reliability : managed service, uses Presto engine, highly available

Performance : queries scale based on data size

Cost : pay per query / per TB of data scanned, serverless
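
Because Athena is billed by the amount of data scanned, the cost of a query can be estimated from its scan size. The sketch below is only an illustration: the `usd_per_tb` default of 5.0 is an assumed rate, not a quoted price, and counting a TB as 1024**4 bytes is also an assumption.

```python
TB = 1024 ** 4  # bytes per TB as assumed here for billing purposes


def athena_query_cost(bytes_scanned, usd_per_tb=5.0):
    """Estimate query cost; usd_per_tb is an assumed rate, not a quoted price."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned cannot be negative")
    return bytes_scanned / TB * usd_per_tb


# Scanning a full 1 TB dataset versus a query that only reads a quarter of it.
full_scan = athena_query_cost(1 * TB)
pruned_scan = athena_query_cost(TB // 4)
print(full_scan, pruned_scan)
```

Under this model, anything that reduces the bytes a query has to read lowers its cost proportionally.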

 

 

 


[ Databases in AWS : S3 ]

- S3 is a key / value store for objects

- Great for big objects, not so great for small objects

- Serverless, scales infinitely, max object size is 5 TB

- Strong consistency

- Tiers : S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups

- Features : Versioning, Encryption, Cross Region Replication, etc...

- Security : IAM, Bucket Policies, ACL (Access Control List)

- Encryption : SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit

※ Use Case : static files, key value store for big files, website hosting

 

[ S3 for Solutions Architect ]

Operations : no operations needed

Security : IAM , Bucket Policies, ACL, Encryption (Server/Client), SSL

Reliability : 99.999999999% durability / 99.99% availability, Multi AZ, CRR (Cross Region Replication)

Performance : scales to thousands of read/writes per second, transfer acceleration/multi-part for big files

Cost : pay per storage usage, network cost, number of requests


[ DynamoDB ]

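AWS proprietary NoSQL database that stores data as key/value pairs

Multi AZ, reads and writes are decoupled, DAX is used as a read cache

Security through IAM

Integrates with AWS Lambda through DynamoDB Streams (the stream captures data changes and invokes Lambda)

Backup and restore supported, Global Tables available

Monitoring through CloudWatch

No SQL queries. Lookups are possible only by key and index

Transactions supported (since Nov 2018)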

- AWS proprietary technology, managed NoSQL database

- Serverless, provisioned capacity, auto scaling, on demand capacity (Nov 2018)

- Can replace ElastiCache as a key/value store (storing session data for example)

- Highly Available, Multi AZ by default, Read and Writes are decoupled, DAX for read cache

- Reads can be eventually consistent or strongly consistent

- Security, authentication and authorization is done through IAM

- DynamoDB Streams to integrate with AWS Lambda

- Backup / Restore feature, Global Table feature

- Monitoring through CloudWatch

- Can only query on primary key, sort key, or indexes

※ Use Case : Serverless application development (small documents, 100s of KB), distributed serverless cache. It has no SQL query language available and has supported transactions since Nov 2018
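
To make the "query only on primary key, sort key, or indexes" restriction concrete, here is a minimal in-memory sketch of that access pattern. `ToyTable` and the key names are hypothetical and do not mirror the real DynamoDB API; the example stores session data, one of the use cases mentioned above.

```python
class ToyTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self):
        self._partitions = {}  # partition key -> {sort key: item}

    def put_item(self, pk, sk, item):
        self._partitions.setdefault(pk, {})[sk] = item

    def get_item(self, pk, sk):
        """Exact lookup by full primary key (partition key + sort key)."""
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        """Items of one partition, ordered by sort key, optionally by prefix."""
        items = self._partitions.get(pk, {})
        return [items[sk] for sk in sorted(items) if sk.startswith(sk_prefix)]


table = ToyTable()
table.put_item("user#42", "session#2021-09-25", {"cart": ["book"]})
table.put_item("user#42", "session#2021-09-26", {"cart": []})
table.put_item("user#42", "profile", {"name": "Kim"})

print(table.get_item("user#42", "profile"))
print(table.query("user#42", sk_prefix="session#"))
```

Every read starts from a known partition key; there is no way in this model to ask "find all items where cart is empty" without an extra index, which is the trade-off the notes describe.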

 

[ DynamoDB for Solutions Architect ]

Operations : no operations needed, auto scaling capability, serverless

Security : full security through IAM policies, KMS encryption, SSL in flight

Reliability : Multi AZ, Backups

Performance : single digit millisecond performance, DAX for caching reads, performance doesn't degrade if your application scales

Cost : Pay per provisioned capacity and storage usage (no need to guess in advance any capacity - can use auto scaling)

 


[ Databases in AWS : ElastiCache ]

1. Managed Redis/Memcached (similar offering as RDS, but for caches)

2. In-memory data store, sub-millisecond latency

3. Must provision an EC2 instance type

4. Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)

5. Security through IAM, Security Groups, KMS, Redis Auth

6. Backup / Snapshot / Point in time restore feature

7. Managed and Scheduled maintenance

8. Monitoring through CloudWatch

※ Use Case : Key/Value store, frequent reads with fewer writes, caching results of DB queries, storing session data for websites. Cannot use SQL.
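
The "cache results for DB queries" use case is commonly implemented with a cache-aside pattern. The following is a minimal sketch under stated assumptions: a plain dict stands in for Redis/Memcached, and `CacheAside` and `slow_query` are hypothetical names, not part of any ElastiCache client.

```python
class CacheAside:
    """Cache-aside helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._cache = {}  # a plain dict standing in for Redis/Memcached
        self._load_from_db = load_from_db
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no database work
        self.db_reads += 1  # cache miss: query the database once
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying row changes."""
        self._cache.pop(key, None)


def slow_query(user_id):
    # Hypothetical stand-in for an expensive SQL query.
    return {"user_id": user_id, "name": f"user-{user_id}"}


cache = CacheAside(slow_query)
for _ in range(5):
    cache.get(7)
print(cache.db_reads)
```

Five reads of the same key reach the database only once, which is why this fits read-heavy workloads; writes need an explicit `invalidate` (or a TTL, not modeled here) to avoid stale data.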

 

[ ElastiCache for Solutions Architect ]

Operations : same as RDS

Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL

Reliability : Clustering, Multi AZ

Performance : Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option

Cost : Pay per hour based on EC2 and storage usage

 


[ Aurora ]

Supports OLTP (transaction processing)

PostgreSQL/MySQL compatible

Auto scaling

1. Compatible API for PostgreSQL/MySQL (OLTP)

2. Data is held in 6 replicas, across 3 AZ

3. Auto healing capability

4. Multi AZ, Auto Scaling Read Replicas

5. Read Replicas can be Global

6. Aurora database can be Global for DR or latency purposes

7. Auto scaling of storage from 10GB to 128TB

8. Define EC2 instance type for Aurora instances

9. Same security / monitoring / maintenance features as RDS

10. Aurora Serverless - for unpredictable / intermittent workloads

11. Aurora Multi-Master - for continuous writes failover

※ Use case : same as RDS, but with less maintenance/more flexibility/more performance

 

[ Aurora for Solutions Architect ]

Operations : less operations, auto scaling storage

Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL

Reliability : Multi AZ, highly available, possibly more than RDS, Aurora Serverless option, Aurora Multi-Master option

Performance : 5x performance (according to AWS) due to architectural optimizations. Up to 15 Read Replicas (only 5 for RDS)

Cost : Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise grade databases such as Oracle

 


[ Databases in AWS : RDS(Relational Database Service) ]

1. Managed PostgreSQL / MySQL / Oracle / SQL Server

2. Must provision an EC2 instance & EBS Volume type and size

3. Support for Read Replicas and Multi AZ

4. Security through IAM, Security Groups, KMS, SSL in transit

5. Backup / Snapshot / Point in time restore feature

6. Managed and Scheduled maintenance

7. Monitoring through CloudWatch

8. Use case : Store relational datasets (RDBMS/OLTP), perform SQL queries, transactional I/U/D

※ OLTP : Online Transaction Processing

 

[ RDS for Solutions Architect ]

1. Operations : small downtime during failover and maintenance; scaling read replicas or the EC2 instance, or restoring EBS, requires manual intervention and application changes

2. Security : AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL

3. Reliability : Multi AZ feature, failover in case of failures

4. Performance : depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Storage auto-scaling & manual scaling of instances

5. Cost : Pay per hour based on provisioned EC2 and EBS

 

 


[ Choosing the right database ]

- We have a lot of managed databases on AWS to choose from

- Questions to choose the right database based on your architecture :

1) Read-heavy, write-heavy, or balanced workload? Throughput needs? Will it change, does it need to scale or fluctuate during the day?

2) How much data to store and for how long? Will it grow? Average object size? How are they accessed?

3) Data durability? Source of truth for the data?

4) Latency requirements? Concurrent users?

5) Data model? How will you query the data? Joins? Structured? Semi-structured?

6) Strong schema? More flexibility? Reporting? Search? RDBMS/NoSQL?

7) License costs? Switch to Cloud Native DB such as Aurora?

 

[ 1. Database Types ]

1. RDBMS(=SQL/OLTP) : RDS, Aurora - great for joins

2. NoSQL database : DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune(graphs) - no joins, no SQL

3. Object Store : S3 (for big objects) / Glacier (for backups / archives)

4. Data Warehouse (=SQL Analytics/BI) : Redshift (OLAP), Athena

5. Search : ElasticSearch(JSON) - free text, unstructured searches

6. Graphs : Neptune - displays relationships between data
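
The list above can also be kept as a small lookup table when comparing options. This is just a sketch restating the same categories; the short keys such as `"rdbms"` and the helper `categories_for` are hypothetical labels introduced for illustration.

```python
# Lookup table mirroring the list above; the short keys are hypothetical labels.
DATABASE_TYPES = {
    "rdbms": ["RDS", "Aurora"],
    "nosql": ["DynamoDB", "ElastiCache", "Neptune"],
    "object_store": ["S3", "Glacier"],
    "data_warehouse": ["Redshift", "Athena"],
    "search": ["ElasticSearch"],
    "graph": ["Neptune"],
}


def categories_for(service):
    """Return every category a service is listed under, in sorted order."""
    return sorted(
        category
        for category, services in DATABASE_TYPES.items()
        if service in services
    )


print(categories_for("Neptune"))
print(DATABASE_TYPES["data_warehouse"])
```

Neptune appears under two categories, which reflects the notes: it is both a NoSQL database and the graph option.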

 

 


[ Big Data Ingestion Pipeline : Todo List ] 

1. We want the ingestion pipeline to be fully serverless

2. We want to collect data in real time

3. We want to transform the data

4. We want to query the transformed data using SQL

5. The reports created using the queries should be in S3

6. We want to load that data into a warehouse and create dashboards

 

[ Big Data Ingestion Pipeline ]

- IoT Core allows you to harvest data from IoT devices

- Kinesis is great for real-time data collection

- Firehose helps with data delivery to S3 in near real-time (1 min)

- Lambda can help Firehose with data transformations

- Amazon S3 can trigger notifications to SQS

- Lambda can subscribe to SQS (we could have connected S3 to Lambda directly)

- Athena is a serverless SQL service and results are stored in S3

- The reporting bucket contains analyzed data and can be used by reporting tools such as AWS QuickSight, Redshift, etc...
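
The "Lambda can help Firehose with data transformations" step can be sketched as a transformation handler. Firehose passes base64-encoded records and expects each one back with its `recordId`, a `result` status, and re-encoded `data`. The payload fields (`temperature_c`, `temperature_f`) and the conversion itself are hypothetical, chosen only to show the shape of such a function.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and report a per-record result."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: add a Fahrenheit reading.
            payload["temperature_f"] = payload["temperature_c"] * 9 / 5 + 32
            line = json.dumps(payload) + "\n"  # newline-delimited JSON for S3
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, KeyError):
            # Malformed input is handed back unchanged and flagged as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


# Local dry run with one made-up IoT reading.
sample = base64.b64encode(json.dumps({"temperature_c": 20}).encode()).decode()
print(lambda_handler({"records": [{"recordId": "1", "data": sample}]}, None))
```

Returning a status per record lets the good records continue to S3 while the bad ones are reported as failed instead of failing the whole batch.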


- We have an application running on EC2, that distributes software updates once in a while

- When a new software update is out, we get a lot of requests and the content is distributed en masse over the network. It's very costly

- We don't want to change our application, but want to optimize our cost and CPU, how can we do it?

 

[ Our application current state ]

ELB + ASG , running on multi AZ

 

[ Easy way to fix things : Using Amazon CloudFront ]

Why CloudFront?

- No changes to architecture

- Will cache software update files at the edge

- Software update files are not dynamic, they're static (never changing)

- Our EC2 instances aren't serverless

- But CloudFront is, and will scale for us

- Our ASG will not scale as much, and we'll save tremendously in EC2

- We'll also save in availability, network bandwidth cost, etc

- Easy way to make an existing application more scalable and cheaper
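
The effect of caching static update files at the edge can be illustrated with a small simulation. This is a toy model, not CloudFront's behavior in detail: the three edge locations, the file path, and the request count are made up, and cache expiry (TTL) is not modeled.

```python
class Origin:
    """Stands in for the EC2 fleet behind the load balancer."""

    def __init__(self):
        self.requests_served = 0

    def fetch(self, path):
        self.requests_served += 1
        return f"contents of {path}"


class EdgeLocation:
    """Serves a file from its local cache, fetching from the origin once."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}

    def get(self, path):
        if path not in self._cache:
            self._cache[path] = self._origin.fetch(path)
        return self._cache[path]


origin = Origin()
edges = [EdgeLocation(origin) for _ in range(3)]

# 3000 client downloads of the same static file, spread over the edges.
for i in range(3000):
    edges[i % 3].get("/updates/v2.bin")

print(origin.requests_served)
```

In this model the origin serves the file once per edge location regardless of how many clients download it, which is why the ASG does not need to scale out during a release.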
