AWS Devops skillbuilder

Browse courses on You can log in with AWS Partner or AWS account.

Note that the links in the left menu contain a signature, so anyone with the link can watch your videos and update your progress :) Completed: completed courses

To test on AWS you can create a new Organizations account ([email protected]) so the test does not affect your main account much (except billing :). Your account is the management account, and every other account (member account) can be part of only one organization. Benefits: volume discounts, and shared Reserved Instances and Savings Plans discounts across accounts. Accounts are grouped into Organizational Units (OUs); each account has its own VPCs, but we can establish a single CloudTrail log for the whole organization. Service Control Policies (SCPs) are policies for member accounts (the management account is not affected by SCPs) and can be used to deny services to OUs or accounts. In an IAM policy you can use aws:PrincipalOrgID to allow principals from any account in the organization.

Use the Control Tower service to automate the setup of a multi-account AWS environment with best practices, and to govern a secure and compliant multi-account environment. It runs on top of AWS Organizations. It detects policy violations and remediates them.

After you sign in as root (username is [email protected]), create the IAM account alias trkasd so all IAM users can log in using that alias. You should enable MFA (you can do it in an emulator by installing Google Authenticator and entering the security code), then create an IAM user with AdministratorAccess and use that IAM user for all following tasks (creating other IAM users, instances…).

AWS CLI

Install on Ubuntu with

curl "" -o ""
sudo aws/install

and you can configure default credentials with

aws configure
AWS Access Key ID


Use consolidated billing for AWS Organizations to see combined usage, share volume pricing discounts and receive a single bill for multiple accounts. You should enable Budget alerts so you receive an email when the forecasted cost is greater than, for example, $10. The first two budgets are free. Budgets are similar to CloudWatch Billing alerts (available only in us-east-1, deprecated since they only use actual spend) but more granular: they can filter by service and by tags, and alert on forecasted cost.

You can enable Cost Allocation Tags so that when you tag resources you can filter by those tags in Cost Explorer. By default you can group by the Service dimension, but you can also group and filter by cost allocation tag. Cost and Usage Reports are the most comprehensive set of AWS cost and usage data available. AWS Compute Optimizer is used to reduce cost and improve performance.

IAM Identity and access management

AWS re:Invent 2016: Become an AWS IAM Policy Ninja in 60 Minutes or Less (SAC303)

{
  "Version": "2012-10-17", # use the 2012-10-17 version to use policy variables like ${aws:username}
  "Statement": [
    {
      "Effect": "Allow", # Allow or Deny
      "Action": [ # this could be "*" or an array
        "iam:ChangePassword",
        "iam:GetUser"
      ],
      "Resource": "arn:aws:iam::123456789012:user/${aws:username}", # object that the statement covers
      "Condition": {"IpAddress": {"aws:SourceIp": ""}}
    }
  ]
}

Evaluation logic: the request starts as Deny; if there is an explicit Deny, evaluation stops and the request is denied; if there is an Allow for that resource/action, the request is allowed; otherwise it is denied (implicit deny).
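
The evaluation order above can be sketched in a few lines of Python. This is a minimal illustration, not the real IAM engine: the `evaluate` helper and the sample `POLICY` statements are hypothetical, only `Action` and `Resource` are matched (no conditions, no principals), and `fnmatchcase` is assumed to be a good enough stand-in for IAM wildcard matching.

```python
from fnmatch import fnmatchcase


def matches(patterns, value):
    """True if value matches any pattern (a single string or a list of strings)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def evaluate(statements, action, resource):
    """Return 'ExplicitDeny', 'Allow' or 'ImplicitDeny' for one request."""
    decision = "ImplicitDeny"  # the default when nothing matches
    for statement in statements:
        if not matches(statement["Action"], action):
            continue
        if not matches(statement["Resource"], resource):
            continue
        if statement["Effect"] == "Deny":
            return "ExplicitDeny"  # an explicit Deny always wins
        decision = "Allow"  # remember the Allow, keep looking for a Deny
    return decision


# Hypothetical statements used only for this illustration.
POLICY = [
    {"Effect": "Allow", "Action": "iam:Get*", "Resource": "*"},
    {"Effect": "Deny", "Action": "iam:*",
     "Resource": "arn:aws:iam::123456789012:user/admin"},
]

print(evaluate(POLICY, "iam:GetUser", "arn:aws:iam::123456789012:user/bob"))
print(evaluate(POLICY, "iam:GetUser", "arn:aws:iam::123456789012:user/admin"))
print(evaluate(POLICY, "s3:GetObject", "arn:aws:s3:::mybucket/file"))
```

The order of the statements does not matter here: a matching Deny returns immediately, and an Allow only counts if no Deny is found.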

Two types of policy:

Video on “Role-Based Access in AWS”. A role is assumed programmatically, and its credentials are temporary and automatically rotated (a role does not have a username and password). Only an IAM role has two policies: a resource-based one (the trust policy) and one for the principal (the permission policy). The role defines a Trust policy (which Principal can assume the role)

  "Action": ["sts:AssumeRole"],
  "Principal": {"Service": ""},

  "Principal": {"AWS": "arn:aws:iam::123123123123:user/test"}, # assume user
  "Principal": {"AWS": "123123123123"}, # assume account

  "Principal": {"Federated": "arn:aws:iam::123123123123:saml-provider/ADFS"},

and Permission policy (what permissions the role can perform).

Resource-based policies are used on an S3 bucket, SNS topic, SQS queue or KMS key

  "Statement": [{
    "Principal": {"AWS": ["arn:aws:iam::123123123123:root"]},
    "Effect": "Allow",

To allow access for all users under one account

     "Principal": {
        "AWS": "*"
      "Condition": {
        "StringEquals": {
          "AWS:SourceOwner": "121153076256"
        "DateGreaterThan": {
          "aws:CurrentTime": ["2020-11-11T00:00:00z","2022-11-12T00:00:00z"]

The Principal is the entity (root user, IAM user or role), the actor that can perform actions or access resources. Action is the list of API actions. You can use wildcards, ? for a single character and * for multiple characters, like "Action": "iam:*AccessKey*" for all create/delete/list/update AccessKey APIs. NotAction is used for exclusion (it is not a Deny, since another part of the policy can still allow the action, which is very different from the case where another part explicitly denies it). All conditions must match (AND), while a single condition matches when any value from its array matches (OR). Variables: ${aws:username}.
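
A small sketch of the wildcard and condition rules described above. The helper names `action_matches` and `conditions_match` are hypothetical, only the `StringEquals` operator is handled, and the organization IDs in the usage example are made up; the point is just to show AND across condition keys and OR across the values of one key.

```python
from fnmatch import fnmatchcase


def action_matches(pattern, action):
    """IAM-style wildcards: * matches many characters, ? matches one."""
    return fnmatchcase(action, pattern)


def conditions_match(condition, context):
    """All keys must match (AND); any value listed for a key may match (OR)."""
    for operator, keys in condition.items():
        if operator != "StringEquals":  # only one operator in this sketch
            raise NotImplementedError(operator)
        for key, allowed in keys.items():
            if isinstance(allowed, str):
                allowed = [allowed]
            if context.get(key) not in allowed:
                return False
    return True


for name in ["iam:CreateAccessKey", "iam:ListAccessKeys", "iam:GetUser"]:
    print(name, action_matches("iam:*AccessKey*", name))

condition = {"StringEquals": {"aws:PrincipalOrgID": ["o-aaa", "o-bbb"],
                              "aws:username": "alice"}}
print(conditions_match(condition, {"aws:PrincipalOrgID": "o-bbb",
                                   "aws:username": "alice"}))
```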

To restrict access you can use:

ARN format: arn:partition:service:region:account-id:resourcetype/resource

You can use Access Analyzer to make least privilege permissions. Access Analyzer helps you identify the resources in your organization and accounts, such as Amazon S3 buckets or IAM roles, shared with an external entity. This lets you identify unintended access to your resources and data, which is a security risk.

Instead of writing a policy for each user, you can write a policy for a group and use policy variables in Resource or in string comparisons in the Condition element, for example to give each user access to their home folder under mybucket.

  "Version": "2012-10-17",
  "Statement": [
      "Action": ["s3:ListBucket"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::mybucket"],
      "Condition": {"StringLike": {"s3:prefix": ["${aws:username}/*"]}}
      "Action": [
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::mybucket/${aws:username}/*"]

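To see what the policy variable does, here is a minimal sketch of the substitution step. The `expand` helper and the request context are hypothetical; real IAM resolves `${aws:username}` from the authenticated request, and `fnmatchcase` is only an approximation of its wildcard matching.

```python
import re
from fnmatch import fnmatchcase


def expand(template, context):
    """Replace every ${key} in the template with the value from the request context."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: context[m.group(1)], template)


resource = "arn:aws:s3:::mybucket/${aws:username}/*"
alice = expand(resource, {"aws:username": "alice"})
print(alice)

# The same group policy now covers only the home folder of the calling user.
print(fnmatchcase("arn:aws:s3:::mybucket/alice/notes.txt", alice))
print(fnmatchcase("arn:aws:s3:::mybucket/bob/notes.txt", alice))
```
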
You can use S3 Batch Operations to add tags or copy objects. After creating a role, create a batch job. When you run the job, if "Total objects listed in manifest" shows "Not available", check the permissions.

AWS IAM is a service to securely manage access to AWS account services and resources. Amazon Cognito manages identity inside applications and federates sign-in using OIDC or SAML, or social sign-in like Facebook.

Identity Federation lets users outside of AWS assume a temporary role. The identity is stored with a 3rd party authentication provider: LDAP, OpenID, Cognito or SAML (Active Directory). You do not need to create AWS users, and there is no need to manage them. A Custom Identity Broker Application can ask the AWS Security Token Service (STS) for security credentials (we need to determine the appropriate IAM policy).

AWS Cognito Federated Identity Pools (FIP) are for public applications. The goal is to provide direct access to AWS resources from the client side: log in to Facebook, then use that token to log in to the federated identity pool, which verifies the token, asks STS for credentials and sends temporary AWS credentials (with a predefined IAM policy) back to the client.

STS gives the following API: AssumeRole (in your own account or cross-account; for cross-account you grant permissions in both accounts: one on the UpdateApp role to grant account A access to the productionB S3 bucket, and another to allow account A to assume the UpdateApp role), AssumeRoleWithSAML, and AssumeRoleWithWebIdentity (for example Facebook; AWS recommends using Cognito instead).

Cognito User Pools (CUP) are a serverless database of users for your web and mobile apps: password reset, authentication with Facebook, and login sends back a JWT token. They integrate with AWS API Gateway and also with ALB. Cognito Identity Pools (Federated Identities, CIP) are used when there are too many users or when there are guest users, so we can not create IAM users. We define a default policy based on user_id using policy variables. CUP is for managing users and passwords, CIP is for access to AWS services.

AWS IAM Identity Center (successor to AWS Single Sign-On, SSO) provides workforce authentication and authorization: one login to multiple AWS accounts, business cloud applications (Salesforce) and even EC2 Windows instances. With Attribute-Based Access Control (ABAC) you define permissions once and then modify access by changing the attributes (tags) in the IAM Identity Center identity store.


Videos: components of vpc

To create a VPC you need to decide the Region where it is provisioned and the IP range in CIDR (Classless Inter-Domain Routing) notation, for example 10.10.0.x addressing. A mask that covers the whole address is just one IP, and /0 is all IPs. IANA established the private ranges that are meant to be used as private IP addresses in a local LAN (you should not use other IP addresses). VPC default IP addresses:

To create a subnet you need a VPC, an Availability Zone (AZ) and an IP range for the subnet. Usually you create two subnets (in two different AZs); the second is 10.10.2.x. To add an internet connection, you need to create an Internet Gateway (IGW) and attach it to the VPC. The IGW is a device in the VPC; create one if it does not exist.

Route tables define how traffic is routed in subnets. Whether a subnet is private or public is determined by its route table. For a public subnet we need to create a new route table, add a route whose destination is any IP address and whose target is the IGW, and associate the route table with the subnet.

Five IPs in every subnet are reserved and unusable: .0 (network), .1 (gateway), .2 (DNS), .3 (future use) and the last address, .255 in a /24 (broadcast). So if you need 28 IP addresses you can not choose /27 (it has 32-5=27 usable addresses).
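
The arithmetic above can be checked with the standard `ipaddress` module. This is a small sketch: the function names and the 10.10.1.0/27 example range are made up for illustration, and the /28 to /16 bounds are the subnet sizes that AWS allows.

```python
import ipaddress

AWS_RESERVED = 5  # .0, .1, .2, .3 and the last address of every subnet


def usable_addresses(cidr):
    """Number of addresses left for instances in an AWS subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED


def smallest_prefix_for(hosts):
    """Longest prefix (smallest subnet) that still fits the requested host count."""
    for prefix in range(28, 15, -1):  # AWS subnets go from /28 to /16
        if 2 ** (32 - prefix) - AWS_RESERVED >= hosts:
            return prefix
    raise ValueError("too many hosts for a single subnet")


print(usable_addresses("10.10.1.0/27"))  # 32 - 5 = 27
print(smallest_prefix_for(28))           # /27 is one short, so /26
```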

Inside an instance in the subnet you can see that local IPs do not use a gateway (the gateway column shows that they connect directly), while other IPs go to the gateway on the .1 address.

# check routes on the server, flags Up, Gateway, Host
route -n
Destination      Gateway         Genmask         Flags Metric Ref    Use Iface
                                                 UG    0      0        0 eth0
                                                 UGH   100    0        0 eth0
                                                 U     100    0        0 eth0
                                                 UH    100    0        0 eth0

This table is the same for private instances (you need to temporarily associate a route table with an IGW to run sudo apt install net-tools). As long as the IGW route is assigned you can ping external IPs. To ping local resources you need to add an All ICMP - IPv4 inbound rule to the security group used by those instances. Alternatively, you can use nmap with -Pn (treat all hosts as online) or -sn (ping scan; note that it does not discover instances when used with a - range, only with a direct IP), for example to find all:

nmap -sn
nmap -Pn
nmap -Pn 10.1.3.-
nmap -Pn 10.1.1-3.-
nmap -Pn

Subnets define sub-networks whose IP range must lie inside the IP range of the VPC, so a subnet uses a longer prefix than the VPC (for example /17 … /24 …). A range outside the VPC range is not part of the VPC network.
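
Whether a subnet range lies inside a VPC range can be checked with `ipaddress`. A minimal sketch: the 10.10.0.0/16 VPC and the subnet ranges are assumed values chosen to match the 10.10.x.x addressing used in these notes, and `fits_in_vpc` is a hypothetical helper name.

```python
import ipaddress


def fits_in_vpc(vpc_cidr, subnet_cidr):
    """True when every address of the subnet is inside the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(vpc)


VPC = "10.10.0.0/16"  # assumed VPC range for the example
print(fits_in_vpc(VPC, "10.10.1.0/24"))
print(fits_in_vpc(VPC, "10.11.0.0/24"))

# Splitting the VPC range into the two possible /17 subnets.
halves = [str(n) for n in ipaddress.ip_network(VPC).subnets(new_prefix=17)]
print(halves)
```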

A subnet is associated with a route table, so when an EC2 instance inside it wants outbound communication with the internet, the route table should contain an IGW entry along with the default local entry (private subnets are not associated with a route table that has an IGW entry). The catch-all destination means any IP address.

# route table for public subnet  Local    IGW (Internet Gateway)

Communication between subnets is possible because every route table contains the local route of the VPC.

An EC2 instance can use an Elastic IP (static public IP address) to receive inbound internet connections. The Elastic IP is assigned to an Elastic Network Interface (ENI), which is attached to the EC2 instance. You can also use a public IP address (it does not need to be static), but check “Auto assign public IP” before you start the instance since you can not change that later (and the instance can not get a new IP address). Note that even if an instance has a public IP address, when it resides in a subnet that is not associated with a route table that goes to the IGW you can not connect to it from outside, and when you connect through a bastion, you do not have internet access from the instance.

By default, an instance in a private subnet can not connect to the internet. The AWS managed Network Address Translation gateway service (NAT-GW) enables EC2 instances in a private subnet to make outbound connections to the internet. The route table for the private subnet points to the NAT-GW, which is in a public subnet so it can connect to the internet through the IGW. NAT-GW is a service (a machine managed by AWS) and should be enabled in each availability zone. You are charged by the hour for each NAT-GW (IGW is free). NAT-GW allows only outbound connections and the replies to those connections, and prevents the internet from initiating a connection to instances in the private subnet, so instances can still download updates. For IPv6 use an Egress-only internet gateway.

# route table for private subnet  Local    NAT-GW

To access private instances you need to connect to public instance which acts as Bastion and once user is in VPC it can connect to other private instances.

VPC Endpoints (VPCE) are used to connect to Amazon S3 over the Amazon private network (not going to the internet through the IGW). VPC Interface Endpoints create an elastic network interface (ENI with an IP address) so you can connect to services over your own VPC private network.

To secure access you can use Network Access Control Lists (Network ACLs) and security groups. A Network ACL is stateless, so you need to enable both inbound and outbound ports. By default it allows all ports in and out, but you can for example allow 443 inbound and 1025-65535 outbound (since the response goes to an ephemeral port on the client). A security group is required for each EC2 instance. Security groups are stateful: they remember that a connection came from outside and allow the outbound traffic for that connection. By default they block all inbound and allow all outbound traffic, so you need to add inbound allow rules.

Difference between Network ACLs (NACLs, pronounced "nackles") and Security Groups:

Security groups work on the instance level, define only Allow rules, are stateful (return traffic is automatically allowed), all rules are evaluated before deciding, and they apply only if attached to the instance.

NACLs operate on the subnet level, have both allow and deny rules, are stateless (return traffic must be explicitly allowed), rules are evaluated in number order and once a rule matches the other rules are not considered, and they apply to all instances in the subnet: good as a backup layer of defence if someone forgot to use a security group.
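
The "rules in number order, first match decides" behaviour of a NACL can be sketched as below. The rule dictionaries, the `nacl_decision` name and the documentation-range IP addresses are assumptions for illustration only; real NACL rules also carry a protocol and a direction.

```python
import ipaddress


def nacl_decision(rules, port, source_ip):
    """Evaluate rules in ascending rule number; the first matching rule decides."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        low, high = rule["ports"]
        if low <= port <= high and address in ipaddress.ip_network(rule["cidr"]):
            return rule["action"]
    return "deny"  # the implicit final rule denies everything else


# Hypothetical rule set: block one address, then allow SSH from anywhere.
RULES = [
    {"number": 200, "ports": (22, 22), "cidr": "0.0.0.0/0", "action": "allow"},
    {"number": 100, "ports": (22, 22), "cidr": "203.0.113.7/32", "action": "deny"},
]

print(nacl_decision(RULES, 22, "203.0.113.7"))   # rule 100 matches first
print(nacl_decision(RULES, 22, "198.51.100.1"))  # falls through to rule 200
print(nacl_decision(RULES, 443, "198.51.100.1")) # no rule matches
```

A security group would behave differently: it has no deny rules and no ordering, so any matching allow rule is enough.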

To connect to a VPC from outside you can use a VPN (IPSec) with a Virtual Gateway (VGW) or Direct Connect (DX). A VPC can also be peered with another VPC.

On new instances Enhanced Networking is automatically enabled

ethtool -i ens5
driver: ena


Low-cost EC2 capacity can be obtained with a fleet of Spot instances.

For steady-state workloads you can use ECS on EC2. Fargate (serverless compute for containers) is better for short workloads, like tests.

Placement groups:

You can enable termination protection (DisableApiTermination true), so you can not terminate the instance from the console, API or CLI. But if you shut down from inside the instance (sudo shutdown), it can still be terminated if the shutdown behavior option InstanceInitiatedShutdownBehavior is set to terminate.

The difference between Stop and Terminate is that Stop will not remove any EBS disk, while Terminate will usually remove EBS disks (you can configure this).

You can Hibernate so the RAM is preserved (written to the root EBS volume); starting from hibernation is much faster since you do not need to boot the OS. Use cases for EC2 hibernate are long-running processing (it saves the RAM) or services that need to boot up quickly. It is supported on On-Demand, Reserved and Spot instances, but the maximum hibernation is 60 days. Hibernation has to be enabled when the instance is created (Advanced details > Stop - Hibernate behavior > Enable), and you need to enable encryption for the root EBS volume.

There is a limit on the number of instances in one region (InstanceLimitExceeded), for example 5 vCPUs for on-demand or spot instances. You need to start the instance in another region (changing the AZ does not help since the limit is for the whole region). Search for vcpu and click on Calculate vCPU limit or Request for increase. If Amazon does not have sufficient capacity to run a new on-demand instance in a specific AZ, the error InsufficientInstanceCapacity is returned.

Beside ON-demand instances which are most common (pay per second), you can use:

Instance types:

You can not purchase CPU credits, but you can change instance type.

Connect to the instance using SSH, EC2 Instance Connect or Systems Manager Session Manager. The SSH private key (pem file, .cer file) should have 400 permissions, otherwise you get an "unprotected private key file" error.

chmod 400 ~/config/keys/pems/2022trk.cer

A permission denied error is shown when the SSH username is not correct. A connection timed out error is shown when the SG, NACL or route table is not configured correctly, or the public IP is missing. The security group (SG) should allow TCP 22 from your IP or from all IPs.

For EC2 Instance Connect inside AWS Console, it will push one time ssh public key valid for 60 seconds. Instance Connect will not work if you allow SSH 22 only from specific ip address which is not aws IP address. Also, if your user does not have permission to SendSSHPublicKey you need to enable it:

# a.json
    "Version": "2012-10-17",
    "Statement": [
        "Effect": "Allow",
        "Action": "ec2-instance-connect:SendSSHPublicKey",
        "Resource": [
        "Condition": {
            "StringEquals": {
                "ec2:osuser": "ec2-user"
        "Effect": "Allow",
        "Action": "ec2:DescribeInstances",
        "Resource": "*"
aws iam create-policy --policy-name add-send-ssh-to-instance --policy-document file://a.json
# copy policy arn and attach to the user who wants to use EC2 Instance Connect
aws iam attach-user-policy --policy-arn arn:aws:iam::606470370249:policy/add-send-ssh-to-instance --user-name read-only

If instance does not have public ip address, you need to use:

Route 53

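The local DNS server asks the root DNS server, which returns the NS of the TLD (top-level domain) DNS server; the local server then asks the TLD server, which returns the NS of the SLD (second-level domain) DNS server; finally it asks the SLD server, which returns the IP to the local DNS server.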

You can not use CNAME records on the zone apex (root domain), but you can use ALIAS records, which exist only in Route 53 and target other AWS resources. An ALIAS is actually an A record, but it automatically recognizes when the IP of the resource changes, and it can be used for the zone apex. ALIAS targets can be almost anything (ELB, CloudFront, API Gateway, S3 websites), but not the EC2 DNS name (so you can not point the apex domain directly at EC2, but you can alias to an ELB). For ALIAS records you can not set the TTL (time to live, the period a record stays in cache): when the target is another record its TTL is used, and when it is another AWS service it is 60 seconds. You can inspect with dig

# this returns two ip addresses of load balancer

# this return CNAME load balancer and than two ip addresses


dig returns TTL

Hosted Zones can be private (so we can use inside VPC db.example.internal and webapp.example.internal) and public (when you buy a domain).

Routing policies:

For hybrid DNS we need Route 53 Resolver endpoints: an inbound endpoint when on-premises wants to get the IP of an instance, and an outbound endpoint when an instance wants to get the IP of an on-premises host. Conditional Forwarding Rules are used for private hosted zones: ec2.internal,

S3 with custom domain: bucket name should be the same as domain, grant public access and enable static website hosting, in route 53 use ALIAS A to S3 bucket. For https we need to use cloudfront (we need something to terminate ssl).

AWS Load balancer LB

Different types:

The ALB security group should allow HTTP and HTTPS access from the internet, and the EC2 instances should allow HTTP (no need for HTTPS) only from the load balancer security group (if you do not need direct access; for example use SSM Session Manager for accessing the shell). Use a name like myapp-lb-http-https-sg-also-used-to-accept-http-on-ec2-sg

ALB Load balancing can be:

ALB has a fixed hostname and passes the client IP in headers: X-Forwarded-For, X-Forwarded-Port and X-Forwarded-Proto

The default load balancing algorithm is round robin, which distributes requests to each target in turn. Another is LOR (least outstanding requests): the next request goes to the instance with the lowest number of pending/unfinished requests. LOR can not be used with “Slow start duration” (for example 30 seconds), in which newly created instances receive 1, then 2, then 3 requests (not a bunch of them in the first second). Slow start mode is defined on the Target Group.

For apps that store session info locally, you can enable Sticky Sessions so the ALB sends a client to the same target (this can generate imbalanced load). Application-based cookies: the cookie name is AWSALBAPP (or a custom cookie when the target generates the cookie, for example _myapp_session). Duration-based cookies: AWSALB.

NLB uses a flow hash (hash of Protocol, Source IP, Destination IP, Source Port, Destination Port, TCP sequence number), and as long as that does not change, traffic is routed to the same instance.
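
The difference between round robin and least outstanding requests can be illustrated with a toy selector. This is only a sketch with hypothetical target names and pending counts, not how the ALB is implemented.

```python
from itertools import cycle


def round_robin(targets):
    """Endless iterator that hands out targets in turn."""
    return cycle(targets)


def least_outstanding(pending):
    """Pick the target with the fewest pending (unfinished) requests."""
    return min(pending, key=pending.get)


chooser = round_robin(["a", "b", "c"])
print([next(chooser) for _ in range(4)])

# Hypothetical number of in-flight requests per target.
pending = {"a": 3, "b": 1, "c": 2}
print(least_outstanding(pending))
```

Round robin ignores how busy a target is, which is why a slow target can pile up requests; LOR reacts to that by looking at the pending count.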

Cross-zone load balancing distributes traffic evenly across all instances in all AZs (ex: with 2 instances in us-east-1a and 8 in us-east-1b, each instance gets 10% of the traffic). For ALB it is enabled by default and the user is not charged for inter-AZ communication. For NLB and GLB it is not enabled by default, and the user is charged for inter-AZ data when it is enabled.
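
The 2 + 8 instance example can be worked out numerically. The sketch below assumes that without cross-zone balancing each AZ receives an equal share of the traffic, which is then split among the instances of that AZ; `traffic_share` is a hypothetical helper.

```python
def traffic_share(instances_per_az, cross_zone):
    """Fraction of total traffic that one instance receives, keyed by AZ."""
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            shares[az] = 1 / total_instances  # spread over every instance
        else:
            shares[az] = (1 / az_count) / count  # the AZ share split locally
    return shares


fleet = {"us-east-1a": 2, "us-east-1b": 8}
print(traffic_share(fleet, cross_zone=True))   # 10% per instance everywhere
print(traffic_share(fleet, cross_zone=False))  # 25% vs 6.25% per instance
```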

Target group weighting lets you control the distribution of traffic across target groups, for example for a Blue/green deployment: create new instances that will receive 10% of the traffic.

SSL is Secure Sockets Layer; the newer version is TLS (Transport Layer Security). Certificates are issued by CAs, Certificate Authorities (Letsencrypt, GoDaddy…). The ALB does SSL termination and uses ACM (AWS Certificate Manager) to manage certs.

HTTPS listeners can use default and multiple certs to support multiple domains. Different domains supported by SNI Server Name Indication, ALB pick correct SSL cert and route to target group for that domain.

Generating a new cert in ACM is easy: you just need to click on the email confirmation link for your domain, or change the DNS settings for your domain. The cert is issued by Common Name (CN) Amazon RSA 2048 M01 and is valid for 13 months. You need to add a CNAME that points to the ALB ex:

Connection draining is the time to complete “in-flight requests” while the instance is de-registering or unhealthy. The default is 300 seconds (5 min). You can set 30s if you have a ready-to-use AMI. During this deregistration period the ASG will not launch or terminate additional instances, to allow the metrics to stabilize. You can use ASG Lifecycle Hooks to pause an EC2 instance in the terminating state for troubleshooting.

New requests are send to other healthy instances. All healthy statuses are:

TG health check: if a target group contains only unhealthy targets, the ELB routes requests across its unhealthy targets, since it assumes that the health check is wrong. HealthyThresholdCount (default 5) and UnhealthyThresholdCount (default 2) are how many consecutive checks (in a row, one every HealthCheckIntervalSeconds) are enough to consider the target healthy or unhealthy.
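
How the two thresholds interact can be sketched as a small state machine. `track_health` is a hypothetical helper; the defaults are the ones quoted above, and the sketch assumes the target starts out healthy (a real target starts in an initial state).

```python
def track_health(results, healthy_threshold=5, unhealthy_threshold=2, healthy=True):
    """Replay health check results (True = passed) and return the final state."""
    streak = 0  # consecutive results that contradict the current state
    for passed in results:
        if passed == healthy:
            streak = 0  # the check agrees with the current state
            continue
        streak += 1
        needed = healthy_threshold if passed else unhealthy_threshold
        if streak >= needed:
            healthy = passed  # enough checks in a row: flip the state
            streak = 0
    return healthy


print(track_health([False, True, False]))            # failures not in a row
print(track_health([False, False]))                  # 2 failures in a row
print(track_health([False, False] + [True] * 4))     # 4 passes are not enough
print(track_health([False, False] + [True] * 5))     # 5 passes recover it
```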

Common errors

5xx are server side errors, 5 looks like S (server) for example:

4xx are client errors (from browser to load balancer).

CloudWatch metrics:

You can trace single user in logs using custom header X-Amzn-Trace-Id and you might use for X-Ray

Auto Scaling group

Auto scaling can be manual, dynamic (based on CloudWatch metrics and a target value) and predictive (forecast for recurring cyclic patterns)

When you create an ASG you need to define a Launch Template (LT) first. An LT can have multiple versions (the default one is used). An LT can create on-demand and spot instances, supports placement groups, capacity reservations, dedicated hosts and multiple instance types, and can use the T2 unlimited burst feature.

The ASG health check uses the Health check grace period (default 300s, 5 min), so a new instance is not health-checked until 5 minutes have passed.

ASG scaling policies can be: simple/step scaling (when a CW alarm is triggered, ex CPU > 70%, then add 1 unit), target tracking (it automatically creates two CW alarms: AlarmLow for scale in, which removes instances, and AlarmHigh for scale out, which adds instances; scale up means using bigger instances, vertical scaling), scheduled (for a known usage pattern) and predictive scaling (forecast load based on history).

Good metrics to scale on:

Some reasons why scaling fails: MaximumCapacity was reached, or some LT dependency was deleted (security group, key pair). If the ASG keeps failing for 24h it will be suspended (administrative suspension).

AWS Auto Scaling Plans, similar to ASG, but as separate service.

EC2 Image Builder

Automatically create new image (select base image, update and customize), run tests on new ec2 instance running new image and distribute image to regions.

You need to create a MyImageBuilderEC2Role role with the EC2InstanceProfileForImageBuilder and AmazonSSMManagedInstanceCore policies. Make sure you deregister the AMI and delete the Image Build Version, so you do not get charged.

Note that an AMI is region locked: you can not use the same AMI in another region, you have to copy it there. An AMI is also used when you want to move EC2 to another AZ (but there is no reason for that since AZs are randomly enumerated by AWS).

EBS

Block storage (you can update part of it) is provided by Elastic Block Store (EBS). A volume is attached to one instance (or several with the multi-attach feature), and instances have to be in the same AZ as the volume. You can increase the volume size up to 16 TB, or you can attach multiple EBS volumes to a single EC2 instance. To share storage across many instances you should use Elastic File System (EFS). EBS is used as the root boot device launched from an AMI. It is like a USB stick, but as a network drive (not physically attached), so there is some latency. The Delete on Termination attribute is enabled by default for the first volume (if you want to preserve the root volume you need to disable this).

There are Provisioned IOPS SSD, General Purpose SSD and HDD volume types. You can make a backup using a snapshot (snapshots are incremental and save only what has changed).

st1 and sc1 can not be used as a boot volume; their size is from 125GB to 16 TB. io1 and io2 volumes can use EBS multi-attach (attach to multiple machines) to achieve higher application availability in clustered Linux applications like Teradata (the app must manage concurrent write operations). Multi-attach still works only inside one AZ, and the limit is 16 instances.

EC2 Instance Store is high performance hardware disk, directly attached to machine on which we run instance. It is ephemeral volume (lose on termination) so good for buffer, cache, scratch data and other temporary content Example is i3.large 100.000 IOPS, i3.16xlarge 3.300.000 IOPS.

We can not decrease the size of an EBS volume (only create a new one and copy the data). We can increase the EBS volume size (and IOPS for io1), but the volume will be in the “optimizing” state for a while, and the partition has to be extended. First find the hypervisor

aws --profile 2022trk ec2 describe-instance-types --instance-type t2.micro --query "InstanceTypes[].Hypervisor"

then check the current size

xvda    202:0    0   8G  0 disk 
└─xvda1 202:1    0   8G  0 part /

df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      8.0G  1.6G  6.4G  20% /

increase and you can see bigger size

xvda    202:0    0  10G  0 disk 
└─xvda1 202:1    0   8G  0 part /

now you need to resize partition

sudo growpart /dev/xvda 1

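so we can see the bigger partition size, but the filesystem still shows the old size until it is extended (for example with resize2fs for ext4 or xfs_growfs for XFS) or the instance is rebooted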

xvda    202:0    0  10G  0 disk 
└─xvda1 202:1    0  10G  0 part /

df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      8.0G  1.6G  6.4G  20% /

Amazon Data lifecycle management

DLM is used to create and delete EBS snapshots automatically - scheduled.

EBS snapshots are incremental backups, so only the blocks that have changed are saved.
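
The idea of incremental snapshots can be shown with a toy model where a volume is a mapping from block number to content. This is only a conceptual sketch with hypothetical helper names; it says nothing about how EBS stores snapshots internally.

```python
def incremental_snapshot(volume_blocks, last_state):
    """Keep only the blocks that differ from the previously captured state."""
    return {number: data for number, data in volume_blocks.items()
            if last_state.get(number) != data}


def restore(snapshots):
    """Rebuild the volume by applying the snapshots from oldest to newest."""
    state = {}
    for snapshot in snapshots:
        state.update(snapshot)
    return state


volume = {0: "boot", 1: "data-v1", 2: "logs"}
first = incremental_snapshot(volume, {})                  # first snapshot is full
volume[1] = "data-v2"                                     # one block changes
second = incremental_snapshot(volume, restore([first]))   # only block 1 is saved

print(first)
print(second)
print(restore([first, second]) == volume)
```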

EBS Snapshots - FSR (Fast Snapshot Restore) is used to prepare the snapshot in each AZ where you want to restore the volume, since that is much faster than pulling from S3.

EBS Snapshots - Archive: move to an archive tier that is 75% cheaper, but restoring takes 24 to 72 hours. Recycle Bin for snapshots: specify a retention period for deleted snapshots.

To encrypt an unencrypted EBS volume you need to create snapshot, encrypt snapshot, create a new volume from it and attach it.

Amazon EFS - Elastic File System

This is a managed NFS (network file system) that can be mounted on many EC2 instances, and those instances can be in any availability zone. You do not need to plan capacity; it can grow to petabytes, with 1000s of concurrent NFS clients and 10GB/s throughput.

Performance mode: general purpose is for latency-sensitive use (web); max I/O has higher latency, but better throughput and is highly parallel (big data and media).

Throughput mode: bursting (1TB = 50 MiB/s, with bursts to 100MB/s), provisioned (for example 1GiB/s for 1TB), and elastic.

Storage tiers: standard, and infrequent access (EFS-IA), which is cheaper to store but expensive to access (we need to use a Lifecycle Policy). EFS One Zone-IA is a 90% saving since it is only in one AZ.

EFS Access Points, restrict access to a directory based on IAM user.

EFS Operations: lifecycle policy (enable IA), throughput mode. When copying you need to use AWS DataSync to keep attributes and metadata.

EFS CloudWatch Metrics:

AWS Databases RDS

Managed database service: Postgres, MySQL, MariaDB, Oracle, Microsoft SQL Server, Aurora (AWS proprietary database). You can not access the underlying instance (no SSH, except with RDS Custom).

An RDS Multi-AZ deployment uses a single DNS name and automatically fails over for disaster recovery (DR): the standby instance becomes the primary instance. Failover happens when the primary DB instance has failed, its OS is undergoing software patching, it is unreachable due to loss of network connectivity, it is modified (eg the DB instance type changes), it is busy and unresponsive, its underlying storage fails, an AZ outage happens, or you initiate a manual failover with Reboot with failover. Going from single AZ to Multi-AZ is a single click, which creates a standby instance in another AZ with zero downtime.

Scaling is vertical (bigger instance) and horizontal (add more read replicas). There can be up to 5 read replicas, in the same AZ, cross AZ or cross region, and read replicas can be set up as Multi-AZ for DR. The Storage Auto Scaling feature scales storage automatically up to the Maximum Storage Threshold (for example when less than 10% is free and 6h have passed since the last scaling event).
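
The storage auto scaling conditions mentioned above can be written as a simple predicate. This is a simplified sketch: `should_autoscale` is a hypothetical function, it only models the two example conditions from these notes plus the maximum threshold, and the numbers in the usage lines are made up.

```python
def should_autoscale(free_gib, allocated_gib, hours_since_last_scaling, max_threshold_gib):
    """Simplified check of the example conditions for RDS storage auto scaling."""
    low_on_space = free_gib < 0.10 * allocated_gib     # less than 10% free
    cooled_down = hours_since_last_scaling >= 6        # 6h since the last event
    has_headroom = allocated_gib < max_threshold_gib   # below the maximum threshold
    return low_on_space and cooled_down and has_headroom


print(should_autoscale(5, 100, 7, 200))   # low space, cooled down, room to grow
print(should_autoscale(5, 100, 2, 200))   # scaled too recently
print(should_autoscale(5, 200, 7, 200))   # already at the maximum threshold
```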

By default Lambda can access only a public RDS. For a private RDS you need to start the Lambda in the VPC, ie using an Elastic Network Interface (ENI) in your subnets. RDS Proxy is used to manage the connection pool and clean up idle connections made by Lambda functions, to avoid the TooManyConnections exception.

| Database Type | Use Cases | AWS Service |
| --- | --- | --- |
| Relational | Traditional applications, ERP, CRM, e-commerce | Amazon RDS, Amazon Aurora, Amazon Redshift (cloud data warehouse) |
| Key-value | High-traffic web apps, e-commerce systems, gaming applications | Amazon DynamoDB |
| In-memory | Caching, session management, gaming leaderboards, geospatial applications | Amazon ElastiCache for Memcached, Amazon ElastiCache for Redis |
| Document | Content management, catalogs, user profiles | Amazon DocumentDB (with MongoDB compatibility) |
| Wide column | High-scale industrial apps for equipment maintenance, fleet management, and route optimization | Amazon Keyspaces (for Apache Cassandra) |
| Graph | Fraud detection, social networking, recommendation engines | Amazon Neptune |
| Time series | IoT applications, DevOps, industrial telemetry | Amazon Timestream |
| Ledger | Systems of record, supply chain, registrations, banking transactions | Amazon QLDB |

RDS Parameter Groups: dynamic parameters are applied immediately, static parameters are applied after an instance reboot. Force SSL: on Postgres use rds.force_ssl = 1, on MySQL GRANT SELECT ON mydatabase.* TO 'myuser'@'%' IDENTIFIED BY 'asd' REQUIRE SSL;.

Backups are continuous and allow point-in-time recovery (PITR); the daily backup happens during the backup window. Backups have a retention period you set between 0 (disabled) and 35 days and can not be shared. Backup frequency is eg daily. AWS Backup Vault Lock is used to apply archive policies, for example to enforce a WORM (Write Once Read Many) state, so nobody can delete the backups. Backup plans can work on specific tags.

Snapshots are incremental (only the first snapshot is full). Taking a snapshot uses IO and can stop the database for seconds to minutes. You can share manual snapshots with another account (automated snapshots need to be copied first). You can not share snapshots encrypted with AWS keys since you do not have access to those keys; only snapshots encrypted with a KMS key can be shared, and the other user needs access to the key.

KMS key types: AWS owned keys (free, default), AWS managed keys (free, aws/service-name), customer managed keys (CMK, can be rotated manually), and imported KMS keys (only manual rotation, using an alias). KMS key policies are used to control access to a KMS CMK. The CloudTrail log is used to audit KMS key usage. Keys can be symmetric (one key to encrypt and decrypt) or asymmetric (the public key is downloadable).

RDS Events are changes to states like pending/running, or to parameter groups. You can send them to SNS or EventBridge. RDS database log files can be sent to CW Logs (for example slow query logs). CW metrics associated with RDS are gathered from the hypervisor: DatabaseConnections, SwapUsage, ReadIOPS, WriteIOPS, ReadLatency/WriteLatency, DiskQueueDepth, FreeStorageSpace. Enhanced monitoring is gathered from an agent on the db instance: threads, cpu, memory metrics.


Amazon Aurora offers high performance and availability at global scale and is compatible with MySQL and PostgreSQL. When the primary instance of an Amazon Aurora cluster is unavailable, Aurora automatically promotes an existing replica in another AZ to be the new primary instance. A cluster has one Aurora master and up to 15 auto-scaled read replicas, similar to RDS Multi-AZ. Storage is replicated, self-healing and auto-expanding (10GB up to 128TB). The Writer Endpoint always points to the single master. The Reader Endpoint helps to find read replicas and provides connection load balancing. Other features: automatic failover; Aurora Backtracking, which rewinds without using backups but is an in-place restore; Aurora DB Cluster Automatic Backups with restore to a new db cluster (this can not be disabled); automated patching with zero downtime; advanced monitoring; Aurora Database Cloning, which uses the same cluster volume and a copy-on-write protocol, eg to create a test env from prod.

AWS PrivateLink provides private connectivity between VPCs, supported AWS services, and on-premises networks without exposing the traffic to the public internet.

Amazon ElastiCache

In-memory database, Redis or Memcached. Redis offers: Multi-AZ with auto failover, read replicas to scale reads and provide high availability, backup and restore features, and support for Sets and Sorted Sets.

* Cluster mode disabled: one shard (node group) in which all nodes have all the data (1 primary and up to 5 replicas). Horizontal scaling is done by adding new read replicas, vertical scaling by changing the node type (internally a new node group is created, then data is replicated, then DNS is updated).
* Cluster mode enabled: data is partitioned into shards (up to 500 nodes per cluster: 500 shards with a single master, or 250 shards with 1 master and 1 replica). Scaling has two modes: online scaling (no downtime, some degradation in performance) and offline scaling (unable to serve requests during backup and restore, changing node type). Horizontal scaling covers resharding (scale out/in by adding/removing shards) and shard rebalancing (equally distribute the keyspaces among the shards), and can be online or offline. Vertical scaling (changing the node type) can be done as online scaling.

To test the connection from EC2, use the Primary endpoint (not the Node endpoint):

redis-cli -h ping

Redis metrics to monitor:

Memcached: multi-node for partitioning of data (sharding, splitting data), no high availability (replication), non-persistent, no backup and restore, multi-threaded architecture. Horizontal scaling: add new nodes; auto-discovery allows the app to find the nodes. Vertical scaling: change the node type; the new nodes start blank since there is no restore. Memcached metrics to monitor:

Amazon CloudWatch

CloudWatch is a service for monitoring metrics and logs. Basic monitoring collects metrics every 5 minutes; detailed monitoring is paid and collects every 1 minute. EC2 metrics include: CPUUtilization (processing power), NetworkIn/NetworkOut, DiskReadOps/DiskWriteOps, DiskReadBytes/DiskWriteBytes (only for instance store disks, not for EBS), CPUCreditUsage (1 credit is 1 CPU running at 100% for 1 minute). Metrics belong to namespaces and can have up to 30 dimensions per metric (environment, instance id). Dashboards are global and can include metrics from different AWS accounts and regions. You can change the time zone and set up auto refresh. Dashboards can be shared outside the account using an SSO provider and Amazon Cognito. 3 dashboards with up to 50 metrics are free, $3/dashboard/month afterwards.

You can push custom metrics, for example RAM or from application, every second if you want. Two weeks in the past and two hours in the future is accepted for a timestamp value. You can use PutMetricData API to send data using cli, you can test using cloud shell

aws cloudwatch put-metric-data --metric-name Buffers --namespace MyNameSpace --unit Bytes --value 231434333 --dimensions InstanceId=1-23456789,InstanceType=m1.small
# after 5 minutes
aws cloudwatch put-metric-data --metric-name Buffers --namespace MyNameSpace --unit Bytes --value 231434333 --dimensions InstanceId=1-23456789,InstanceType=m1.small

or a Unified CloudWatch agent (installed with the AWS Systems Manager agent, SSM agent). Example for memory util

The procstat plugin for CWAgent can collect CPU time and memory of specific processes. Metric names are prefixed with procstat, for example procstat_cpu_time, procstat_cpu_usage.
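The accepted timestamp window for custom metrics (two weeks in the past, two hours in the future) can be checked before sending a datapoint. The following is a minimal sketch in plain Python; the function name `timestamp_is_accepted` and the fixed clock are assumptions made for illustration, not part of any AWS SDK.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the notes: up to two weeks back, up to two hours ahead.
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def timestamp_is_accepted(ts: datetime, now: datetime) -> bool:
    """Return True when ts falls inside the accepted window around now."""
    return now - MAX_PAST <= ts <= now + MAX_FUTURE


# Fixed clock so the example is reproducible (hypothetical date).
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

print(timestamp_is_accepted(now - timedelta(days=3), now))   # inside the window
print(timestamp_is_accepted(now - timedelta(days=20), now))  # older than two weeks
print(timestamp_is_accepted(now + timedelta(hours=3), now))  # too far in the future
```

A check like this is only a local guard; the service itself remains the authority on what it accepts.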

You can also export data using GetMetricData API. You can create alarm from EC2 instance -> right click -> Manage cloudwatch alarm

Log groups (arbitrary name, eg application name), log stream (instances within application), can define log expiration policies (7 day expiration since we pay for cw logs storage).

Logs are received from sdk, Cloudwatch Unified Agent, each AWS service.

Log exports can be sent to S3, Kinesis Data Streams, Kinesis Data Firehose, AWS Lambda, Amazon OpenSearch. An export can take up to 12h.

Using a custom metric, a CloudWatch metric filter and a CloudWatch alarm you can get a notification when something is triggered more than 5 times per minute.

Amazon CloudWatch Logs Insights lets you query, search and analyze logs interactively.

CloudWatch Alarms are used to trigger notifications for any metric. An alarm can be in 3 states: OK, INSUFFICIENT_DATA, ALARM. Period is the length of time in seconds over which the metric is evaluated. Instead of an SNS notification we can trigger a reboot of an EC2 instance or an autoscaling action.
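To make the three alarm states concrete, here is a deliberately simplified model in Python. It assumes the alarm goes to ALARM only when every one of the most recent `evaluation_periods` datapoints is above the threshold, and that any missing datapoint yields INSUFFICIENT_DATA; real alarms have configurable comparison operators and missing-data handling, so treat `evaluate_alarm` as a hypothetical helper rather than the service's actual algorithm.

```python
from typing import Optional, Sequence


def evaluate_alarm(datapoints: Sequence[Optional[float]],
                   threshold: float,
                   evaluation_periods: int) -> str:
    """Return OK, ALARM or INSUFFICIENT_DATA for the latest periods.

    None stands for a period without data (assumption of this sketch).
    """
    recent = list(datapoints)[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(p is None for p in recent):
        return "INSUFFICIENT_DATA"
    if all(p > threshold for p in recent):
        return "ALARM"
    return "OK"


# CPU utilization samples, one per period, threshold 80%, 3 periods.
print(evaluate_alarm([10, 85, 90, 95], threshold=80, evaluation_periods=3))
print(evaluate_alarm([90, 95, 70], threshold=80, evaluation_periods=3))
print(evaluate_alarm([90, None, 95], threshold=80, evaluation_periods=3))
```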

A CloudWatch Composite Alarm monitors the states of other alarms. It is helpful to reduce alarm noise, eg skip the high CPU alarm when there is high network traffic.

Manually trigger alarm

aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"

EC2 Status Check metrics:

EC2 Instance Recovery will keep the same IP addresses.

For Load balancer you can use: RequestCount, HealthyHostCount, UnHealthyHostCount, TargetResponseTime, HTTP status codes

EventBridge is a serverless event bus used to build event-driven apps. You can receive a webhook on API Gateway, which uses a Lambda to put the event on EventBridge, which in turn uses another Lambda to put the message into a CloudWatch Logs stream. For example an S3 Event notification (create object) can trigger the stream. Amazon EventBridge Overview and Integration with SaaS Applications. Amazon EventBridge is similar to CloudWatch Events (deprecated), but with more features. You can schedule automated snapshots of EBS. Event buses: Default Event Bus (events generated by AWS services, CW events), Partner Event Bus (receives events from SaaS services), Custom Event Buses (our own apps).

Amazon EventBridge Schema Registry gives the structure of the data. An Amazon EventBridge resource-based policy manages permissions for a specific Event Bus, so we can allow PutEvents from third-party accounts.

CloudWatch ServiceLens integrates health info in one place. It also integrates with Synthetics, and with AWS X-Ray to pinpoint performance bottlenecks. X-Ray is used for understanding dependencies in a microservice architecture (service graph).

CloudWatch Synthetics, to monitor API from outside-in using canaries: scripts that run on a schedule, written in Node.js or Python. Canaries offer access to headless Google Chrome via puppeteer or selenium webdriver.

CloudWatch Synthetics Canary Blueprints:

The AWS Health Dashboard is the single place to learn about the availability and operations of AWS services. It displays relevant and timely information to help users manage events in progress, and provides proactive notifications to help plan for scheduled activities

AWS Systems Manager SSM

It is a free service. Run command, state manager, inventory, patch manager, automation, explorer, Parameter Store, session manager (ssh), OpsCenter AWS Systems Manager gives you visibility and control of your infrastructure on operational data from multiple AWS services and allows you to automate operational tasks across your AWS resources.

To install SSM follow and create SSMInstanceProfile (this name from docs so we will use it) with AmazonSSMManagedInstanceCore policy.

When creating an EC2 instance you need to attach the created EC2 instance profile SSMInstanceProfile. For an existing EC2 instance, you can also attach/replace the IAM Role SSMInstanceProfile, and it will be automatically recognized by Session Manager. You can activate a hybrid instance with a script.

It is useful for remotely running commands without the need to open inbound ssh ports; you can control which commands can be performed, and it is auditable. Free service.

Command for cpu stress is:

sudo amazon-linux-extras install epel -y
sudo yum install stress -y
stress --cpu 1 --timeout 10m

You can see that it is working by clicking on “Public IPv4 address” or “Public IPv4 DNS” and removing “s” from “https://”.

To connect you can use

export PEM_FILE=~/config/keys/pems/2022.pem
export SERVER_IP=
ssh -i $PEM_FILE ubuntu@$SERVER_IP
ssh -i $PEM_FILE ec2-user@$SERVER_IP


To connect without asking for password and not using PEM you can ssh-copy-id but manually

scp -i $PEM_FILE ~/.ssh/ ubuntu@$SERVER_IP:
ssh -i $PEM_FILE ubuntu@$SERVER_IP
cat >> .ssh/authorized_keys && rm

SSM Run command

Documents can be: managed (predefined) or custom documents (can be versioned). A command is a set of actions, a document, targets and run time parameters. Use case:

on Amazon linux ami you should use service instead systemctl

service httpd start

Inside the machine you can find user script on

cat /var/lib/cloud/instance/cloud-config.txt

Look for logs on

cat /var/log/cloud-init-output.log

To see error in custom schema

cloud-init schema --system

## SSM Automation

It uses Automation Runbook (SSM Documents of type Automation).
Can be triggered manually, EventBridge, on a schedule, by AWS Config
Use case: Restart instance, create an AMI, EBS snapshot...

## SSM Session manager

Another way to connect is using AWS Systems Manager > Fleet Manager.
For EC2 you need to create a role with the `AmazonSSMManagedInstanceCore` policy (the deprecated
policy is AmazonEC2RoleforSSM). In tutorials it is called an instance profile, so
name the role `SSMInstanceProfile` and attach the role to an existing or new
EC2 instance. There is no need for ssh keys or open ports; you can use the web console or the aws cli, just
download and install the session manager plugin with `sudo
./sessionmanager-bundle/install -i /usr/local/sessionmanagerplugin -b
/usr/local/bin/session-manager-plugin `

aws ec2 describe-instances --query "Reservations[].Instances[].InstanceId"

get instance id "i-0f3dde08ce455a628"

aws ssm start-session --target "i-0f3dde08ce455a628"

CloudTrail can intercept StartSession events if you need them for compliance.
You can also restrict access to specific tags for other users, for example use a policy
that permits the `ssm:StartSession` Action with Condition StringLike
`"ssm:resourceTag/Environment": ["Dev"]`
Session log data can be sent to s3 or CloudWatch logs.
Preferences are on this tab
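The tag-based restriction described above can be written out as a full IAM policy document. The sketch below only builds and prints the JSON in Python; the helper name `start_session_policy` and the wildcard instance ARN in `Resource` are assumptions for illustration, and you would normally narrow the resource to your own account and region.

```python
import json


def start_session_policy(tag_key: str, tag_values: list) -> dict:
    """Build a policy allowing ssm:StartSession only on tagged instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:StartSession",
                # Assumption: any EC2 instance; narrow this ARN as needed.
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringLike": {f"ssm:resourceTag/{tag_key}": tag_values}
                },
            }
        ],
    }


policy = start_session_policy("Environment", ["Dev"])
print(json.dumps(policy, indent=2))
```

The printed document can be pasted into the IAM console or saved to a file and attached to a user or group.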

## SSM parameter store

It is used for configurations.
It can store values in plain text, so for passwords we usually use Secrets
Manager, and we can also access those through the parameter name.
There are also public parameters like
`/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2`

It is free for max 10_000 parameters, and max 4KB size.
$0.05 for new advanced parameter per month (max 8KB, and max 100_000 total) and
can be attached with Parameter Policy like expiration: EventBridge will receive
notification 15 days before password expires

Name `/my-app/dev/my-db-url` with value `some url` and one encrypted
`/my-app/dev/redis-password` with value `encrypted***`. You can use the cli to access
them (and decrypt if you have access to the key)

aws ssm get-parameters --names /my-app/dev/my-db-url /my-app/dev/redis-password --with-decryption

aws ssm get-parameters-by-path --path /my-app/ --recursive
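The hierarchy behind `get-parameters-by-path` can be illustrated with an in-memory dictionary. This is a sketch of the lookup semantics only (direct children versus all descendants); the function `get_parameters_by_path`, the `store` dictionary and the extra `prod` and `other-app` entries are hypothetical and no AWS call is made.

```python
def get_parameters_by_path(store: dict, path: str, recursive: bool = False) -> dict:
    """Return parameters under path; with recursive=True include all levels."""
    prefix = path if path.endswith("/") else path + "/"
    result = {}
    for name, value in store.items():
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        # Without recursion only direct children (no further "/") match.
        if recursive or "/" not in remainder:
            result[name] = value
    return result


store = {
    "/my-app/dev/my-db-url": "some url",
    "/my-app/dev/redis-password": "encrypted***",
    "/my-app/prod/my-db-url": "another url",      # hypothetical entry
    "/other-app/dev/my-db-url": "unrelated",      # hypothetical entry
}

print(sorted(get_parameters_by_path(store, "/my-app/", recursive=True)))
print(sorted(get_parameters_by_path(store, "/my-app/dev")))
```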

## AWS Secrets Manager

It can force rotation of passwords, database credentials and api keys.
It can store binaries.

## SSM State manager

Automate the process of keeping instances in state that you define.
Use case: bootstrap instance with software, updates on a schedule.
State manager association: defines the state we want.

## SSM Patch manager

Automates the patching of managed nodes with security updates, for the OS and
applications. Patch Manager uses a patch baseline id, so you can use SSM Run Command to patch
specific patch groups.

SSM Maintenance windows: defines a schedule, duration, set of registered
instances, set of registered tasks.

# S3

Scalable object storage (flat storage, each object has a unique identifier):
Simple Storage Service S3. Buckets reside in a region, but the name must be
unique across all buckets. S3 looks like a global service, but it is not: it is
region based, and you need to choose in which region to put a bucket.

Use cases: backups (EBS snapshots), media hosting, static websites.
A user can create up to 100 buckets (or 1000 by submitting a service limit
increase). The bucket name must be unique across all accounts. When a bucket is
deleted, the name becomes available again after 24 hours. The name must be
between 3 and 63 characters long.
Consist only of lowercase letters, numbers, dots (.), and hyphens (-)
Start with a lowercase letter or number
Not begin with xn-- (beginning February 2020)
Not be formatted as an IP address. (i.e.
Use a dot (.) in the name only if the bucket's intended purpose is to host an
Amazon S3 static website; otherwise do not use a dot (.) in the bucket name
since SSL wild card certificate will work for
Virtual hosted-style URL
Path style URL is
Max object size is 5TB (upload using the console is limited to 160GB). Number of objects is
unlimited.
For uploads bigger than 100MB it is recommended (and for bigger than 5GB required) to use
multipart upload together with an AbortIncompleteMultipartUpload lifecycle rule.
An object consists of: key (unique in the bucket), version ID, value, access control info
and metadata (key value pairs, for example content-type; it can not be changed
once the object is created).
You can add up to 10 tags (key value pairs) to each object (128 unicode chars for
the key, and 256 chars for the value).
Deleting key myfile permanently removes the object if versioning is not enabled.
If versioning is enabled, then deleting key myfile adds a delete marker.
(deprecated http:s//
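The bucket naming rules listed above can be turned into a small validator. This sketch encodes only the rules stated in these notes (length, allowed characters, first character, the `xn--` prefix and the IP address format); the function name `bucket_name_problems` is hypothetical, and the real service applies further rules that are not covered here.

```python
import re

_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*")
_IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name: str) -> list:
    """Return a list of violated rules; an empty list means the name passes."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _ALLOWED.fullmatch(name):
        problems.append("only lowercase letters, numbers, dots and hyphens, "
                        "starting with a letter or number")
    if name.startswith("xn--"):
        problems.append("must not begin with xn--")
    if _IP_LIKE.fullmatch(name):
        problems.append("must not be formatted as an IP address")
    return problems


for candidate in ["my-trk-bucket", "My_Bucket", "ab", "192.168.1.1", "xn--bucket"]:
    print(candidate, bucket_name_problems(candidate))
```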

S3 supports resource based access control: using bucket policy or using Access
control list ACL on object or bucket level.
Also supports user based access control.

To enable static website hosting, you need to:
* enable static web hosting in bucket properties
* disable "Block public access" for the bucket (also on account level if needed)
* write bucket policy to grant public read access, update `Bucket-Name` with
  your bucket name

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "PublicReadGetObject",
          "Effect": "Allow",
          "Principal": "*",
          "Action": ["s3:GetObject"],
          "Resource": ["arn:aws:s3:::Bucket-Name/*"]
        }
      ]
    }

* if bucket contains objects that are not owned by the bucket owner, you
  need object ACL access control list that grants everyone read access

A bucket policy is used to: grant public access to the bucket, force objects to be
encrypted at upload, grant access to another account (cross account: for example
you do not have access to another user account, so just put its arn in Principal).
You can track costs by using AWS generated tag for cost allocation. This tag
will appear in AWS Cost explorer, AWS Budgets (can send alarms for usage
limits), AWS Cost and usage report.

make bucket

aws s3 mb s3://mybucket

copy local file upload

aws s3 cp local-file s3://mybucket

list all buckets

aws s3 ls

list all files from bucket

aws s3 ls s3://mybucket

To move data you should use AWS DataSync (migrating by syncing, not in one step).
DataSync can sync with S3, EFS, FSx, and keeps file permissions and metadata. The agent
runs on a schedule, daily or weekly. A slow internet connection could be a
problem. Snow Family is used when copying over the network would take more than a
week: we get a device with the agent preinstalled, which pulls the data into local
storage, and then we ship the device to AWS. Sync can be between different
services, or between different cloud providers.
For offline: Snowcone (hdd), Snowball Edge (ssd) and Snowmobile (for exabytes).
The most cost-optimal approach is to transfer on-premises data to multiple Snowball Edge
Storage Optimized devices, copy it to Amazon S3 and create a lifecycle policy to
transition the data into AWS Glacier. Currently, there is no way of uploading
objects directly to S3 Glacier using a Snowball Edge.
AWS SMS (Server Migration Service) is not related to Snowball Edge.
For streaming use Amazon Kinesis Data Streams and Firehose.

For hybrid setups you can use AWS Direct Connect (a dedicated network connection
to AWS from an on-premises data center) or AWS Storage Gateway (used to connect data, ie
store on-premises data in an existing Amazon S3 bucket, eg mount it as NFS, which
uses S3 File Gateway: it uses S3 for storage, but also keeps a local cache of the
most used files).
Those can not extend the VPC network. Use cases could be: disaster recovery,
backup, tiered storage, on-premises cache and low-latency file access.

Amazon FSx is a service to launch high performance file systems on AWS, for example
SMB or NTFS, and it is backed up daily to S3.
FSx for Lustre (Linux cluster) is for machine learning and high performance computing
HPC (ssd, hdd options), and can be used from on premises through VPN or Direct Connect.

AWS Outposts is service that offers the same AWS infrastructure to any
datacenter, so you can extend your VPC into the on-premises data center, and you
can communicate with private ip addresses.

To avoid internet you can use VPC Endpoints.

Security mechanisms for bucket.
Newly created bucket can only be accessed by the user who created it or the
account owner. Other users can access using:
* AWS IAM: use IAM policy for your users accessing your S3 resources (does not
  have principal)

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": "s3:ListBucket",
          "Resource": "arn:aws:s3:::my-trk-bucket"
        }
      ]
    }

* Access control list ACL on individual objects (deprecated)
* Pre-signed URL: grants access for a limited time. Note that presigning can
  succeed while the actual access will not work if the credentials used in the presign
  process do not have permission to read the object. If you used the Security
  Token Service, the presigned URL will expire when the token expires.

    # this is the default signature version so no need to set it, max 7 days
    # aws configure set default.s3.signature_version s3v4
    aws s3 presign s3://my-trk-bucket/ --expires-in 60
    curl ""
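The two expiry rules for pre-signed URLs (a maximum of 7 days, and expiry together with the STS token when temporary credentials are used) combine into a simple minimum. The sketch below is plain arithmetic; `effective_lifetime` is a hypothetical helper, not an SDK function.

```python
MAX_PRESIGN_SECONDS = 7 * 24 * 60 * 60  # 7 days, the limit mentioned above


def effective_lifetime(expires_in: int, token_seconds_left=None) -> int:
    """Seconds for which a pre-signed URL is usable.

    token_seconds_left is the remaining lifetime of temporary (STS)
    credentials, or None when long-lived credentials signed the URL.
    """
    if not 1 <= expires_in <= MAX_PRESIGN_SECONDS:
        raise ValueError("expires_in must be between 1 second and 7 days")
    if token_seconds_left is None:
        return expires_in
    return min(expires_in, token_seconds_left)


print(effective_lifetime(60))         # long-lived credentials
print(effective_lifetime(3600, 900))  # token runs out before the URL does
```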

Storage classes:
Default is the S3 Standard tier. There is S3 Standard-IA or One Zone-IA (infrequent access):
you are charged for 128KB for objects smaller than 128KB, and for 30 days if you
remove the object before 30 days. Use case is backup.
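The minimum charges of the infrequent access classes can be expressed as two `max` operations. This is a sketch of the billing floor only, using the 128KB and 30 day values from the paragraph above; `billable` is a hypothetical helper and no prices are involved.

```python
MIN_BILLABLE_BYTES = 128 * 1024  # objects smaller than 128KB are charged as 128KB
MIN_BILLABLE_DAYS = 30           # objects removed earlier are charged for 30 days


def billable(size_bytes: int, days_stored: int) -> tuple:
    """Return (billable bytes, billable days) for one object in an IA class."""
    return (max(size_bytes, MIN_BILLABLE_BYTES),
            max(days_stored, MIN_BILLABLE_DAYS))


print(billable(10 * 1024, 5))   # small, short-lived object hits both floors
print(billable(1_000_000, 45))  # large, long-lived object is billed as-is
```

This is why many small or short-lived objects are usually a poor fit for the IA classes.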

Use Intelligent-Tiering Archive configurations under the Properties tab of the bucket
to minimize the cost: Archive Access tier (90 days min, minutes up to
5 hours to retrieve) or Deep Archive Access tier (180 days min, up to 5 hours to retrieve).
Objects smaller than 128KB are always in the frequent access tier.

Glacier Instant Retrieval (millisecond retrieval, access once a quarter, min 90
days), Glacier Flexible Retrieval (retrieval 1 min to 12 hours), Glacier Deep
Archive (access once or twice a year, retrieval 12-48 hours, min 180 days).
You need to initiate a restore and you can use the `s3:ObjectRestore:Completed`
event to send a notification (you need to update the SNS topic so S3 can send this).

For data that is not often accessed but requires high availability choose Amazon
S3 standard IA.

Durability is 11 nines, 99.999999999%: if you store 10,000,000 objects you can
expect to lose 1 object every 10,000 years on average. That is the same for all storage classes.
Availability is 99.99% for S3 Standard, ie about 53 min of downtime a year. S3 Standard-IA is 99.9%.
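The durability and availability figures above follow from simple arithmetic, reproduced here with `Decimal` to avoid floating point noise. The variable and function names are chosen for this example only.

```python
from decimal import Decimal

# Durability: 11 nines applied to 10,000,000 stored objects.
durability = Decimal("0.99999999999")
objects_stored = Decimal(10_000_000)
expected_losses_per_year = objects_stored * (1 - durability)
years_per_lost_object = 1 / expected_losses_per_year

MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: str) -> Decimal:
    """Allowed downtime per year for an availability given as a fraction."""
    return MINUTES_PER_YEAR * (1 - Decimal(availability))


print(years_per_lost_object)                # one object per this many years
print(downtime_minutes_per_year("0.9999"))  # S3 Standard
print(downtime_minutes_per_year("0.999"))   # S3 Standard-IA
```

For S3 Standard this gives 52.56 minutes, which the notes round to 53 minutes; 99.9% corresponds to 525.6 minutes, a bit under 9 hours a year.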

Cost includes
* storage price: based on storage class, eventual monthly monitoring fee
* request and data retrieval: every api/sdk call
* data transfer: price for bandwidth in and out of S3, except data transferred in
  from the internet (upload), data transferred out to EC2 in the same region as the
  bucket, and data transferred out to CloudFront
* management and replication: price for features like S3 inventory, analytics,
  object tagging

Enable bucket logs under Properties -> Server access logging, you can add
prefix. It will log all GET requests, API calls.
It is advisable to create lifecycle rule under Management, to clear old logs.
Use Athena to serverless query logs
Create report with Amazon Quicksight to create Business intelligence BI
Use Glue service to convert csv to Apache Parquet or ORC, so data is stored as
columnar data for cost saving (less scan). Use larger files > 128 MB.

Object Lock using a write-once-read-many WORM model to prevent object from being
deleted or overwritten. It can be enabled only during bucket creation. It
enables versioning. You can configure Default retention mode so no users (or
governance users) can delete or overwrite during that period (for example 1year)

S3 replication Cross-region replication CRR (use case: low latency, compliance),
Same-region replication SRR (log aggregation, live reproduction prod to test)
Replication works only for new objects, for existing objects you need to use s3
batch replication

S3 object encryption: server-side encryption SSE is the default, with an SSE-S3 key (you
do not have access to the key, header: "x-amz-server-side-encryption": "AES256"),
but you can use a KMS key with SSE-KMS (you can see logs in CloudTrail, header:
"x-amz-server-side-encryption": "aws:kms", you need to have access to the KMS key
and KMS limits are applied) or customer-provided keys with SSE-C (we pass the key in a
header with each request, and when reading we need to send the same key in the header).
Client-side encryption (data is encrypted before sending to S3).

Encryption in flight ie encryption in transit ssl/tls.

Cross-origin resource sharing CORS: origin = scheme (protocol) + host (domain)
+ port. It is a web browser mechanism that allows requests to other origins only if the other
origin allows the request using the CORS header Access-Control-Allow-Origin. The browser
sends `OPTIONS / Host: Origin:`, and we need to
enable CORS for a specific origin or for all origins. CORS can not prevent
scripts from downloading files, it is only a web browser security mechanism.

    [
      {
        "AllowedHeaders": [],
        "AllowedMethods": ["GET"],
        "AllowedOrigins": ["*"],
        "ExposeHeaders": []
      }
    ]
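The definition origin = scheme + host + port can be checked with the standard library. The sketch below compares origins and mimics a simplified `AllowedOrigins` check in which `*` allows everyone and any other entry must be the same origin; the function names and the `example.com` URLs are assumptions, and partial wildcard patterns are not handled.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin_of(a) == origin_of(b)


def cors_allows(request_origin: str, allowed_origins: list) -> bool:
    """Simplified AllowedOrigins check: '*' or an identical origin."""
    return any(o == "*" or same_origin(o, request_origin)
               for o in allowed_origins)


print(same_origin("https://www.example.com", "https://www.example.com:443/a"))
print(same_origin("http://www.example.com", "https://www.example.com"))
print(cors_allows("https://other.example.org", ["*"]))
print(cors_allows("https://other.example.org", ["https://www.example.com"]))
```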

# Amazon Cloudfront

Cloudfront is content delivery network CDN.
400 point of presence in Global Edge network which are caching content and which
are connected using aws backbone network

The difference from S3 Cross-Region Replication CRR is that CloudFront is good for
static files (TTL is a few days) that are available everywhere. CRR must be set up for
each region, and files are updated in near real-time, so it is good for dynamic content.

When you enable Cloudfront, you do not need to enable public access for your
bucket, but you need to attach policy that give access to Cloudfront.

You can use AWS Certificate Manager to obtain ssl certificates.

You can enable Geographic Restrictions, and select countries in which your
content is available.

Access Logs can generate reports on: Cache Statistics, popular objects, top
referrers, usage, viewers.

Error codes from origin server 5xx or from S3 4xx are cached also, for example
user do not have access to the underlying bucket 403, or object not found 404.

Cache based on Headers, Session Cookies, Query String Parameters.
Expires or better is Cache-Control: max-age header.

AWS Global Accelerator is a networking tool: when the network is congested, it
optimizes the path to the application.

# AWS CloudTrail

Track all user activity across your AWS accounts, see actions that user, role or
service has taken.
Inspect logs with CloudWatch Logs or Athena
For example find who when how deleted a.csv

SELECT * FROM "s3_access_logs_db"."mybucket_logs" WHERE key = 'a.csv' AND operation LIKE '%DELETE%' limit 10;

Find sum uploaded files from IP and last month

SELECT SUM(bytessent) as uploadTotal FROM s3_access_logs_db.mybucket_logs WHERE RemoteIP='' AND parse_datetime(RequestDateTime, 'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2022-06-06', 'yyyy-MM-dd') AND parse_datetime('2022-07-07', 'yyyy-MM-dd');

Difference between server logs and cloudtrail (service for tracking API usage)
* server access logs delivers within a few hours, cloudtrail in 5min for data
  and 15min for management events
* cloudtrail delivery is guaranteed; it can be enabled on account, bucket or object level,
  can deliver logs to multiple destinations, and does not log authentication
  failures (but AccessDenied is logged); json format.

When using a tags, you can write access policy with condition key
"s3:ExistingObjectTag/<key>": "<value>"

{ “Statement”: [ { “Effect”: “Allow”, “Action”: “s3:GetObject”, “Resource”: “arn:aws:s3:::photobucket/*”, “Condition”: { “StringEquals”: { “s3:ExistingObjectTag/phototype”: “finished” } } } ] }

You can also use tags in lifecycle rules, in CloudWatch metrics or in CloudTrail.

To list all files you can use the API, but that could be expensive if there are a
lot of objects. Instead you can enable Management -> Inventory, which will
periodically create a csv file in another bucket that you can query using Amazon
S3 Select (only the SELECT command on csv and json files; note that the query has to be on one line)

SELECT * FROM s3object s WHERE s._1 = 'a' LIMIT 5
or Athena.

S3 event notification can be used to call lambda, sns or sqs service when some
api call occurs.

Amazon Simple Queue Service (SQS) is a fully managed message queuing service
that enables you to decouple and scale microservices, distributed systems, and
serverless applications. SQS eliminates the complexity and overhead associated
with managing and operating message-oriented middleware and empowers developers
to focus on differentiating work.
Used for asynchronous integration between application components

Cloudtrail log file integrity validation can be used for audit.

# AWS Config

AWS Config (a service that tracks configurations of resources) can be used to send
an SNS notification when, for example, a bucket becomes public, using the managed rule
*s3-bucket-public-read-prohibited*. It is also used to enable security and regulatory compliance.
It can also prevent users from using other (unapproved) AMIs.
Similar tools that check public access is enabled are:
* Aws IAM Access Analyzer - check bucket policy ACL, access point policy, IAM
  roles, KMS keys, Lambda, SQS queues, Secrets Manager Secrets... any
  resource that can be accessed externally, that is outside of Zone of Trust (eg
  AWS Organization) for example Principal is "*".
* AWS Trusted Advisor - check S3 bucket permissions, other Trusted advisor
  checks can include: Cost Optimization, Performance, Security, Fault Tolerance,
  Service Limits. For example security group created by Directory Service should
  not have unrestricted access. Approaching limits.
  Also Cost optimization, under utilized EBS volumes, idle load balancers.
  Trusted Advisor is free for core checks.

IAM Security Tools like IAM Credentials Report (account-level, download csv when
keys were used) and IAM Access Advisor (user-level) you can see which service is
used and identify unnecessary permissions that have been assigned to users.

Use S3 Storage Lens to optimize cost.

# AWS Directory Service

Managed microsoft active directory


ECS backplane is communicating with ECS agent for placement decision.
Cluster is a logical group of EC2 instances on which Task is run. Task could be
running on EC2 or Fargate. Task can contain one or more container (usually
second is only for logging), defined in Task Definition (blueprint): which
images url and configuration.
Service is for long running applications, is a group of Tasks.

Task definition

In the task definition, cpu is given in CPU units (1 virtual CPU is 1024 units, also "0.25 vCpu") and memory can also be written as "512 MB", "1 GB".

{
  "containerDefinitions": [
    {
      "name": "simple-app",
      "image": "httpd:2.4",
      "cpu": 256,
      "memory": 300,
      "portMappings": [
        { "hostPort": 80, "containerPort": 80, "protocol": "tcp" }
      ],
      "essential": true
    },
    {
      "name": "busybox",
      "image": "busybox",

}   ] } ```
aws ecs register-task-definition
aws ecs create-service
aws ecs run-task  --launch-type=FARGATE

Task placement: first satisfy CPU, memory and network requirements, then apply other constraints and strategies. Constraints: location (AZ us-east-1d), instance type (t2.small), affinity. Strategies: Binpack (minimize the number of EC2 instances: choose the instance with the least available memory or CPU, and all other tasks will be deployed there), Spread (place tasks evenly on all EC2 instances).
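The difference between the binpack and spread strategies shows up clearly in a toy model. The sketch below tracks only free memory and task count per instance; the `Instance` class, the `place` function and the 1024 MiB and 300 MiB values are assumptions for illustration, and the real scheduler also considers CPU, ports, constraints and spread fields such as the availability zone.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    free_memory: int   # MiB still available on the instance
    task_count: int = 0


def place(instances: list, task_memory: int, strategy: str) -> str:
    """Pick an instance for one task and update its bookkeeping."""
    candidates = [i for i in instances if i.free_memory >= task_memory]
    if not candidates:
        raise RuntimeError("no instance satisfies the memory requirement")
    if strategy == "binpack":
        # Fill the instance with the least free memory that still fits.
        chosen = min(candidates, key=lambda i: i.free_memory)
    elif strategy == "spread":
        # Keep the number of tasks per instance as even as possible.
        chosen = min(candidates, key=lambda i: i.task_count)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    chosen.free_memory -= task_memory
    chosen.task_count += 1
    return chosen.name


def run(strategy: str) -> list:
    fleet = [Instance("a", 1024), Instance("b", 1024), Instance("c", 1024)]
    return [place(fleet, 300, strategy) for _ in range(4)]


print(run("binpack"))
print(run("spread"))
```

With binpack the first instance is filled before a second one is touched, which lets unused instances be scaled in; spread trades that saving for better fault isolation.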



Control-plane nodes: controller manager, cloud controller, scheduler and API server that exposes Kubernetes API Etcd: key value store Worker nodes: Pod (group of one or more containers) similar to Task in ECS, created from PodSpec. Runtime (Docker or containerd), kube-proxy and kubelet

AWS Elastic beanstalk

AWS Service Catalog is used to manage infrastructure as code (IaC) templates, so users do not need to know each AWS service; they do not even need to be logged in to AWS. TagOptions can be applied so they use the same tags. AWS Elastic Beanstalk is for deploying web applications. Each Beanstalk environment will generate: EC2, ALB, S3, ASG, CW alarm, CloudFormation stack and domain name.

AWS CloudFormation

Use it to deploy to multiple AWS Regions quickly, automatically, and reliably. Use a template json file and create a stack.

Download templates

    Resources:
      MyInstance:
        Type: AWS::EC2::Instance
        Properties:
          AvailabilityZone: us-east-1a
          ImageId: ami-a4c7edb2
          InstanceType: t2.micro

You can upload template using cli aws cloudformation create-stack help for example

# create
aws --profile 2022trk cloudformation create-stack --stack-name myteststack --template-body file://terraform/ec2.cloudformation.yml --parameters ParameterKey=KeyName,ParameterValue=2022

aws --profile 2022trk cloudformation list-stacks --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE

# describe to find output
export PEM_FILE=~/config/keys/pems/2022.pem
export SERVER_IP=$(aws --profile 2022trk cloudformation describe-stacks --stack-name myteststack --query 'Stacks[0].Outputs[?OutputKey==`ServerIP`].OutputValue' --output text)
ssh -i $PEM_FILE ec2-user@$SERVER_IP sudo cat /var/log/cloud-init-output.log

# update
aws --profile 2022trk cloudformation update-stack --stack-name myteststack --template-body file://terraform/ec2.cloudformation.yml --parameters ParameterKey=KeyName,ParameterValue=2022

# destroy
aws --profile 2022trk cloudformation delete-stack --stack-name myteststack

There are over 224 resource types, type identifiers is: service-provider::service-name::data-type-name for example AWS::EC2::Instance

Parameters can be: String, Number, CommaDelimitedList, List, AWS Parameter and it can contain Constraints, AllowedValues, AllowedPattern.

    Parameters:
      MyParameter:
        Type: String
        Description: Security Group Description

Use the function !Ref MyParameter (or "Ref": on a separate line). When we !Ref a parameter it returns the parameter value, and when we !Ref a resource it returns the physical ID of the underlying resource. You can use "Fn::GetAtt": to get attributes of a resource using dot syntax

      !GetAtt EC2Instance.AvailabilityZone

There is also Fn::GetAZs: !Ref "AWS::Region" which will return a list of all AZs so you can pick first using !Select

You can create a string using join and delimiter

!Join [ ":", [ a, b, c ] ]
# => "a:b:c"

Substitute values

!Sub
  - String # which contains ${VariableName}
  - { VariableName: VariableValue }

Pseudo parameters:

Mappings are fixed variables (region, az, ami, environment like dev/prod)

      "32": "ami-a4c7edb2"
      "64": "ami-a4c7edb2"

Syntax is !FindInMap [ MapName, TopLevelKey, SecondLevelKey ] Example use !FindInMap [RegionMap, !Ref "AWS::Region", 32]
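What the intrinsic functions above return (!Join, !Sub with a variable map, !FindInMap) can be emulated in a few lines of Python. This is an illustration of the results only, not how CloudFormation is implemented; the function names and the `us-east-1` top-level key in the sample `RegionMap` are assumptions.

```python
import re


def fn_join(delimiter: str, values: list) -> str:
    """Emulates !Join [ delimiter, [ values ] ]."""
    return delimiter.join(values)


def fn_sub(template: str, variables: dict) -> str:
    """Emulates !Sub with a variable map: replaces each ${Name}."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda match: str(variables[match.group(1)]),
                  template)


def fn_find_in_map(mappings: dict, map_name: str,
                   top_level_key: str, second_level_key: str) -> str:
    """Emulates !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]."""
    return mappings[map_name][top_level_key][second_level_key]


# Assumed mapping: the region key is hypothetical, the AMI ids come from the notes.
mappings = {"RegionMap": {"us-east-1": {"32": "ami-a4c7edb2",
                                        "64": "ami-a4c7edb2"}}}

print(fn_join(":", ["a", "b", "c"]))
print(fn_sub("Hello ${VariableName}", {"VariableName": "VariableValue"}))
print(fn_find_in_map(mappings, "RegionMap", "us-east-1", "32"))
```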

Outputs are used to link with other Stack (you can not delete a stack if its outputs are being referenced by another stack). You can find return values in docs

    Value: !Ref myStack

    Description: Server IP address
    Value: !GetAtt MyInstance.PublicIp

    Value: !GetAtt myStack.Outputs.WebsiteURL

    Value: !Ref MyCompanySSHSecurityGroup
    Export:
      Name: SSHSecurityGroup

Example usage for exported outputs is !ImportValue SSHSecurityGroup

Conditions control whether resources are created, based on parameter values or mappings, using the logic functions: !And, !Equals, !If, !Not, !Or

Conditions:
  CreateProdResources: !Equals [ !Ref EnvType, prod ]

A condition is attached to a resource at the same level as Type

    Type: "AWS::EC2::VolumeAttachment"
    Condition: CreateProdResources
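A rough Python sketch of how a condition decides which resources get created. It works on the long (dict) form of the template and is only a simplified model: MountPoint is a hypothetical logical name, and references from one condition to another ({"Condition": name} inside Fn::And and similar) are not handled.

```python
def evaluate(expr, parameters):
    """Evaluate Ref and the condition functions on a dict-form expression."""
    if isinstance(expr, dict) and len(expr) == 1:
        (name, args), = expr.items()
        if name == "Ref":
            return parameters[args]
        if name == "Fn::Equals":
            left, right = args
            return evaluate(left, parameters) == evaluate(right, parameters)
        if name == "Fn::Not":
            return not evaluate(args[0], parameters)
        if name == "Fn::And":
            return all(evaluate(a, parameters) for a in args)
        if name == "Fn::Or":
            return any(evaluate(a, parameters) for a in args)
    return expr  # literal value such as "prod"


def resources_to_create(template, parameters):
    """Return logical ids of resources whose condition is absent or true."""
    conditions = {
        name: evaluate(expr, parameters)
        for name, expr in template.get("Conditions", {}).items()
    }
    created = []
    for logical_id, resource in template["Resources"].items():
        condition = resource.get("Condition")
        if condition is None or conditions[condition]:
            created.append(logical_id)
    return created


TEMPLATE = {
    "Conditions": {"CreateProdResources": {"Fn::Equals": [{"Ref": "EnvType"}, "prod"]}},
    "Resources": {
        "EC2Instance": {"Type": "AWS::EC2::Instance"},
        # MountPoint is a hypothetical logical name used only for this sketch
        "MountPoint": {"Type": "AWS::EC2::VolumeAttachment", "Condition": "CreateProdResources"},
    },
}

print(resources_to_create(TEMPLATE, {"EnvType": "prod"}))
print(resources_to_create(TEMPLATE, {"EnvType": "dev"}))
```

With EnvType=dev the volume attachment is skipped and only the unconditional instance is created.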

You can use the cfn-init script instead of plain UserData since it is more readable. To be sure that the stack is really working we can use a WaitCondition, so the stack becomes CREATE_COMPLETE only after it receives a signal. We use Metadata: with AWS::CloudFormation::Init to define what we want to install, call cfn-init for MyInstance from UserData, then call cfn-signal to send the signal to the WaitCondition. Logs are in /var/log/cfn-init.log, /var/log/cfn-init-cmd.log, /var/log/cloud-init.log and /var/log/cloud-init-output.log

  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash -xe
            # Get the latest CloudFormation package
            yum update -y aws-cfn-bootstrap
            # Start cfn-init from Metadata
            /opt/aws/bin/cfn-init -s ${AWS::StackId} -r MyInstance --region ${AWS::Region} || error_exit 'Failed to run cfn-init'
            # Start up the cfn-hup daemon to listen for changes to the EC2 instance metadata
            /opt/aws/bin/cfn-hup || error_exit 'Failed to start cfn-hup'
            # All done so signal success
            /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackId} --resource SampleWaitCondition --region ${AWS::Region}
    Metadata:
      Comment: Install a simple PHP application
      AWS::CloudFormation::Init:
        config:
          packages:
            yum:
              httpd: []
              php: []
  SampleWaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    CreationPolicy:
      ResourceSignal:
        Timeout: PT1M

  # This will make sure it says CREATE_COMPLETE only when all three instances have signaled
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: "3"
    CreationPolicy:
      ResourceSignal:
        Count: "3"
        Timeout: PT15M

If the wait condition does not receive the required number of signals from the EC2 instance, the cause could be that the AMI does not have the CloudFormation helper scripts (aws-cfn-bootstrap package) installed, or that the instance has no internet connectivity. Inspect the logs by disabling rollback on failure (OnFailure=DO_NOTHING option while creating a stack), and check connectivity with curl (through NAT if the instance is in a private subnet, or through the Internet gateway if it is in a public subnet; public means it has a route to the IGW in its route table)
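The Count and Timeout semantics of ResourceSignal can be modelled in a few lines of Python. This is a simplified sketch under my own assumptions (signals are given as elapsed seconds plus a success flag, and only the PT..H..M..S duration form is parsed); it is not the service's actual logic.

```python
import re


def timeout_seconds(duration):
    """Convert an ISO 8601 duration such as PT15M or PT1H30M into seconds."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {duration}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def creation_status(signals, count, timeout):
    """signals: list of (seconds_since_start, success) tuples sent by cfn-signal."""
    deadline = timeout_seconds(timeout)
    successes = 0
    for elapsed, success in sorted(signals):
        if elapsed > deadline:
            break  # signals after the timeout are too late
        if not success:
            return "CREATE_FAILED"  # a failure signal fails the resource
        successes += 1
        if successes >= count:
            return "CREATE_COMPLETE"
    return "CREATE_FAILED"  # not enough signals before the timeout


# Three instances signal success within 15 minutes
print(creation_status([(120, True), (200, True), (400, True)], 3, "PT15M"))
# Only two of three instances signal, so the resource fails after the timeout
print(creation_status([(120, True), (200, True)], 3, "PT15M"))
```

The second case is the situation described above: an instance without the helper scripts or without connectivity never sends its signal.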

Rollback means going back to the previous known working state (if it was a creation then everything gets deleted), but you can enable an option to keep the other successfully created resources (after uploading the template, on Next there is the Preserve successfully provisioned resources option)

If someone deletes a resource manually and a later update that touches that resource fails, the rollback can fail too and the stack goes into the UPDATE_ROLLBACK_FAILED state. Go to Stack Actions -> Continue update rollback, where you can skip the missing resource, or manually recreate it (with the same name), so the stack ends up in UPDATE_ROLLBACK_COMPLETE and we can try to update again. We can use drift detection to see if we missed something when we manually created the resource. When the template is wrong and we are creating a new stack, it ends in the ROLLBACK_COMPLETE state and we cannot update this stack (we can only delete it).
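As a quick reference, the recovery step for each of these states can be written as a small Python helper that builds the matching aws cli command. This is only a sketch: recovery_command is a hypothetical helper name, MyDB is used as an example logical id, and the command is returned as a string rather than executed.

```python
import shlex


def recovery_command(status, stack_name, resources_to_skip=()):
    """Return the aws cli command that gets a stuck stack moving again, or None."""
    base = ["aws", "cloudformation"]
    if status == "UPDATE_ROLLBACK_FAILED":
        command = base + ["continue-update-rollback", "--stack-name", stack_name]
        if resources_to_skip:
            # logical ids of resources that no longer exist and should be skipped
            command += ["--resources-to-skip", *resources_to_skip]
        return shlex.join(command)
    if status == "ROLLBACK_COMPLETE":
        # a stack that failed on first create can only be deleted
        return shlex.join(base + ["delete-stack", "--stack-name", stack_name])
    return None  # for example UPDATE_ROLLBACK_COMPLETE: just run update-stack again


print(recovery_command("UPDATE_ROLLBACK_FAILED", "myteststack", ["MyDB"]))
print(recovery_command("ROLLBACK_COMPLETE", "myteststack"))
```

Add the --profile option used elsewhere in these notes when running the printed command.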

Another way to set up dependencies is to use the DependsOn attribute

    DependsOn: MyDB


Nested stacks are used to isolate repeated components. We just need a template URL and the parameters that the nested template uses

    Type: AWS::CloudFormation::Stack
    Properties:
      Parameters:
        ApplicationName: !Ref AWS::StackName
        VPCId: !Ref VPCId
      TimeoutInMinutes: 5

ChangeSets are used to see what changes will be made (we still do not know if the update will be successful). A change set is created in the web console for an existing stack.

CloudFormation drift occurs when someone manually changes resources created by CloudFormation. Stack actions -> Detect drift.

DeletionPolicy defines what happens to a resource when we remove the stack: Retain (keep the resource), Snapshot (keep the data) or Delete (default).

    Type: AWS::S3::Bucket
    DeletionPolicy: Retain

Beside CreationPolicy and DeletionPolicy there is also UpdatePolicy, which can set AutoScalingRollingUpdate with MinInstancesInService: "1" and MaxBatchSize: "2" so we update 2 and keep 1 (keeps the existing Auto Scaling group). Or there is AutoScalingReplacingUpdate with WillReplace: "true" (creates a new Auto Scaling group).

If you want to prevent the whole stack from being deleted, you can enable TerminationProtection by Action -> Edit termination protection.

Use a StackSet to provision across multiple accounts and regions (for example deploy an IAM role in each account). Stack sets require specific IAM roles

Use Stack Policies to determine which resources can be updated. A policy is defined in a separate JSON file and uploaded in the web console on the Next page while creating the stack. When an update is blocked, the error Action denied by stack policy is shown.

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "Update:*",
      "Principal": "*",
      "Resource": "*"
    },
    {
      "Effect": "Deny",
      "Action": "Update:*",
      "Principal": "*",
      "Resource": "LogicalResourceId/MyInstance"
    }
  ]
}
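The way the two statements combine (everything is denied unless allowed, and an explicit Deny wins over an Allow) can be sketched in Python. This is a simplified model, not the real evaluator: update_allowed is a hypothetical helper, and it assumes Action and Resource are single strings with * wildcards, while real policies may also use lists and conditions.

```python
from fnmatch import fnmatchcase

STACK_POLICY = {
    "Statement": [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
        {
            "Effect": "Deny",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "LogicalResourceId/MyInstance",
        },
    ]
}


def update_allowed(policy, logical_id, action="Update:Modify"):
    """Return True when the stack policy lets the given update action through."""
    resource = f"LogicalResourceId/{logical_id}"
    allowed = False  # with a stack policy set, updates are denied by default
    for statement in policy["Statement"]:
        if not fnmatchcase(action, statement["Action"]):
            continue
        if not fnmatchcase(resource, statement["Resource"]):
            continue
        if statement["Effect"] == "Deny":
            return False  # an explicit Deny overrides any Allow
        allowed = True
    return allowed


print(update_allowed(STACK_POLICY, "MyInstance"))  # protected by the Deny statement
print(update_allowed(STACK_POLICY, "MyDB"))  # MyDB is just an example logical id
```

So with this policy every resource except MyInstance can be updated.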

Use resource import to bring existing resources into CloudFormation. Prevent updates to critical resources by using a Stack policy.

AWS Lightsail: use pre-configured development stacks like LAMP, Nginx, MEAN and Node.js to get online quickly and easily.

AWS Application Discovery Service is used to collect data about the configuration, usage and behavior of on-premises data centers to assist in planning a migration to AWS

Amazon CodeGuru

Only for Java (JVM) and Python

CDK for Kubernetes (cdk8s) defines Kubernetes configuration in TypeScript, Python and Java


acloudguru courses, Twitch TODO:

AWS Certified SysOps Administrator Associate SOA-C02 Exam guide

preparation with AWS Certified Solutions Architect Associate SAA-C02

SNS cannot monitor CloudWatch

The Well Architected Framework

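Set of principles/pillars: operational excellence, security, reliability, performance efficiency, cost optimization, sustainability.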

Agility is all about speed (experiment quickly), and not about autoscale or elimination of wasted capacity.

Amazon Detective

Intrusion detection using CloudTrail logs, VPC Flow Logs, Amazon GuardDuty findings and EKS audit logs.

Amazon GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation. It analyzes CloudTrail event logs, CloudTrail management events, CloudTrail S3 data events (getObject), VPC Flow Logs (unusual IP addresses) and DNS logs. It also covers publicly revealed keys, and it has a dedicated finding for cryptocurrency attacks.

VPC Flow Logs capture information about IP traffic to and from network interfaces.

Macie is used to detect sensitive data in S3 buckets, e.g. to identify Personally Identifiable Information (PII).

AWS Shield is managed DDoS protection, enabled for free for each account. AWS Shield Advanced gives 24/7 support and AWS bill reimbursement.

AWS Web Application Firewall (WAF) protects web applications from common web exploits, such as bot traffic, SQL injection and cross-site scripting (XSS). It can be used to block countries (geo-match), and web access control list (ACL) rules can also block specific IPs, HTTP headers and URL strings. WAF is deployed to ALB, API Gateway and CloudFront.

Penetration testing: AWS customers are welcome to carry out security assessments against 8 AWS services: EC2, RDS, CloudFront, Aurora, API Gateway, Lambda, Lightsail and Elastic Beanstalk. Other tests are prohibited: DoS, flooding, DNS zone walking on Route 53.

Amazon Inspector is for automated vulnerability detection. For EC2 it identifies unintended network access or OS vulnerabilities using the SSM agent, for ECR it assesses container images, and for Lambda it finds vulnerabilities in functions and package dependencies. It sends findings to Amazon EventBridge. Free for the first 15 days.

Amazon DynamoDB

NoSQL database with automatic backup and restore and an SLA of up to 99.999%. It optimizes costs by automatically scaling up and down.

AWS Glue

Discover, prepare and move data from a source for analytics or machine learning.

Amazon EMR

Run Apache Spark, Hive, Presto and Hadoop.

AWS OpsWorks

Configuration management service to automate operations with Chef and Puppet. It is an alternative to AWS Systems Manager (SSM), which lets you view operational data from multiple AWS services through a unified user interface and automate operational tasks.


AWS CloudHSM helps you meet corporate, contractual and regulatory compliance requirements for data security. It uses a highly secure hardware storage device to store encryption keys. KMS can be configured with a custom key store backed by CloudHSM.


AWS Artifact

Customers can download AWS compliance documentation and AWS agreements. The compliance portfolio covers Payment Card Industry (PCI), Service Organization Control (SOC), NDA agreements, HIPAA and audit reports. It can be used to support internal audit or compliance.

Amazon OpenSearch service

Use Elasticsearch and Kibana to analyze logs, for real-time application monitoring and for website search.

AWS Step functions

Visual workflow service that helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines.

Amazon SageMaker Build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows

Amazon Simple Workflow Service (SWF): build apps that coordinate work across distributed components

Support plan

The Business support plan includes 24/7 email and chat support. The Enterprise support plan includes a dedicated Technical Account Manager (TAM) and a Concierge for account issues.

AWS Amplify

Tools that help build full stack web and mobile apps; think of it like Elastic Beanstalk for serverless apps.

AWS Kibana

Does not support IAM users and roles, but supports HTTP Basic authentication, SAML and Amazon Cognito.