The Ultimate Guide: Kubernetes Cluster on AWS — From Zero to Production

What is Kubernetes & Why AWS EKS?
Architecture Overview
Prerequisites Checklist
AWS IAM Setup
Install CLI Tools
VPC & Networking Setup
Create the EKS Cluster
Configure Node Groups
Connect kubectl to Cluster
Deploy Your First App
Expose via Load Balancer
Auto Scaling Setup
Monitoring & Logging
Cleanup & Cost Control

01 — What is Kubernetes & Why AWS EKS?

Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications. Originally developed at Google and open-sourced in 2014, it is now maintained by the Cloud Native Computing Foundation (CNCF) and is widely used in production environments.

Amazon EKS (Elastic Kubernetes Service) is AWS’s fully managed Kubernetes service. This means you do not need to manage the Control Plane yourself — AWS handles it for you. You simply focus on your worker nodes and applications.

Why Choose EKS?

Feature	Benefit
🔧 Managed Control Plane	AWS handles API server, etcd, scheduler — 99.95% SLA guarantee
⚡ AWS Native Integration	IAM, VPC, ALB, CloudWatch, ECR — all seamlessly integrated
📈 Auto Scaling	Cluster Autoscaler + Karpenter — nodes scale automatically with demand
🔒 Enterprise Security	RBAC, IRSA, Secrets Manager — production-grade security built-in

EKS vs Self-Managed Kubernetes: With self-managed Kubernetes, you must set up the control plane yourself, manage etcd, and handle upgrades manually. EKS delegates control plane operations to AWS, although you still trigger version upgrades and update node groups yourself. For most production workloads on AWS, EKS is the more practical choice.

EKS Pricing Breakdown

Cluster (Control Plane): $0.10/hour per cluster (~$73/month) while the Kubernetes version is in standard support
Worker Nodes: EC2 instance pricing (e.g. t3.medium ≈ $0.0416/hour On-Demand in us-east-1)
Fargate (serverless nodes): Pay-per-use based on vCPU and memory
Data Transfer: Traffic within the same AZ is free; cross-AZ traffic costs $0.01/GB in each direction
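
To make the pricing concrete, here is a rough monthly estimate for the control plane plus a handful of worker nodes. This is only a sketch: the function name eks_monthly_cost is illustrative, the 730 hours/month figure is an assumption (it is what turns $0.10/hour into ~$73), and storage, load balancers, and data transfer are ignored.

```bash
# Estimate monthly cost: control plane ($0.10/hour) plus N worker nodes.
# $1 = number of nodes, $2 = hourly price per node in USD.
# Assumes 730 hours per month; ignores EBS, load balancers, and data transfer.
eks_monthly_cost() {
  awk -v n="$1" -v h="$2" 'BEGIN { printf "%.2f\n", (0.10 + n * h) * 730 }'
}

# Three t3.medium nodes at $0.0416/hour
eks_monthly_cost 3 0.0416   # prints 164.10
```

Plug in the On-Demand price for your own region and instance type to get a comparable figure.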

02 — Architecture Overview

Before diving into commands, understand the complete picture — then we will go deeper into each component.

┌─────────────────────────── AWS CLOUD ───────────────────────────────┐
│  ┌────────────────────── VPC (10.0.0.0/16) ──────────────────────┐  │
│  │                                                                 │  │
│  │  [kubectl / eksctl] ───►   CONTROL PLANE  (Managed by AWS)     │  │
│  │   DevOps Engineer          kube-apiserver | etcd               │  │
│  │                            scheduler | controller-manager      │  │
│  │                                    │                           │  │
│  │       ┌────────────────────────────┼────────────────┐          │  │
│  │       ▼                            ▼                ▼          │  │
│  │  ┌── AZ-1a ──┐             ┌── AZ-1b ──┐    ┌── AZ-1c ──┐    │  │
│  │  │ Worker 1  │             │ Worker 3  │    │    ALB    │    │  │
│  │  │ Pod A | B │             │ Pod E | F │    │    ECR    │    │  │
│  │  │ Worker 2  │             │           │    │CloudWatch │    │  │
│  │  │ Pod C | D │             │           │    │           │    │  │
│  │  └───────────┘             └───────────┘    └───────────┘    │  │
│  └─────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Control Plane Components

kube-apiserver — REST API gateway; every kubectl command goes through here
etcd — distributed key-value store; the brain and memory of the cluster
kube-scheduler — decides which node a pod should run on
kube-controller-manager — runs reconciliation loops (desired vs actual state)
cloud-controller-manager — handles AWS-specific integrations (ELB, EBS, etc.)

Worker Node Components

kubelet — node agent that communicates with the control plane
kube-proxy — manages network rules (iptables/IPVS) on each node
containerd — container runtime responsible for pulling/running images
VPC CNI — AWS-native pod networking plugin
CoreDNS — handles all DNS resolution within the cluster
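
Because the VPC CNI gives every pod a real VPC IP address, the number of pods a node can host is bounded by its network interfaces. Below is a minimal sketch of the commonly cited formula, assuming default VPC CNI settings without prefix delegation. The function name max_pods is illustrative, and the ENI figures (3 ENIs with 6 IPv4 addresses each for t3.medium, 3 ENIs with 10 each for m5.large) are taken from the EC2 instance type limits.

```bash
# Max pods per node with the default VPC CNI (no prefix delegation):
#   ENIs * (IPv4 addresses per ENI - 1) + 2
# $1 = number of ENIs, $2 = IPv4 addresses per ENI.
max_pods() {
  echo $(( $1 * ($2 - 1) + 2 ))
}

max_pods 3 6    # t3.medium: prints 17
max_pods 3 10   # m5.large:  prints 29
```

This is why a small instance type can run out of pod slots long before it runs out of CPU or memory.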

03 — Prerequisites Checklist

Make sure everything below is ready before you start. Skipping any item leads to issues later.

Requirement	Version	Check Command
AWS Account	Active, billing enabled	`aws sts get-caller-identity`
AWS CLI	v2.x (latest)	`aws --version`
eksctl	v0.180.0+	`eksctl version`
kubectl	v1.28+ (match cluster version)	`kubectl version --client`
Docker	v24+ (for building images)	`docker --version`
Helm	v3.x (for package management)	`helm version`
IAM Permissions	Admin or EKS-specific policies	AWS Console
Region Decision	e.g. us-east-1, ap-south-1	`aws configure get region`

⚠️ Match your kubectl version! The kubectl version must be within ±1 minor version of your cluster’s Kubernetes version. If your cluster runs 1.28 but kubectl is 1.25, you are outside the supported skew and some commands may fail or behave unexpectedly. Always keep them aligned.
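
The ±1 rule is easy to check mechanically. The helper below is a sketch, not an official tool: skew_ok is a hypothetical name, it compares only the minor version, and it assumes both versions share major version 1.

```bash
# Return 0 when client and server minor versions differ by at most 1.
# $1 = kubectl version, $2 = cluster version (e.g. 1.29.3, v1.28, 1.29.3-eks-adc7111).
skew_ok() (
  client=${1#v}; server=${2#v}
  c_minor=${client#*.}; c_minor=${c_minor%%.*}   # "1.29.3" -> "29"
  s_minor=${server#*.}; s_minor=${s_minor%%.*}
  diff=$(( c_minor - s_minor ))
  [ "$diff" -ge -1 ] && [ "$diff" -le 1 ]
)

skew_ok 1.29.3 1.28 && echo "supported skew"
skew_ok 1.25.0 1.28 || echo "unsupported skew: upgrade kubectl"
```

Feed it the versions reported by `kubectl version` once your cluster is up.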

04 — AWS IAM Setup — Correct Permissions

IAM setup is conceptually simple, but it is where most setup mistakes happen. Both approaches are covered below: a quick admin setup for development and a more restricted policy for production.

Option A: Quick Dev Setup (Admin User)

aws iam create-group --group-name EKS-Admins

aws iam attach-group-policy \
  --group-name EKS-Admins \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

aws iam add-user-to-group \
  --user-name your-username \
  --group-name EKS-Admins

Option B: Production (Least Privilege Policy)

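Save the following as eks-admin-policy.json. It is narrower than AdministratorAccess but still grants wildcard actions such as eks:* and ec2:* on all resources, so treat it as a starting point and tighten it for your environment: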

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "EKSClusterManagement",
    "Effect": "Allow",
    "Action": [
      "eks:*",
      "ec2:*",
      "iam:CreateRole",
      "iam:AttachRolePolicy",
      "iam:DetachRolePolicy",
      "iam:CreateInstanceProfile",
      "iam:AddRoleToInstanceProfile",
      "iam:PassRole",
      "iam:GetRole",
      "iam:CreateOpenIDConnectProvider",
      "iam:TagOpenIDConnectProvider",
      "cloudformation:*",
      "autoscaling:*",
      "elasticloadbalancing:*",
      "ssm:GetParameter"
    ],
    "Resource": "*"
  }]
}

aws iam create-policy \
  --policy-name EKSAdminPolicy \
  --policy-document file://eks-admin-policy.json

# Save the policy ARN from the output
# "Arn": "arn:aws:iam::123456789012:policy/EKSAdminPolicy"
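
Creating the policy does not grant it to anyone; it still has to be attached. The sketch below attaches it to the EKS-Admins group from Option A. The function names policy_arn and attach_eks_policy are illustrative, and reusing the EKS-Admins group is an assumption (attach to whichever group, user, or role you actually use).

```bash
# Build the ARN of a customer-managed policy from an account ID and policy name.
policy_arn() {
  printf 'arn:aws:iam::%s:policy/%s\n' "$1" "$2"
}

# Attach EKSAdminPolicy to the EKS-Admins group (requires AWS credentials).
# $1 = 12-digit AWS account ID.
attach_eks_policy() {
  aws iam attach-group-policy \
    --group-name EKS-Admins \
    --policy-arn "$(policy_arn "$1" EKSAdminPolicy)"
}

# Usage:
#   attach_eks_policy "$(aws sts get-caller-identity --query Account --output text)"
```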

Configure AWS CLI

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG...
Default region name [None]: ap-south-1
Default output format [None]: json

# Verify the configuration
$ aws sts get-caller-identity
{
  "UserId": "AIDAIOSFODNN7EXAMPLE",
  "Account": "123456789012",
  "Arn": "arn:aws:iam::123456789012:user/eks-admin"
}

💡 Pro Tip: Instead of long-term access keys, use AWS IAM Identity Center (SSO). Run aws sso login --profile eks-profile to receive temporary credentials — far more secure for team environments.

05 — Install All CLI Tools

1. AWS CLI v2

Linux / Ubuntu:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws --version
# aws-cli/2.17.0 Python/3.11.6 Linux/5.15.0

macOS:

brew install awscli

Windows (PowerShell):

msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi

2. eksctl

eksctl is the official CLI for Amazon EKS. It was originally created by Weaveworks and is now developed under the eksctl-io GitHub organization. It sets up a complete cluster with a single command.

Linux:

ARCH=amd64
PLATFORM=$(uname -s)_$ARCH
curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_${PLATFORM}.tar.gz"
tar -xzf eksctl_${PLATFORM}.tar.gz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version
# 0.186.0

macOS:

brew install eksctl

3. kubectl

kubectl is the command-line tool to communicate with your cluster — deploy pods, view services, and fetch logs.

Linux:

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client --output=yaml
# gitVersion: v1.29.3

macOS:

brew install kubectl

4. Helm

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
# version.BuildInfo{Version:"v3.14.0"}

06 — VPC & Networking Setup

Proper VPC configuration is critical for EKS. Worker nodes should be in private subnets. Load balancers and NAT Gateways live in public subnets.

VPC CloudFormation Template

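Save as eks-vpc.yaml. The full template creates 6 subnets across 3 Availability Zones; the excerpt below shows one public and one private subnet, so repeat the subnet definitions for the remaining AZs and add route tables and their subnet associations: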

AWSTemplateFormatVersion: '2010-09-09'
Description: 'EKS Production VPC — 3 AZ, Public + Private Subnets'

Parameters:
  ClusterName:
    Type: String
    Default: my-eks-cluster

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: !Sub '${ClusterName}-vpc'
        - Key: !Sub 'kubernetes.io/cluster/${ClusterName}'
          Value: shared

  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: kubernetes.io/role/elb
          Value: '1'

  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.10.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: kubernetes.io/role/internal-elb
          Value: '1'

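  InternetGateway:
    Type: AWS::EC2::InternetGateway

  # Attach the Internet Gateway to the VPC so public subnets can reach the internet
  VPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  # Elastic IP for the NAT Gateway (referenced below as EIP1)
  EIP1:
    Type: AWS::EC2::EIP
    DependsOn: VPCGatewayAttachment
    Properties:
      Domain: vpc

  NATGateway1:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt EIP1.AllocationId
      SubnetId: !Ref PublicSubnet1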

aws cloudformation create-stack \
  --stack-name eks-vpc \
  --template-body file://eks-vpc.yaml \
  --capabilities CAPABILITY_IAM

aws cloudformation wait stack-create-complete --stack-name eks-vpc
# ✓ Stack created successfully
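
Subnet size matters on EKS because every pod consumes an IP address from the subnet its node lives in. As a quick sanity check on the /24 subnets used above, the sketch below computes usable addresses per subnet, assuming the 5 addresses AWS reserves in every subnet. The function name usable_ips is illustrative.

```bash
# Usable IPv4 addresses in a subnet: 2^(32 - prefix) minus the 5 AWS reserves.
# $1 = prefix length (e.g. 24 for a /24).
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

usable_ips 24   # prints 251
usable_ips 20   # prints 4091
```

With 251 addresses shared by nodes and pods, a /24 private subnet fills up quickly on a busy cluster, so consider larger private subnets if you expect many pods.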

Required Subnet Tags

Tag Key	Tag Value	Subnet Type	Purpose
`kubernetes.io/role/elb`	`1`	Public	Internet-facing ALB/NLB
`kubernetes.io/role/internal-elb`	`1`	Private	Internal load balancers
`kubernetes.io/cluster/CLUSTER_NAME`	`shared` or `owned`	Both	EKS cluster identification

⚠️ Do not skip these tags. Without the role tags, the AWS Load Balancer Controller cannot auto-discover your subnets, and you would have to list subnet IDs explicitly on every Service or Ingress.

07 — Create the EKS Cluster

Now the real work begins. We will use the eksctl config file approach for full control and repeatability.

Complete cluster-config.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-production-cluster
  region: ap-south-1             # Mumbai region
  version: "1.29"                # Kubernetes version
  tags:
    Environment: production
    Team: platform-engineering
    CostCenter: k8s-infra

vpc:
  id: "vpc-0a1b2c3d4e5f67890"    # Your VPC ID from Step 6
  subnets:
    private:
      ap-south-1a:
        id: "subnet-0private1"
      ap-south-1b:
        id: "subnet-0private2"
      ap-south-1c:
        id: "subnet-0private3"
    public:
      ap-south-1a:
        id: "subnet-0public1"
      ap-south-1b:
        id: "subnet-0public2"

iam:
  withOIDC: true                  # Required for IRSA
  serviceAccounts:
  - metadata:
      name: aws-load-balancer-controller
      namespace: kube-system
    wellKnownPolicies:
      awsLoadBalancerController: true

addons:
- name: vpc-cni
  version: latest
  attachPolicyARNs:
    - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
- name: coredns
  version: latest
- name: kube-proxy
  version: latest
- name: aws-ebs-csi-driver
  version: latest
  wellKnownPolicies:
    ebsCSIController: true

managedNodeGroups:
- name: general-purpose
  instanceType: t3.medium
  minSize: 2
  maxSize: 6
  desiredCapacity: 3
  privateNetworking: true
  subnets:
    - ap-south-1a
    - ap-south-1b
    - ap-south-1c
  labels:
    role: general
    lifecycle: on-demand
  tags:
    k8s.io/cluster-autoscaler/enabled: "true"
    k8s.io/cluster-autoscaler/my-production-cluster: "owned"
  iam:
    attachPolicyARNs:
      - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
      - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
      - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy

cloudWatch:
  clusterLogging:
    enableTypes:
      - api
      - audit
      - authenticator
      - controllerManager
      - scheduler

Run the Cluster Creation Command

eksctl create cluster -f cluster-config.yaml

# Expected output:
# [ℹ] eksctl version 0.186.0
# [ℹ] using region ap-south-1
# [ℹ] creating EKS cluster "my-production-cluster"
# [✔] EKS cluster "my-production-cluster" in "ap-south-1" region is ready

⚠️ This takes time! Cluster creation typically takes 15–20 minutes. Do not close your terminal.

08 — Node Groups Deep Dive

Use multiple node groups for different workload types to optimize both cost and reliability.

Node Group Type	Use Case	Cost	Interruption Risk
On-Demand	Critical workloads, databases	High	None
Spot Instances	Batch jobs, stateless apps	60–90% cheaper	Medium (2-min warning)
Fargate Profile	Serverless pods, isolated workloads	Per-use	None

Add a Spot Node Group

Save as spot-nodegroup.yaml:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-production-cluster
  region: ap-south-1

managedNodeGroups:
- name: spot-workers
  instanceTypes:              # Multiple types improve availability
    - m5.large
    - m5a.large
    - m4.large
    - t3.large
  spot: true
  minSize: 0
  maxSize: 20
  desiredCapacity: 3
  privateNetworking: true
  labels:
    role: spot-worker
    lifecycle: spot
  taints:
    - key: spot
      value: "true"
      effect: NoSchedule

eksctl create nodegroup -f spot-nodegroup.yaml

kubectl get nodes --show-labels
# NAME                          STATUS   ROLES    AGE   VERSION
# ip-10-0-10-45.ap-south-1...   Ready    <none>   12m   v1.29.3-eks-adc7111
# ip-10-0-11-23.ap-south-1...   Ready    <none>   11m   v1.29.3-eks-adc7111
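
Because the spot node group is tainted with spot=true:NoSchedule, pods land on it only if they explicitly tolerate that taint. The sketch below prints a patch that adds the toleration and pins the workload to nodes labelled lifecycle: spot. The function name spot_toleration_patch and the deployment name in the usage comment are illustrative.

```bash
# Print a strategic-merge patch that lets a Deployment's pods run on the
# tainted spot nodes (taint spot=true:NoSchedule, label lifecycle=spot).
spot_toleration_patch() {
  cat <<'EOF'
spec:
  template:
    spec:
      tolerations:
      - key: spot
        operator: Equal
        value: "true"
        effect: NoSchedule
      nodeSelector:
        lifecycle: spot
EOF
}

# Usage (requires cluster access; batch-worker is a placeholder name):
#   kubectl patch deployment batch-worker --patch "$(spot_toleration_patch)"
```

Drop the nodeSelector if the workload should be allowed on spot nodes without being restricted to them.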

09 — Connect kubectl to the Cluster

aws eks update-kubeconfig \
  --region ap-south-1 \
  --name my-production-cluster

# Output:
# Added new context arn:aws:eks:ap-south-1:123456789012:cluster/my-production-cluster

# Verify the connection
kubectl cluster-info
# Kubernetes control plane is running at https://ABC123.gr7.ap-south-1.eks.amazonaws.com

kubectl get nodes
# NAME                          STATUS   ROLES    AGE   VERSION
# ip-10-0-10-45.compute...      Ready    <none>   18m   v1.29.3-eks-adc7111

Managing Multiple Cluster Contexts

# List all contexts
kubectl config get-contexts

# Switch context
kubectl config use-context arn:aws:eks:ap-south-1:123456789012:cluster/staging

# Using kubectx (recommended tool)
brew install kubectx
kubectx prod       # Switch to production cluster
kubens production  # Switch namespace

Essential kubectl Commands

# ── CLUSTER INFO ──────────────────────────────
kubectl cluster-info
kubectl get nodes -o wide
kubectl get all --all-namespaces
kubectl top nodes
kubectl top pods -A

# ── PODS ──────────────────────────────────────
kubectl get pods -A
kubectl get pods -n kube-system
kubectl describe pod POD_NAME
kubectl logs POD_NAME -f              # Follow logs
kubectl logs POD_NAME -c CONTAINER    # Specific container
kubectl exec -it POD_NAME -- /bin/bash

# ── DEPLOYMENTS ───────────────────────────────
kubectl get deployments
kubectl rollout status deployment/NAME
kubectl rollout history deployment/NAME
kubectl rollout undo deployment/NAME  # Rollback to previous version
kubectl scale deployment/NAME --replicas=5

# ── DEBUGGING ─────────────────────────────────
kubectl get events --sort-by=.lastTimestamp
kubectl describe node NODE_NAME
kubectl get --raw='/readyz?verbose'   # API server health checks (componentstatuses is deprecated)

10 — Deploy a Production Application

A complete example including Deployment, Service, ConfigMap, Secret, and HPA — everything you need for a real workload.

Step 1: Create a Namespace

kubectl create namespace production
kubectl label namespace production env=production

Step 2: Create Secrets

kubectl create secret generic db-credentials \
  --from-literal=DB_HOST=rds.ap-south-1.rds.amazonaws.com \
  --from-literal=DB_PASSWORD=supersecret \
  --namespace production
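
Keep in mind that Secret values are only base64-encoded, not encrypted, so anyone who can read the Secret can recover the password. The sketch below shows the round trip locally and, in the comment, how to read the value back from the cluster. The helper name decode_secret_value is illustrative.

```bash
# Decode one base64-encoded Secret value read from stdin.
decode_secret_value() {
  base64 --decode
}

# Local round trip with the literal used in Step 2
printf 'supersecret' | base64 | decode_secret_value   # prints supersecret

# Against the cluster (requires kubectl access):
#   kubectl get secret db-credentials -n production \
#     -o jsonpath='{.data.DB_PASSWORD}' | decode_secret_value
```

For real credentials, restrict Secret access with RBAC and avoid typing passwords where they end up in shell history.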

Step 3: Full Application Manifest

Save as app-deployment.yaml:

---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web-app
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0        # Zero downtime deployments
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx:1.25-alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        envFrom:
        - configMapRef:
            name: app-config
        - secretRef:
            name: db-credentials
        readinessProbe:
          httpGet:
            path: /                # Stock nginx serves / but has no /health route
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /                # Point this at your app's health endpoint
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 10
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values: [web-app]
              topologyKey: kubernetes.io/hostname  # Spread pods across nodes

kubectl apply -f app-deployment.yaml

kubectl get pods -n production
# NAME                        READY   STATUS    RESTARTS   AGE
# web-app-6d8f9b7c8d-4xkp2   1/1     Running   0          2m
# web-app-6d8f9b7c8d-7mnq9   1/1     Running   0          2m
# web-app-6d8f9b7c8d-rv2x1   1/1     Running   0          2m

11 — Expose via Load Balancer

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app-service
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP

kubectl apply -f service.yaml

kubectl get svc -n production
# NAME              TYPE           CLUSTER-IP    EXTERNAL-IP
# web-app-service   LoadBalancer   172.20.1.45   abc123.elb.amazonaws.com

# Test the endpoint
curl http://abc123.elb.amazonaws.com

Ingress with AWS Load Balancer Controller

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-ingress
  namespace: production
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:ap-south-1:ACCOUNT:certificate/xxx
spec:
  ingressClassName: alb          # Replaces the deprecated kubernetes.io/ingress.class annotation
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-app-service
            port:
              number: 80
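
The Ingress above only produces an ALB if the AWS Load Balancer Controller is running in the cluster. The cluster config in Step 7 creates its IAM service account but does not install the controller itself. A minimal sketch of the Helm install, assuming that service account already exists in kube-system; the wrapper function name install_alb_controller is illustrative.

```bash
# Install the AWS Load Balancer Controller with Helm, reusing the
# aws-load-balancer-controller service account created by eksctl.
# $1 = EKS cluster name.
install_alb_controller() {
  helm repo add eks https://aws.github.io/eks-charts
  helm repo update
  helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
    --namespace kube-system \
    --set clusterName="$1" \
    --set serviceAccount.create=false \
    --set serviceAccount.name=aws-load-balancer-controller
}

# Usage (requires helm and cluster access):
#   install_alb_controller my-production-cluster
#   kubectl get deployment -n kube-system aws-load-balancer-controller
```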

12 — Auto Scaling Setup

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 mins before scaling down
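
To see what the HPA will do with these targets, it helps to work through its core formula: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to minReplicas and maxReplicas. The sketch below implements only that core calculation for a single metric; the function name hpa_desired is illustrative, and the HPA's tolerance band and stabilization windows are not modeled. The HPA also needs resource metrics, so confirm that `kubectl top pods` returns data (metrics-server must be running) before relying on it.

```bash
# Core HPA calculation for one metric.
# $1 = current replicas, $2 = current utilization (%), $3 = target utilization (%),
# $4 = minReplicas, $5 = maxReplicas.
hpa_desired() (
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))   # integer ceiling of $1 * $2 / $3
  [ "$desired" -lt "$4" ] && desired=$4
  [ "$desired" -gt "$5" ] && desired=$5
  echo "$desired"
)

hpa_desired 3 140 70 3 20    # CPU at 140% of requests: prints 6
hpa_desired 3 90 70 3 20     # CPU at 90%: prints 4
hpa_desired 10 200 70 3 20   # capped by maxReplicas: prints 20
```

When several metrics are configured, as in the manifest above, the HPA computes a value per metric and uses the largest.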

Cluster Autoscaler Install

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-production-cluster \
  --set awsRegion=ap-south-1 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::ACCOUNT_ID:role/cluster-autoscaler

# Verify
kubectl get pods -n kube-system | grep autoscaler
# cluster-autoscaler-xxx   1/1   Running   0   5m

How Scaling Works Together

High CPU detected
        │
        ▼
   HPA adds pods
        │
        ▼
No nodes available?
        │
        ▼
Cluster Autoscaler adds a new EC2 node
        │
        ▼
New pods scheduled on new node ✓

13 — Monitoring & Logging

Prometheus + Grafana via Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.enabled=true \
  --set grafana.adminPassword=admin123 \
  --set alertmanager.enabled=true

# Access Grafana dashboard
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Open: http://localhost:3000  (admin / admin123)

CloudWatch Container Insights

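ClusterName=my-production-cluster
RegionName=ap-south-1
# Fluent Bit settings used by the quickstart manifest's remaining placeholders
FluentBitHttpServer='On'
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
FluentBitReadFromTail='On'

curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | \
  sed -e "s/{{cluster_name}}/${ClusterName}/;s/{{region_name}}/${RegionName}/;s/{{http_server_toggle}}/\"${FluentBitHttpServer}\"/;s/{{http_server_port}}/\"${FluentBitHttpPort}\"/;s/{{read_from_head}}/\"${FluentBitReadFromHead}\"/;s/{{read_from_tail}}/\"${FluentBitReadFromTail}\"/" | \
  kubectl apply -f -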

Alertmanager — Slack/PagerDuty Alerts

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: slack-alerts
  namespace: monitoring
spec:
  route:
    groupBy: ['alertname', 'namespace']
    groupWait: 30s
    repeatInterval: 4h
    receiver: slack-notifications
  receivers:
  - name: slack-notifications
    slackConfigs:
    - apiURL:
        name: slack-webhook
        key: url
      channel: '#k8s-alerts'
      sendResolved: true
      title: '{{ .Status | toUpper }} — {{ .CommonLabels.alertname }}'

Must-Have Grafana Dashboards

✅ Node CPU, Memory, and Disk usage over time
✅ Pod restart count and OOMKilled events
✅ API server latency and request rate
✅ Deployment replica count vs desired
✅ Namespace-level resource quota usage
✅ HPA scaling events timeline

14 — Cleanup & Cost Control

⚠️ IMPORTANT: EKS resources cost money. If you built this for learning and are not actively using it, delete everything immediately. The control plane alone costs ~$73/month plus EC2 node charges.

Safe Cleanup Order

Step 1 — Delete applications and Ingress first (removes load balancers before VPC teardown, otherwise VPC gets stuck):

kubectl delete ingress --all --all-namespaces
kubectl delete svc --all -n production

Step 2 — Delete node groups:

eksctl delete nodegroup \
  --cluster my-production-cluster \
  --name general-purpose \
  --region ap-south-1

Step 3 — Delete the EKS cluster (10–15 minutes):

eksctl delete cluster \
  --name my-production-cluster \
  --region ap-south-1 \
  --wait

Step 4 — Clean up VPC and CloudFormation stacks:

aws cloudformation delete-stack --stack-name eks-vpc

Cost Optimization Best Practices

✅ Use Spot Instances for non-critical workloads — 60–90% cost savings
✅ Scale down dev clusters on nights and weekends using scheduled scaling
✅ Purchase AWS Savings Plans for committed, predictable workloads
✅ Regularly clean up unused namespaces, PVCs, and load balancers
✅ Set Resource Quotas on every namespace to prevent runaway costs
✅ Review costs weekly with AWS Cost Explorer and set billing alerts
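
To put a number on the scheduled scale-down tip, the sketch below estimates how many node-hours you save when a dev cluster's nodes run only during working hours. The function name schedule_savings_pct is illustrative, and it assumes node groups are scaled to zero outside the schedule (the control plane charge still applies).

```bash
# Percentage of node-hours saved versus running 24x7 (168 hours per week).
# $1 = hours per day the nodes run, $2 = days per week they run.
schedule_savings_pct() {
  awk -v h="$1" -v d="$2" 'BEGIN { printf "%.1f\n", (1 - (h * d) / 168) * 100 }'
}

schedule_savings_pct 12 5   # 12 hours a day, weekdays only: prints 64.3
```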

Common Troubleshooting Issues

Issue	Likely Cause	Fix
Nodes stuck in `NotReady`	CNI plugin not installed	Check vpc-cni addon
Pods in `Pending`	Insufficient node resources	Check `kubectl describe pod` events
OIDC error on IRSA	Provider not associated	Re-run `eksctl utils associate-iam-oidc-provider`
Load balancer not created	Missing subnet tags	Add `kubernetes.io/role/elb: "1"` tag
High unexpected costs	Unused load balancers	Delete unused services of type LoadBalancer
kubectl auth error	Expired credentials	Refresh your AWS credentials (e.g. `aws sso login`), then run `aws eks update-kubeconfig` again


📚 Official Reference Links

AWS & EKS — Official Docs

#	Title	Link
1	Amazon EKS Official Documentation	https://docs.aws.amazon.com/eks/
2	Create an EKS Cluster — AWS Docs	https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html
3	Getting Started with EKS (eksctl)	https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html
4	Create Amazon VPC for EKS	https://docs.aws.amazon.com/eks/latest/userguide/creating-a-vpc.html
5	Amazon EKS Best Practices Guide	https://docs.aws.amazon.com/eks/latest/best-practices/introduction.html
6	EKS Managed Node Groups	https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html
7	EKS Add-ons Documentation	https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html
8	Amazon EKS — AWS Product Page	https://aws.amazon.com/eks/

eksctl — Official Docs

#	Title	Link
9	eksctl GitHub Repository	https://github.com/eksctl-io/eksctl
10	eksctl Official Documentation	https://eksctl.io/
11	eksctl — Creating & Managing Clusters	https://eksctl.io/usage/creating-and-managing-clusters/
12	eksctl — Managed Node Groups	https://eksctl.io/usage/managed-nodegroups/

Kubernetes — Official Docs

#	Title	Link
13	kubectl Official Reference	https://kubernetes.io/docs/reference/kubectl/
14	kubectl Command Line Tool	https://kubernetes.io/docs/concepts/overview/kubectl/
15	Kubernetes HPA Documentation	https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
16	Kubernetes RBAC	https://kubernetes.io/docs/reference/access-authn-authz/rbac/
17	Pod Security Admission	https://kubernetes.io/docs/concepts/security/pod-security-admission/

Monitoring & Tools

#	Title	Link
18	Prometheus Community Helm Charts	https://prometheus-community.github.io/helm-charts
19	Grafana Kubernetes Dashboard	https://grafana.com/grafana/dashboards/315
20	CloudWatch Container Insights	https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html
21	Cluster Autoscaler GitHub	https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
22	Helm Official Docs	https://helm.sh/docs/
23	AWS Load Balancer Controller	https://kubernetes-sigs.github.io/aws-load-balancer-controller/
