The Ultimate Guide: Kubernetes Cluster on AWS — From Zero to Production


Table of Contents

  1. What is Kubernetes & Why AWS EKS?
  2. Architecture Overview
  3. Prerequisites Checklist
  4. AWS IAM Setup
  5. Install CLI Tools
  6. VPC & Networking Setup
  7. Create the EKS Cluster
  8. Configure Node Groups
  9. Connect kubectl to Cluster
  10. Deploy Your First App
  11. Expose via Load Balancer
  12. Auto Scaling Setup
  13. Monitoring & Logging
  14. Cleanup & Cost Control

01 — What is Kubernetes & Why AWS EKS?

Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications. Originally built by Google in 2014, it is now used by almost every major tech company in production environments.

Amazon EKS (Elastic Kubernetes Service) is AWS’s fully managed Kubernetes service. This means you do not need to manage the Control Plane yourself — AWS handles it for you. You simply focus on your worker nodes and applications.

Why Choose EKS?

FeatureBenefit
🔧 Managed Control PlaneAWS handles API server, etcd, scheduler — 99.95% SLA guarantee
⚡ AWS Native IntegrationIAM, VPC, ALB, CloudWatch, ECR — all seamlessly integrated
📈 Auto ScalingCluster Autoscaler + Karpenter — nodes scale automatically with demand
🔒 Enterprise SecurityRBAC, IRSA, Secrets Manager — production-grade security built-in

EKS vs Self-Managed Kubernetes: With self-managed Kubernetes, you must set up the Control Plane yourself, manage etcd, and handle upgrades manually. EKS delegates all of this to AWS. For production workloads, EKS is always the better choice.

EKS Pricing Breakdown

  • Cluster (Control Plane): $0.10/hour per cluster (~$73/month)
  • Worker Nodes: EC2 instance pricing (e.g. t3.medium ≈ $0.0416/hour)
  • Fargate (serverless nodes): Pay-per-use based on vCPU and memory
  • Data Transfer: Intra-AZ is free; cross-AZ costs $0.01/GB

02 — Architecture Overview

Before diving into commands, understand the complete picture — then we will go deeper into each component.

text┌─────────────────────────── AWS CLOUD ───────────────────────────────┐
│  ┌────────────────────── VPC (10.0.0.0/16) ──────────────────────┐  │
│  │                                                                 │  │
│  │  [kubectl / eksctl] ───►   CONTROL PLANE  (Managed by AWS)     │  │
│  │   DevOps Engineer          kube-apiserver | etcd               │  │
│  │                            scheduler | controller-manager      │  │
│  │                                    │                           │  │
│  │       ┌────────────────────────────┼────────────────┐          │  │
│  │       ▼                            ▼                ▼          │  │
│  │  ┌── AZ-1a ──┐             ┌── AZ-1b ──┐    ┌── AZ-1c ──┐    │  │
│  │  │ Worker 1  │             │ Worker 3  │    │    ALB    │    │  │
│  │  │ Pod A | B │             │ Pod E | F │    │    ECR    │    │  │
│  │  │ Worker 2  │             │           │    │CloudWatch │    │  │
│  │  │ Pod C | D │             │           │    │           │    │  │
│  │  └───────────┘             └───────────┘    └───────────┘    │  │
│  └─────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Control Plane Components

  • kube-apiserver — REST API gateway; every kubectl command goes through here
  • etcd — distributed key-value store; the brain and memory of the cluster
  • kube-scheduler — decides which node a pod should run on
  • kube-controller-manager — runs reconciliation loops (desired vs actual state)
  • cloud-controller-manager — handles AWS-specific integrations (ELB, EBS, etc.)

Worker Node Components

  • kubelet — node agent that communicates with the control plane
  • kube-proxy — manages network rules (iptables/IPVS) on each node
  • containerd — container runtime responsible for pulling/running images
  • VPC CNI — AWS-native pod networking plugin
  • CoreDNS — handles all DNS resolution within the cluster

03 — Prerequisites Checklist

Make sure everything below is ready before you start. Skipping any item leads to issues later.

RequirementVersionCheck Command
AWS AccountActive, billing enabledaws sts get-caller-identity
AWS CLIv2.x (latest)aws --version
eksctlv0.180.0+eksctl version
kubectlv1.28+ (match cluster version)kubectl version --client
Dockerv24+ (for building images)docker --version
Helmv3.x (for package management)helm version
IAM PermissionsAdmin or EKS-specific policiesAWS Console
Region Decisione.g. us-east-1, ap-south-1aws configure get region

⚠️ Match your kubectl version! The kubectl version must be within ±1 minor version of your cluster’s Kubernetes version. If your cluster runs 1.28 but kubectl is 1.25 — commands will fail. Always match them.


04 — AWS IAM Setup — Correct Permissions

IAM is a straightforward step, but it causes the most mistakes. Both approaches are covered below — quick (admin) and production-ready (least privilege).

Option A: Quick Dev Setup (Admin User)

aws iam create-group --group-name EKS-Admins

aws iam attach-group-policy \
--group-name EKS-Admins \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess

aws iam add-user-to-group \
--user-name your-username \
--group-name EKS-Admins

Option B: Production (Least Privilege Policy)

Save the following as eks-admin-policy.json:

json{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "EKSClusterManagement",
    "Effect": "Allow",
    "Action": [
      "eks:*",
      "ec2:*",
      "iam:CreateRole",
      "iam:AttachRolePolicy",
      "iam:DetachRolePolicy",
      "iam:CreateInstanceProfile",
      "iam:AddRoleToInstanceProfile",
      "iam:PassRole",
      "iam:GetRole",
      "iam:CreateOpenIDConnectProvider",
      "iam:TagOpenIDConnectProvider",
      "cloudformation:*",
      "autoscaling:*",
      "elasticloadbalancing:*",
      "ssm:GetParameter"
    ],
    "Resource": "*"
  }]
}
aws iam create-policy \
--policy-name EKSAdminPolicy \
--policy-document file://eks-admin-policy.json

# Save the policy ARN from the output
# "Arn": "arn:aws:iam::123456789012:policy/EKSAdminPolicy"

Configure AWS CLI

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG...
Default region name [None]: ap-south-1
Default output format [None]: json

# Verify the configuration
$ aws sts get-caller-identity
{
"UserId": "AIDAIOSFODNN7EXAMPLE",
"Account": "123456789012",
"Arn": "arn:aws:iam::123456789012:user/eks-admin"
}

💡 Pro Tip: Instead of long-term access keys, use AWS IAM Identity Center (SSO). Run aws sso login --profile eks-profile to receive temporary credentials — far more secure for team environments.


05 — Install All CLI Tools

1. AWS CLI v2

Linux / Ubuntu:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws --version
# aws-cli/2.17.0 Python/3.11.6 Linux/5.15.0

macOS:

brew install awscli

Windows (PowerShell):

msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi

2. eksctl

eksctl is the official CLI tool by Weaveworks for creating and managing EKS clusters. It sets up a complete cluster with a single command.

Linux:

ARCH=amd64
PLATFORM=$(uname -s)_$ARCH
curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_${PLATFORM}.tar.gz"
tar -xzf eksctl_${PLATFORM}.tar.gz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version
# 0.186.0

macOS:

brew tap weaveworks/tap
brew install weaveworks/tap/eksctl

3. kubectl

kubectl is the command-line tool to communicate with your cluster — deploy pods, view services, and fetch logs.

Linux:

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client --output=yaml
# gitVersion: v1.29.3

macOS:

brew install kubectl

4. Helm

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
# version.BuildInfo{Version:"v3.14.0"}

06 — VPC & Networking Setup

Proper VPC configuration is critical for EKS. Worker nodes should be in private subnets. Load balancers and NAT Gateways live in public subnets.

VPC CloudFormation Template

Save as eks-vpc.yaml — this creates 6 subnets across 3 Availability Zones:

AWSTemplateFormatVersion: '2010-09-09'
Description: 'EKS Production VPC — 3 AZ, Public + Private Subnets'

Parameters:
ClusterName:
Type: String
Default: my-eks-cluster

Resources:
VPC:
Type: AWS::EC2::VPC
Properties:
CidrBlock: 10.0.0.0/16
EnableDnsHostnames: true
EnableDnsSupport: true
Tags:
- Key: Name
Value: !Sub '${ClusterName}-vpc'
- Key: !Sub 'kubernetes.io/cluster/${ClusterName}'
Value: shared

PublicSubnet1:
Type: AWS::EC2::Subnet
Properties:
VpcId: !Ref VPC
CidrBlock: 10.0.1.0/24
AvailabilityZone: !Select [0, !GetAZs '']
MapPublicIpOnLaunch: true
Tags:
- Key: kubernetes.io/role/elb
Value: '1'

PrivateSubnet1:
Type: AWS::EC2::Subnet
Properties:
VpcId: !Ref VPC
CidrBlock: 10.0.10.0/24
AvailabilityZone: !Select [0, !GetAZs '']
Tags:
- Key: kubernetes.io/role/internal-elb
Value: '1'

InternetGateway:
Type: AWS::EC2::InternetGateway

NATGateway1:
Type: AWS::EC2::NatGateway
Properties:
AllocationId: !GetAtt EIP1.AllocationId
SubnetId: !Ref PublicSubnet1
aws cloudformation create-stack \
--stack-name eks-vpc \
--template-body file://eks-vpc.yaml \
--capabilities CAPABILITY_IAM

aws cloudformation wait stack-create-complete --stack-name eks-vpc
# ✓ Stack created successfully

Required Subnet Tags

Tag KeyTag ValueSubnet TypePurpose
kubernetes.io/role/elb1PublicInternet-facing ALB/NLB
kubernetes.io/role/internal-elb1PrivateInternal load balancers
kubernetes.io/cluster/CLUSTER_NAMEshared or ownedBothEKS cluster identification

⚠️ These tags are mandatory. Without them, the AWS Load Balancer Controller will not be able to discover your subnets and create load balancers automatically.


07 — Create the EKS Cluster

Now the real work begins. We will use the eksctl config file approach for full control and repeatability.

Complete cluster-config.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
name: my-production-cluster
region: ap-south-1 # Mumbai region
version: "1.29" # Kubernetes version
tags:
Environment: production
Team: platform-engineering
CostCenter: k8s-infra

vpc:
id: "vpc-0a1b2c3d4e5f67890" # Your VPC ID from Step 6
subnets:
private:
ap-south-1a:
id: "subnet-0private1"
ap-south-1b:
id: "subnet-0private2"
ap-south-1c:
id: "subnet-0private3"
public:
ap-south-1a:
id: "subnet-0public1"
ap-south-1b:
id: "subnet-0public2"

iam:
withOIDC: true # Required for IRSA
serviceAccounts:
- metadata:
name: aws-load-balancer-controller
namespace: kube-system
wellKnownPolicies:
awsLoadBalancerController: true

addons:
- name: vpc-cni
version: latest
attachPolicyARNs:
- arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
- name: coredns
version: latest
- name: kube-proxy
version: latest
- name: aws-ebs-csi-driver
version: latest
wellKnownPolicies:
ebsCSIController: true

managedNodeGroups:
- name: general-purpose
instanceType: t3.medium
minSize: 2
maxSize: 6
desiredCapacity: 3
privateNetworking: true
subnets:
- ap-south-1a
- ap-south-1b
- ap-south-1c
labels:
role: general
lifecycle: on-demand
tags:
k8s.io/cluster-autoscaler/enabled: "true"
k8s.io/cluster-autoscaler/my-production-cluster: "owned"
iam:
attachPolicyARNs:
- arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
- arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
- arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
- arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy

cloudWatch:
clusterLogging:
enableTypes:
- api
- audit
- authenticator
- controllerManager
- scheduler

Run the Cluster Creation Command

eksctl create cluster -f cluster-config.yaml

# Expected output:
# [ℹ] eksctl version 0.186.0
# [ℹ] using region ap-south-1
# [ℹ] creating EKS cluster "my-production-cluster"
# [✔] EKS cluster "my-production-cluster" in "ap-south-1" region is ready

⚠️ This takes time! Cluster creation typically takes 15–20 minutes. Do not close your terminal.


08 — Node Groups Deep Dive

Use multiple node groups for different workload types to optimize both cost and reliability.

Node Group TypeUse CaseCostInterruption Risk
On-DemandCritical workloads, databasesHighNone
Spot InstancesBatch jobs, stateless apps60–90% cheaperMedium (2-min warning)
Fargate ProfileServerless pods, isolated workloadsPer-useNone

Add a Spot Node Group

Save as spot-nodegroup.yaml:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
name: my-production-cluster
region: ap-south-1

managedNodeGroups:
- name: spot-workers
instanceTypes: # Multiple types improve availability
- m5.large
- m5a.large
- m4.large
- t3.large
spot: true
minSize: 0
maxSize: 20
desiredCapacity: 3
privateNetworking: true
labels:
role: spot-worker
lifecycle: spot
taints:
- key: spot
value: "true"
effect: NoSchedule
eksctl create nodegroup -f spot-nodegroup.yaml

kubectl get nodes --show-labels
# NAME STATUS ROLES AGE VERSION
# ip-10-0-10-45.ap-south-1... Ready <none> 12m v1.29.3-eks-adc7111
# ip-10-0-11-23.ap-south-1... Ready <none> 11m v1.29.3-eks-adc7111

09 — Connect kubectl to the Cluster

aws eks update-kubeconfig \
--region ap-south-1 \
--name my-production-cluster

# Output:
# Added new context arn:aws:eks:ap-south-1:123456789012:cluster/my-production-cluster

# Verify the connection
kubectl cluster-info
# Kubernetes control plane is running at https://ABC123.gr7.ap-south-1.eks.amazonaws.com

kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# ip-10-0-10-45.compute... Ready <none> 18m v1.29.3-eks-adc7111

Managing Multiple Cluster Contexts

# List all contexts
kubectl config get-contexts

# Switch context
kubectl config use-context arn:aws:eks:ap-south-1:123456789:cluster/staging

# Using kubectx (recommended tool)
brew install kubectx
kubectx prod # Switch to production cluster
kubens production # Switch namespace

Essential kubectl Commands

# ── CLUSTER INFO ──────────────────────────────
kubectl cluster-info
kubectl get nodes -o wide
kubectl get all --all-namespaces
kubectl top nodes
kubectl top pods -A

# ── PODS ──────────────────────────────────────
kubectl get pods -A
kubectl get pods -n kube-system
kubectl describe pod POD_NAME
kubectl logs POD_NAME -f # Follow logs
kubectl logs POD_NAME -c CONTAINER # Specific container
kubectl exec -it POD_NAME -- /bin/bash

# ── DEPLOYMENTS ───────────────────────────────
kubectl get deployments
kubectl rollout status deployment/NAME
kubectl rollout history deployment/NAME
kubectl rollout undo deployment/NAME # Rollback to previous version
kubectl scale deployment/NAME --replicas=5

# ── DEBUGGING ─────────────────────────────────
kubectl get events --sort-by=.lastTimestamp
kubectl describe node NODE_NAME
kubectl get componentstatuses

10 — Deploy a Production Application

A complete example including Deployment, Service, ConfigMap, Secret, and HPA — everything you need for a real workload.

Step 1: Create a Namespace

kubectl create namespace production
kubectl label namespace production env=production

Step 2: Create Secrets

kubectl create secret generic db-credentials \
--from-literal=DB_HOST=rds.ap-south-1.rds.amazonaws.com \
--from-literal=DB_PASSWORD=supersecret \
--namespace production

Step 3: Full Application Manifest

Save as app-deployment.yaml:

---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: production
data:
APP_ENV: production
LOG_LEVEL: info
MAX_CONNECTIONS: "100"
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
namespace: production
labels:
app: web-app
version: v1.0.0
spec:
replicas: 3
selector:
matchLabels:
app: web-app
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0 # Zero downtime deployments
template:
metadata:
labels:
app: web-app
spec:
containers:
- name: web-app
image: nginx:1.25-alpine
ports:
- containerPort: 80
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "500m"
envFrom:
- configMapRef:
name: app-config
- secretRef:
name: db-credentials
readinessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 10
periodSeconds: 5
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 15
periodSeconds: 10
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values: [web-app]
topologyKey: kubernetes.io/hostname # Spread pods across nodes
kubectl apply -f app-deployment.yaml

kubectl get pods -n production
# NAME READY STATUS RESTARTS AGE
# web-app-6d8f9b7c8d-4xkp2 1/1 Running 0 2m
# web-app-6d8f9b7c8d-7mnq9 1/1 Running 0 2m
# web-app-6d8f9b7c8d-rv2x1 1/1 Running 0 2m

11 — Expose via Load Balancer

# service.yaml
apiVersion: v1
kind: Service
metadata:
name: web-app-service
namespace: production
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
type: LoadBalancer
selector:
app: web-app
ports:
- port: 80
targetPort: 80
protocol: TCP
kubectl apply -f service.yaml

kubectl get svc -n production
# NAME TYPE CLUSTER-IP EXTERNAL-IP
# web-app-service LoadBalancer 172.20.1.45 abc123.elb.amazonaws.com

# Test the endpoint
curl http://abc123.elb.amazonaws.com

Ingress with AWS Load Balancer Controller

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-app-ingress
namespace: production
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:ap-south-1:ACCOUNT:certificate/xxx
spec:
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: web-app-service
port:
number: 80

12 — Auto Scaling Setup

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-app-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5 mins before scaling down

Cluster Autoscaler Install

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--namespace kube-system \
--set autoDiscovery.clusterName=my-production-cluster \
--set awsRegion=ap-south-1 \
--set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::ACCOUNT_ID:role/cluster-autoscaler

# Verify
kubectl get pods -n kube-system | grep autoscaler
# cluster-autoscaler-xxx 1/1 Running 0 5m

How Scaling Works Together

High CPU detected


HPA adds pods


No nodes available?


Cluster Autoscaler adds a new EC2 node


New pods scheduled on new node ✓

13 — Monitoring & Logging

Prometheus + Grafana via Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack \
prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.enabled=true \
--set grafana.adminPassword=admin123 \
--set alertmanager.enabled=true

# Access Grafana dashboard
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Open: http://localhost:3000 (admin / admin123)

CloudWatch Container Insights

ClusterName=my-production-cluster
RegionName=ap-south-1

curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | \
sed -e "s/{{cluster_name}}/${ClusterName}/;s/{{region_name}}/${RegionName}/" | \
kubectl apply -f -

Alertmanager — Slack/PagerDuty Alerts

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: slack-alerts
namespace: monitoring
spec:
route:
groupBy: ['alertname', 'namespace']
groupWait: 30s
repeatInterval: 4h
receiver: slack-notifications
receivers:
- name: slack-notifications
slackConfigs:
- apiURL:
name: slack-webhook
key: url
channel: '#k8s-alerts'
sendResolved: true
title: '{{ .Status | toUpper }} — {{ .CommonLabels.alertname }}'

Must-Have Grafana Dashboards

  • ✅ Node CPU, Memory, and Disk usage over time
  • ✅ Pod restart count and OOMKilled events
  • ✅ API server latency and request rate
  • ✅ Deployment replica count vs desired
  • ✅ Namespace-level resource quota usage
  • ✅ HPA scaling events timeline

14 — Cleanup & Cost Control

⚠️ IMPORTANT: EKS resources cost money. If you built this for learning and are not actively using it, delete everything immediately. The control plane alone costs ~$73/month plus EC2 node charges.

Safe Cleanup Order

Step 1 — Delete applications and Ingress first (removes load balancers before VPC teardown, otherwise VPC gets stuck):

kubectl delete ingress --all --all-namespaces
kubectl delete svc --all -n production

Step 2 — Delete node groups:

eksctl delete nodegroup \
--cluster my-production-cluster \
--name general-purpose \
--region ap-south-1

Step 3 — Delete the EKS cluster (10–15 minutes):

eksctl delete cluster \
--name my-production-cluster \
--region ap-south-1 \
--wait

Step 4 — Clean up VPC and CloudFormation stacks:

aws cloudformation delete-stack --stack-name eks-vpc

Cost Optimization Best Practices

  • ✅ Use Spot Instances for non-critical workloads — 60–90% cost savings
  • ✅ Scale down dev clusters on nights and weekends using scheduled scaling
  • ✅ Purchase AWS Savings Plans for committed, predictable workloads
  • ✅ Regularly clean up unused namespaces, PVCs, and load balancers
  • ✅ Set Resource Quotas on every namespace to prevent runaway costs
  • ✅ Review costs weekly with AWS Cost Explorer and set billing alerts

Common Troubleshooting Issues

IssueLikely CauseFix
Nodes stuck in NotReadyCNI plugin not installedCheck vpc-cni addon
Pods in PendingInsufficient node resourcesCheck kubectl describe pod events
OIDC error on IRSAProvider not associatedRe-run eksctl utils associate-iam-oidc-provider
Load balancer not createdMissing subnet tagsAdd kubernetes.io/role/elb: "1" tag
High unexpected costsUnused load balancersDelete unused services of type LoadBalancer
kubectl auth errorExpired credentialsRun aws eks update-kubeconfig again

Here are all the reference links for your blog, organized by category:


AWS & EKS — Official Docs

#TitleLink
1Amazon EKS Official Documentationhttps://docs.aws.amazon.com/eks/
2Create an EKS Cluster — AWS Docshttps://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html
3Getting Started with EKS (eksctl)https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html
4Create Amazon VPC for EKShttps://docs.aws.amazon.com/eks/latest/userguide/creating-a-vpc.html
5Amazon EKS Best Practices Guidehttps://docs.aws.amazon.com/eks/latest/best-practices/introduction.html
6EKS Managed Node Groupshttps://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html
7EKS Add-ons Documentationhttps://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html
8Amazon EKS — AWS Product Pagehttps://aws.amazon.com/eks/

eksctl — Official Docs

#TitleLink
9eksctl GitHub Repositoryhttps://github.com/eksctl-io/eksctl
10eksctl Official Documentationhttps://eksctl.io/
11eksctl — Creating & Managing Clustershttps://eksctl.io/usage/creating-and-managing-clusters/
12eksctl — Managed Node Groupshttps://eksctl.io/usage/managed-nodegroups/

Kubernetes — Official Docs

#TitleLink
13kubectl Official Referencehttps://kubernetes.io/docs/reference/kubectl/
14kubectl Command Line Toolhttps://kubernetes.io/docs/concepts/overview/kubectl/
15Kubernetes HPA Documentationhttps://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
16Kubernetes RBAChttps://kubernetes.io/docs/reference/access-authn-authz/rbac/
17Pod Security Admissionhttps://kubernetes.io/docs/concepts/security/pod-security-admission/

Monitoring & Tools

#TitleLink
#TitleLink
18Prometheus Community Helm Chartshttps://prometheus-community.github.io/helm-charts
19Grafana Kubernetes Dashboardhttps://grafana.com/grafana/dashboards/315
20CloudWatch Container Insightshttps://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html
21Cluster Autoscaler GitHubhttps://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
22Helm Official Docshttps://helm.sh/docs/
23AWS Load Balancer Controllerhttps://kubernetes-sigs.github.io/aws-load-balancer-controller/

Leave a Comment