Kubernetes Pod Lifecycle: A Complete Deep Dive

You ran kubectl apply, your terminal said “pod created,” but the app is still down. Sound familiar?

I’ve been there. 2 AM. Production alert firing. Pod shows Running but requests are failing. No obvious errors. That’s when I realized — I didn’t actually understand the Pod lifecycle. I just knew enough to deploy things and pray.

Once I truly understood what happens between kubectl apply and a Pod actually serving traffic, debugging became 10x faster. So let me walk you through the entire Kubernetes Pod lifecycle — not the textbook version, the version that actually helps you in production.


What Even Is a Pod?

Before we dive into lifecycle phases, let’s get the definition right — not the documentation copy-paste, but the real deal.

A Pod is a wrapper around one or more containers. Those containers share the same network (same IP), the same storage volumes, and — most importantly — the same lifecycle. If the Pod dies, all containers inside it die together.

Think of a Pod like a small apartment. The containers are roommates sharing one address. Kubernetes manages the building (the cluster), but it doesn’t care about keeping any one apartment forever. Pods are disposable by design.


The 5 Pod Lifecycle Phases

Phase 1: Pending

This is where every Pod starts. You’ve submitted the YAML, the API server accepted it — but the Pod hasn’t landed on any node yet.

Kubernetes is doing a lot of work behind the scenes here. The Scheduler is scanning all nodes, checking CPU/memory availability, taints, tolerations, node affinity rules, and topology constraints. Meanwhile, if your Pod needs a PersistentVolume, that has to be bound too.

Real scenario: You deploy a new service, and the Pod just sits in Pending for minutes. Nine times out of ten, it’s one of these:

  • Not enough CPU or memory on any node (Insufficient cpu in events)
  • The nodeSelector doesn’t match any node label
  • The PVC is stuck because the StorageClass can’t provision a volume
kubectl describe pod <pod-name>
# Scroll to the Events section — it tells you exactly what's wrong

Never ignore a Pending Pod. It will tell you everything if you just ask it.


Phase 2: Running

The Scheduler has picked a node. The Kubelet on that node has pulled the container image and started the containers. At least one container is up and running.

Here’s the thing most beginners miss: Running does NOT mean healthy. It just means the process started. Your app could be mid-initialization, throwing errors internally, or stuck waiting for a database connection — and Kubernetes still calls it “Running.”

This is the phase where your health probes take center stage (more on those shortly).

Also important: a Pod is permanently tied to one node. Once assigned, Kubernetes won’t move it elsewhere. If that node goes down, the Pod dies — and a new Pod gets created on a different node. Same workload, different Pod, different name.


Phase 3: Succeeded

All containers inside the Pod ran their task, completed it cleanly, and exited with status code 0.

You’ll mostly see this with Jobs and CronJobs — batch workloads like a DB migration script, a nightly report generator, or a one-time data cleanup task.

A typical web server Pod will never hit this phase on its own. It’s designed to run forever — unless you shut it down.


Phase 4: Failed

One or more containers exited with a non-zero exit code. Something went wrong.

The container might have crashed because of a bug, a missing environment variable, a secret that wasn’t mounted, or it ran out of memory (OOMKilled). Kubernetes marks the Pod as Failed and, depending on the restart policy, either restarts the container or gives up.

The first command you should run:

kubectl logs <pod-name> --previous

The --previous flag is gold. It shows you logs from the container before it crashed — the actual dying words of your app. Most people forget this flag and wonder why kubectl logs shows nothing useful.


Phase 5: Unknown

This one is spooky. Kubernetes lost contact with the node. The Pod’s state genuinely cannot be determined — the node isn’t responding, the Kubelet is silent.

After a grace period (typically around 5 minutes), Kubernetes applies the node.kubernetes.io/unreachable taint and eventually marks those Pods as Unknown. Then it deletes them and lets your Deployment create fresh ones on healthy nodes.

You’ll sometimes see Pods stuck in Terminating or Unknown for a long time after a node failure. That’s usually because finalizers are blocking the deletion. In those cases:

kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":[]}}' --type=merge

Use that carefully — only when you’re sure the node is truly gone.


What Happens Under the Hood — The Full Flow

Here’s the complete journey from kubectl apply to your Pod serving traffic:

You run: kubectl apply -f deployment.yaml
|
v
API Server validates & stores Pod spec in etcd
|
v
Scheduler watches etcd → finds unscheduled Pod
|
v
Scheduler picks best node → updates Pod with nodeName
|
v
Kubelet on that node watches API Server → sees new Pod
|
v
Kubelet tells container runtime (containerd) → pull image, create container
|
v
Pod Phase: Pending → Running
|
v
Startup Probe runs (if configured)
|
v
Liveness + Readiness Probes begin
|
v
Readiness Probe passes → Pod IP added to Service Endpoints
|
v
Traffic flows to your app ✓

Every single step here can fail or delay. Understanding this chain is how you pinpoint exactly where things broke.


Restart Policies — Kubernetes Doesn’t Always Restart

A lot of beginners assume Kubernetes always restarts a failed Pod. It doesn’t. It depends on your restartPolicy.

PolicyWhen to useBehavior
AlwaysLong-running services (APIs, web apps)Restarts on any exit — crash or clean completion
OnFailureBatch JobsRestarts only on non-zero exit codes
NeverOne-shot tasks or debug PodsNever restarts, regardless of exit code
spec:
restartPolicy: OnFailure # Good for Kubernetes Jobs

Most Pods in a Deployment use Always by default. That’s fine for services, but if you’re running a Job and forget to set OnFailure, a successful run will still try to restart — causing unnecessary reruns.


CrashLoopBackOff — The Most Common Headache

CrashLoopBackOff is not a Pod phase. It’s a container state. And it’s something every Kubernetes engineer sees almost daily.

Here’s what it means: your container keeps crashing, and Kubernetes is throttling the restart attempts using exponential backoff:

10s → 20s → 40s → 80s → 160s → 300s (stays at 5 min max)

Kubernetes is essentially saying, “I’ll keep trying, but I’m not going to hammer the system if your app keeps dying.”

How to debug it:

# Step 1: Check current logs
kubectl logs <pod-name>

# Step 2: Check previous container logs (before crash)
kubectl logs <pod-name> --previous

# Step 3: Check events
kubectl describe pod <pod-name>

# Step 4: Check exit code
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

Exit code cheat sheet:

  • 0 → Clean exit (shouldn’t CrashLoop)
  • 1 → App error (check logs)
  • 137 → OOMKilled or SIGKILL (container killed — usually memory limit)
  • 139 → Segmentation fault
  • 143 → SIGTERM received (graceful shutdown signal)

Exit code 137 is extremely common. It means your container hit its memory limit and got killed. Fix: increase resources.limits.memory or fix a memory leak.


Health Probes — Your Lifecycle Control Panel

This is the most underused feature in Kubernetes, and it directly controls how your Pod behaves in the Running phase.

Startup Probe

For apps that take a long time to boot — legacy Java apps, apps loading large ML models, etc. It tells Kubernetes: “Don’t check liveness yet. Give this app time to start.”

startupProbe:
httpGet:
path: /healthz
port: 8080
failureThreshold: 30 # 30 attempts
periodSeconds: 10 # every 10s = 5 minutes max startup time

Without this, a slow-starting app will get killed by the liveness probe before it even finishes booting. I’ve seen teams spend hours debugging random restarts on perfectly healthy apps — all because they were missing a startup probe.

Liveness Probe

Checks: “Is this container still alive and functional?”

If it fails, Kubernetes kills and restarts the container. Use this to detect deadlocks or hung processes that are technically “running” but doing nothing.

livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 15
periodSeconds: 20
failureThreshold: 3

Readiness Probe

Checks: “Is this container ready to accept traffic?”

If it fails, the Pod is removed from the Service’s endpoint list. No traffic reaches it. When it recovers, it gets added back automatically. This is how you achieve zero-downtime during rolling updates.

readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10

The rule of thumb: Always use all three probes in production. Missing readiness probe = users hit a Pod that isn’t ready = 502 errors during deployments.


Init Containers — The Setup Crew

Before your main application container starts, you can run init containers. They run sequentially, must exit with code 0, and only then does the main container start.

Practical use cases:

  • Wait for the database to be reachable before starting your app
  • Run database migrations
  • Fetch secrets or config files from an external source
  • Set correct file permissions on shared volumes
initContainers:
- name: wait-for-db
image: busybox:1.35
command:
- sh
- -c
- |
until nc -z postgres-service 5432; do
echo "Waiting for Postgres..."
sleep 3
done
echo "Postgres is up!"

Without this, your app might start, fail to connect to DB, and enter CrashLoopBackOff — when the real problem is just a startup ordering issue.


Graceful Termination — Don’t Kill Your App Rudely

When you delete a Pod or do a rolling update, Kubernetes doesn’t just yank the plug. It follows a proper shutdown sequence:

Pod deletion triggered (kubectl delete / rolling update)
|
v
Pod status → Terminating
Pod removed from Service Endpoints (stops getting new traffic)
|
v
preStop hook runs (if configured)
|
v
SIGTERM sent to containers ("please shut down gracefully")
|
v
Kubernetes waits... (terminationGracePeriodSeconds, default: 30s)
|
v
If container still running → SIGKILL ("you had your chance")
|
v
Pod removed from etcd

If your app doesn’t handle SIGTERM, it ignores the graceful shutdown signal and gets force-killed after 30 seconds. This means in-flight requests get dropped.

Fix it properly:

spec:
terminationGracePeriodSeconds: 60
containers:
- name: myapp
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"]

The sleep 5 in preStop gives kube-proxy time to remove the Pod from load balancer rules before your app starts shutting down. This tiny trick eliminates a ton of 502 errors during deployments.


Pod Conditions — The Detailed Status

Beyond the 5 phases, Kubernetes tracks conditions — granular boolean flags:

kubectl get pod <pod-name> -o jsonpath='{.status.conditions}' | python3 -m json.tool
ConditionWhat it tells you
PodScheduled: TrueNode has been assigned
Initialized: TrueAll init containers finished
ContainersReady: TrueAll containers are running and healthy
Ready: TruePod is ready to serve traffic

If Ready is False but ContainersReady is True, your readiness probe is failing. That’s a specific, actionable clue — way more useful than just “Pod not working.”


Common Issues and How to Actually Fix Them

What you seeWhat it meansWhat to do
Pending (forever)No schedulable nodeCheck kubectl describe pod events
ImagePullBackOffWrong image name or no registry credentialsFix image tag or add imagePullSecrets
CrashLoopBackOffApp keeps crashingkubectl logs --previous, check exit code
OOMKilledMemory limit hitIncrease limits.memory or fix memory leak
Terminating (stuck)Finalizer blocking deletionManually patch out the finalizer
UnknownNode is offlineCheck node health, cordon and drain
0/1 ReadyReadiness probe failingCheck /ready endpoint, check probe config

Mistakes I See All the Time

1. No resource limits set

# Bad — Pod can eat all node memory
containers:
- name: app
image: myapp:latest

# Good
containers:
- name: app
image: myapp:latest
resources:
requests:
cpu: "250m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "512Mi"

Without limits, one noisy Pod can starve every other workload on the node.

2. Using latest tag in production

latest is not a version. It changes without warning. Always pin to a specific image digest or tag like myapp:v1.4.2.

3. Only one replica for critical services

If that one Pod’s node goes down, your service is down. Always run at least 2–3 replicas for anything production-facing.

4. Ignoring the graceful shutdown

Not handling SIGTERM in your app = dropped requests on every deployment. Handle it. It takes 5 minutes and saves a lot of production pain.

5. Setting initialDelaySeconds too low

If your app takes 20 seconds to start but your liveness probe fires after 5 seconds, Kubernetes kills it before it even finishes booting. Always measure your actual startup time and set probes accordingly.


Quick Reference: Pod Lifecycle Cheat Sheet

textPENDING     → Accepted by API server, waiting to be scheduled
RUNNING     → On a node, at least one container running
SUCCEEDED   → All containers completed with exit code 0
FAILED      → One or more containers exited with non-zero code
UNKNOWN     → Node communication lost, state can't be determined

Restart Policies:
Always      → Default, restarts on any exit
OnFailure   → Restarts only on failure (non-zero exit)
Never       → Never restarts

Health Probes:
Startup     → Gates liveness/readiness until app is ready
Liveness    → Kills
Primary Referencehttps://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
Source Credithttps://www.techopsexamples.com/p/kubernetes-pod-lifecycle-behind-the-scenes

Leave a Comment