Kubernetes Pod Disruption Budget (PDB) — Complete Guide with Examples

Picture this: it’s a Tuesday afternoon. Your infra team is draining a node for routine maintenance. Looks straightforward. Then your monitoring goes red — every single replica of your payment service got evicted at the same time. Zero pods up. Full outage.

Nobody planned for that. Nobody set any guardrails. And that’s exactly the problem PDB solves.

Pod Disruption Budget is one of those Kubernetes features that most engineers know exists but never actually configure — until they get burned in production. This blog is about not getting burned.

What is a Pod Disruption Budget?

A Pod Disruption Budget (PDB) is a Kubernetes resource that tells the cluster: “Whatever planned work you’re doing (draining nodes, upgrading the cluster, scaling nodes down), you must keep at least X pods of this application running at all times.”

It’s a contract between you and the cluster. You define the minimum availability. Kubernetes enforces it through the Eviction API during voluntary disruptions.

The keyword here is voluntary. Let’s get that distinction right first.

Voluntary vs Involuntary Disruptions

Not all disruptions are equal in Kubernetes. PDB only protects you from voluntary ones.

Voluntary Disruptions (PDB protects against these)

These are planned, intentional actions — usually triggered by a human or an automated system:

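Node drain: kubectl drain <node> during maintenance or upgrades
Cluster upgrades: managed Kubernetes (GKE, EKS, AKS) draining and replacing node pools
Autoscaler scale-down: Cluster Autoscaler removing underutilized nodes
Eviction API calls: any tool or operator that evicts pods through the Eviction API

Two voluntary actions are not covered. Rolling updates follow the rollout strategy of the Deployment or StatefulSet (maxUnavailable, maxSurge) and are not limited by a PDB. Deleting a pod directly with kubectl delete pod bypasses the Eviction API, so the PDB is never consulted.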

Involuntary Disruptions (PDB does NOT protect against these)

These are unexpected failures — hardware crashes, kernel panics, OOM on the node:

Node hardware failure
Cloud provider taking down a VM
Kernel panic on a node
OOMKill at the node level

PDB can’t stop a node from suddenly dying. But it can make sure that when your team deliberately takes action on the cluster, your application doesn’t fall over completely.

The Two Ways to Define a PDB

There are exactly two fields you’ll use — and you pick only one of them per PDB:

1. `minAvailable` — Minimum pods that must stay up

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payment-service

This says: “At any point during a disruption, at least 2 pods of payment-service must be running.”

If you have 3 replicas, Kubernetes can evict 1 at a time. If you have 2 replicas, Kubernetes cannot evict any: the node drain will block until you scale the workload up or relax the PDB.

2. `maxUnavailable` — Maximum pods that can be down at once

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: frontend

This says: “You’re allowed to take down 1 pod at a time. No more.”

If you have 5 replicas, Kubernetes can evict 1, wait for it to reschedule, evict another, and so on. Nice and controlled.
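The arithmetic behind both fields fits in a few lines of Python. This is a simplified sketch of how a budget turns into an eviction allowance, not the controller's actual code; the function name and its parameters are made up for illustration.

```python
def allowed_disruptions(replicas, healthy, min_available=None, max_unavailable=None):
    """Simplified model: how many pods may be evicted right now.

    replicas: pods the workload is supposed to run
    healthy:  pods that are currently ready
    Exactly one of min_available / max_unavailable must be given (integers).
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        required = min_available
    else:
        required = max(replicas - max_unavailable, 0)
    return max(healthy - required, 0)


# minAvailable: 2
print(allowed_disruptions(3, 3, min_available=2))    # 1
print(allowed_disruptions(2, 2, min_available=2))    # 0
# maxUnavailable: 1 with 5 replicas
print(allowed_disruptions(5, 5, max_unavailable=1))  # 1
print(allowed_disruptions(5, 4, max_unavailable=1))  # 0
```

The last line is the "wait for it to reschedule" step: while one of the 5 pods is still coming back, the allowance is 0.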

Using Percentages Instead of Hard Numbers

Both fields accept percentages — which is smarter for workloads that autoscale:

spec:
  minAvailable: "70%"   # At least 70% of pods must be up

If you have 10 pods, at least 7 must remain. If autoscaling kicks in and you have 20 pods, at least 14 must remain. The math adjusts automatically. Always prefer percentages for scalable workloads.
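One detail worth knowing: a percentage rarely maps to a whole number of pods, and Kubernetes rounds it up. The sketch below models that rounding with a hypothetical helper called resolve; it is an illustration of the documented behavior, not Kubernetes code.

```python
def resolve(value, total):
    """Turn an integer or a "NN%" string into a pod count, rounding percentages up."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return -(-percent * total // 100)  # integer ceiling of percent * total / 100
    return int(value)


print(resolve("70%", 10))  # 7
print(resolve("70%", 20))  # 14
print(resolve("50%", 7))   # 4  (3.5 rounds up)
print(resolve("70%", 3))   # 3  (2.1 rounds up, so no pod can be evicted)
```

The last case is the one to watch for: with a small replica count, a percentage can round up to every pod you have.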

A Real Production Scenario

Let me paint a realistic picture. You have a checkout API running with 3 replicas in production. No PDB configured.

Your platform team runs a node upgrade. The upgrade tool drains nodes. Node 1 has all 3 of your pods (it happens — Kubernetes scheduling isn’t always perfectly spread). All 3 get evicted simultaneously. New pods try to schedule on other nodes. Takes 45 seconds. Your checkout API is down for 45 seconds. Orders fail. On-call gets paged.

Now let’s add a PDB:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: checkout-api

Same scenario. Node drain starts. Kubernetes checks the PDB before evicting. Sees that evicting more than 1 pod would violate minAvailable: 2. Evicts 1 pod. Waits for it to reschedule on another node and become healthy. Then evicts the next. Checkout API stays up the entire time. Platform team finishes the upgrade. No incident, no page.

That’s the entire value of PDB in one scenario.

How PDB Works Under the Hood

When you run kubectl drain <node>, the drain process doesn’t directly kill pods. It calls the Eviction API for each pod. The Eviction API checks if evicting that pod would violate any active PDB.

Here’s the flow:

kubectl drain node-1
       |
       v
Eviction API called for each Pod on node-1
       |
       v
Kubernetes checks: does this eviction violate any PDB?
       |
    YES → Eviction denied (blocked), drain waits
    NO  → Pod evicted, moves on to next pod
       |
       v
Drain waits and retries until PDB allows eviction

The drain doesn’t fail; it pauses and keeps retrying (indefinitely by default, or until the --timeout you pass to kubectl drain expires). Once the evicted pod schedules and becomes ready on another node, the PDB condition is satisfied again, and the next pod can be evicted.

This is also why a node drain can sometimes appear to “hang” — a PDB is blocking it because there’s nowhere for the pods to go. If your cluster has no available capacity on other nodes, the drain will wait indefinitely.
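The flow above, including the "hang", can be sketched as a toy simulation. Everything here is an illustrative model: the drain function, its parameters, and the retry limit are assumptions. The two status codes mirror the Eviction API, which answers 429 Too Many Requests when a PDB does not currently allow the eviction.

```python
def drain(pods_on_node, healthy, min_available, spare_capacity, max_retries=3):
    """Toy model of a node drain. Returns (pods_evicted, status_log)."""
    log = []
    evicted = 0
    retries = 0
    while evicted < pods_on_node and retries < max_retries:
        if healthy - 1 >= min_available:
            log.append(200)      # eviction accepted
            healthy -= 1
            evicted += 1
            retries = 0
            if spare_capacity:
                healthy += 1     # replacement became ready on another node
        else:
            log.append(429)      # refused by the PDB, try again later
            retries += 1
    return evicted, log


# 3 replicas on the node, minAvailable: 2, room elsewhere in the cluster
print(drain(3, 3, 2, spare_capacity=True))   # (3, [200, 200, 200])
# Same setup, but replacements have nowhere to run
print(drain(3, 3, 2, spare_capacity=False))  # (1, [200, 429, 429, 429])
```

The model gives up after max_retries so that it terminates; a real drain without a timeout would keep retrying.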

Checking PDB Status

# See all PDBs in a namespace
kubectl get pdb -n production

# Detailed view
kubectl describe pdb payment-service-pdb -n production

The output shows something like:

Name:           payment-service-pdb
Namespace:      production
Min available:  2
Selector:       app=payment-service

Status:
    Allowed disruptions:  1
    Current:              3
    Desired:              2
    Total:                3

Key field: Allowed disruptions. This tells you how many pods Kubernetes is allowed to evict right now. If it is 0, a drain will block. If it is 1 or more, the drain can proceed one pod at a time.

When debugging a stuck node drain, always check:

kubectl get pdb -A

Look for any PDB with ALLOWED DISRUPTIONS: 0 — that’s your blocker.
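If you would rather script this check, the same information is available as JSON via kubectl get pdb -A -o json. Here is a minimal sketch that filters on status.disruptionsAllowed. The sample data is a hypothetical, trimmed-down response and blocking_pdbs is an illustrative helper, not part of any tool.

```python
import json

# Hypothetical, trimmed sample of `kubectl get pdb -A -o json` output.
SAMPLE = """
{"items": [
  {"metadata": {"namespace": "production", "name": "payment-service-pdb"},
   "status": {"currentHealthy": 3, "desiredHealthy": 2,
              "disruptionsAllowed": 1, "expectedPods": 3}},
  {"metadata": {"namespace": "production", "name": "checkout-api-pdb"},
   "status": {"currentHealthy": 2, "desiredHealthy": 2,
              "disruptionsAllowed": 0, "expectedPods": 3}}
]}
"""


def blocking_pdbs(pdb_list):
    """Return namespace/name of every PDB that currently allows no evictions."""
    return [
        f'{item["metadata"]["namespace"]}/{item["metadata"]["name"]}'
        for item in pdb_list.get("items", [])
        if item.get("status", {}).get("disruptionsAllowed", 0) == 0
    ]


print(blocking_pdbs(json.loads(SAMPLE)))  # ['production/checkout-api-pdb']
```

In practice you would feed it the live output instead of SAMPLE, for example by piping the kubectl command into a script that reads stdin.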

Common Mistakes and How to Avoid Them

Mistake 1: Setting minAvailable equal to your replica count

# You have 3 replicas and set this:
spec:
  minAvailable: 3

This means zero disruptions are allowed. Ever. Your node drain will block permanently. The cluster upgrade will never complete. This is an accidental self-inflicted deadlock.

Fix: Always leave room for at least 1 disruption. If you have 3 replicas, set minAvailable: 2 or maxUnavailable: 1.

Mistake 2: PDB with a selector that matches zero pods

spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-service   # But the pods are actually labeled app: myservice

Typo in the label. The PDB exists but matches nothing, so it protects nothing. kubectl apply still succeeds, because the PDB is valid YAML, just useless.

Fix: Always verify after creating a PDB:

kubectl describe pdb <name> -n <namespace>
# Check "Current" field — should match your replica count

If Current: 0, your selector is wrong.
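Why does a one-character typo matter so much? A matchLabels selector only matches a pod when every key and value is identical. The sketch below models that rule; the pod list and the selector_matches helper are made up for illustration.

```python
def selector_matches(match_labels, pod_labels):
    """True if every key/value pair in matchLabels is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Hypothetical pods, labeled the way the Deployment actually labels them
pods = [
    {"name": "myservice-1", "labels": {"app": "myservice"}},
    {"name": "myservice-2", "labels": {"app": "myservice"}},
]

typo = {"app": "my-service"}
fixed = {"app": "myservice"}

print(sum(selector_matches(typo, p["labels"]) for p in pods))   # 0
print(sum(selector_matches(fixed, p["labels"]) for p in pods))  # 2
```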

Mistake 3: Only 1 replica with a PDB

spec:
  minAvailable: 1

With only 1 replica, this means the pod can never be evicted. Node drains will block permanently. You need at least 2 replicas for a PDB to be useful — one stays up while the other gets evicted.

Fix: For critical single-replica workloads, either accept the disruption risk, increase to 2+ replicas, or use minAvailable: 0 (which effectively disables protection but keeps the PDB object for future use).

Mistake 4: No PDB on stateful workloads

StatefulSets — databases, queues, cache clusters — are the most critical workloads to protect. A Postgres primary getting evicted during a node drain while no replica is ready is a data availability nightmare.

Always add PDBs to StatefulSets:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: production
spec:
  minAvailable: 2    # Assumes 3 replicas: keep a majority of DB pods alive (for quorum)
  selector:
    matchLabels:
      app: postgres

Mistake 5: Forgetting PDB during Helm deployments

If your app is deployed via Helm, the PDB should be part of the chart — not a manually applied afterthought that gets forgotten. Add it to your Helm chart templates:

templates/
  deployment.yaml
  service.yaml
  pdb.yaml         ← add this
  hpa.yaml

And make it configurable:

# templates/pdb.yaml
{{- if .Values.pdb.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "myapp.fullname" . }}-pdb
spec:
  minAvailable: {{ .Values.pdb.minAvailable }}
  selector:
    matchLabels:
      {{- include "myapp.selectorLabels" . | nindent 6 }}
{{- end }}

# values.yaml
pdb:
  enabled: true
  minAvailable: 2

PDB with HPA — The Right Combination

If you’re using Horizontal Pod Autoscaler (HPA), use percentages in your PDB instead of hard numbers:

spec:
  minAvailable: "70%"

Why? With HPA, your replica count fluctuates. If HPA scales you down to 2 pods and your PDB says minAvailable: 3, you’ve created a deadlock — Kubernetes can’t evict anything because you already have fewer pods than the minimum.

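With minAvailable: "70%", the minimum scales with the workload: at 10 pods, 7 must stay up, so Kubernetes can evict up to 3. Percentages are rounded up to whole pods, though, so at 2 pods 70% becomes 2 and nothing can be evicted. Either keep the HPA's minReplicas high enough that the rounded minimum still leaves room for one eviction (4 or more for 70%), or use maxUnavailable: "30%", which rounds the number of disruptable pods up and therefore always allows at least 1.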
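Because percentages are rounded up to whole pods, it is worth running the numbers across the replica range your HPA can produce. Here is a small sketch that compares minAvailable: "70%" with maxUnavailable: "30%" when all pods are healthy. The helper names are illustrative, and the rounding follows the documented round-up behavior.

```python
def ceil_percent(percent, total):
    """Integer ceiling of percent * total / 100."""
    return -(-percent * total // 100)


def allowed_with_min_available(percent, replicas):
    return replicas - ceil_percent(percent, replicas)


def allowed_with_max_unavailable(percent, replicas):
    return min(ceil_percent(percent, replicas), replicas)


# replicas, evictions allowed with minAvailable 70%, with maxUnavailable 30%
for replicas in (2, 3, 4, 10):
    print(replicas,
          allowed_with_min_available(70, replicas),
          allowed_with_max_unavailable(30, replicas))
# 2 0 1
# 3 0 1
# 4 1 2
# 10 3 3
```

At the low end of the range the two spellings behave differently: minAvailable blocks drains entirely at 2 or 3 replicas, while maxUnavailable still lets one pod go.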

PDB for Different Workload Types

Here’s a quick reference for how to configure PDBs based on workload type:

Stateless Web Services / APIs

spec:
  maxUnavailable: 1    # One pod down at a time is fine

Critical Payment / Auth Services

spec:
  minAvailable: "80%"  # Keep most pods alive always (needs 5+ replicas to allow any eviction)

Databases / StatefulSets (Odd replicas for quorum)

spec:
  minAvailable: 2      # For 3-replica setup — never lose quorum

Background Workers / Queue Consumers

spec:
  maxUnavailable: 2    # Can afford more disruption, not user-facing

Single-replica Dev/Test workloads

spec:
  maxUnavailable: 1    # Allows disruption, no quorum concern

PDB + Node Affinity + Topology Spread — The Full Stack

PDB is only as effective as your pod scheduling strategy. If all 3 replicas land on the same node, a PDB can slow a drain of that node down to one pod at a time, but it cannot help when the node itself fails: all 3 replicas go down together.

Pair PDB with topology spread constraints to ensure pods spread across nodes and availability zones:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: payment-service
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: payment-service

This ensures pods spread evenly across nodes AND across availability zones. Combined with a PDB, node drains can only take down pods from one location at a time — your other zones keep serving traffic.
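To get a feel for what maxSkew: 1 means, here is a simplified model of the check: placing a pod in a domain is allowed only if that domain would then hold at most maxSkew more pods than the emptiest domain. This is a sketch under that assumption; it ignores everything else the scheduler considers, and placement_allowed is a made-up name.

```python
def placement_allowed(pods_per_domain, candidate, max_skew=1):
    """Would one more pod in `candidate` stay within max_skew of the emptiest domain?"""
    emptiest = min(pods_per_domain.values())
    return pods_per_domain[candidate] + 1 - emptiest <= max_skew


# One payment-service pod is already running on node-1
nodes = {"node-1": 1, "node-2": 0, "node-3": 0}

print(placement_allowed(nodes, "node-1"))  # False: node-1 would be 2 ahead
print(placement_allowed(nodes, "node-2"))  # True
```

With whenUnsatisfiable: DoNotSchedule, a False here means the pod stays Pending rather than piling onto node-1.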

Quick Reference Cheatsheet

# Create a PDB
kubectl apply -f pdb.yaml

# List all PDBs
kubectl get pdb -A

# Check PDB status (most important: ALLOWED DISRUPTIONS)
kubectl describe pdb <name> -n <namespace>

# Debug a stuck node drain
kubectl get pdb -A | grep " 0 "   # PDBs with 0 allowed disruptions

# Force drain (bypasses PDB — use only in emergency)
kubectl drain <node> --disable-eviction --force --ignore-daemonsets

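Warning on --disable-eviction: this flag makes the drain delete pods directly instead of evicting them, so PDBs are bypassed entirely (--force additionally removes pods that no controller manages). Use it only in genuine emergencies, such as a node that is already dead or a cluster that is stuck. Never use it for routine maintenance on healthy nodes.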

PDB vs Pod Anti-Affinity — What’s the Difference?

People sometimes confuse these two. They solve different problems:

	PDB	Pod Anti-Affinity
What it does	Controls how many pods can be disrupted at once	Controls where pods are scheduled
When it acts	During evictions / disruptions	During scheduling
Protects against	Too many pods going down simultaneously	All pods landing on same node
Use together?	Yes — they complement each other

Use both. Anti-affinity spreads your pods. PDB ensures they can’t all be taken down at once. Together, they make your workload genuinely resilient.

Wrapping Up

PDB is a small resource — maybe 10 lines of YAML — but it’s the difference between a smooth cluster upgrade and a production incident.

The key things to remember:

PDB protects against voluntary disruptions only (drains, upgrades, autoscaler scale-down)
Use minAvailable when you care about absolute minimum — use maxUnavailable when you care about disruption rate
Use percentages if you’re running HPA
Never set minAvailable equal to your total replica count — that’s a deadlock
Always verify the selector matches actual pods after creating a PDB
Pair PDB with topology spread constraints for real multi-zone resilience
Make PDB part of your Helm charts — not a manual step

The next time your platform team says “we’re draining nodes for the cluster upgrade tonight”, you’ll sleep easy knowing your PDB has it covered.

Reference Links

Topic	Link
Pod Disruption Budget — Official Docs	https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
Configure PDB	https://kubernetes.io/docs/tasks/run-application/configure-pdb/
Eviction API	https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/
Topology Spread Constraints	https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/
Cluster Autoscaler & PDB	https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md