How to Set Up an etcd Cluster: Step-by-Step Guide for DevOps Engineers

Let me be real with you. When I first started working with Kubernetes seriously, I treated etcd like a black box. I knew it was there, I knew the control plane used it, and I knew I should not mess with it. But the moment one of our etcd nodes went down in a production cluster and we had no idea how to recover it cleanly, I realised I had been ignoring the most critical piece of the entire Kubernetes brain.

If you’re running Kubernetes in production and you haven’t taken the time to truly understand how etcd works and how to set it up properly, this post is for you.


What Even Is etcd? (The Short Version)

etcd is a distributed key-value store. In the Kubernetes world, it’s the place where the entire cluster state lives — every pod, every deployment, every secret, every config. When you run kubectl get pods, the API server goes to etcd and fetches that data.

Think of it like this: if Kubernetes is a city, etcd is the land records office. Every decision about what exists and what doesn’t passes through it.

So yes, losing etcd = losing everything.


Before You Start: What You Actually Need

To set up a proper etcd cluster (not just a single-node setup), you need at least 3 nodes. Always odd numbers — 3, 5, or 7. This is because etcd uses a consensus algorithm called Raft, and it needs a majority of nodes (called a quorum) to agree before committing any data.

  • With 3 nodes, you can handle 1 failure
  • With 5 nodes, you can handle 2 failures
  • With 7 nodes, you can handle 3 failures — but you also add more write latency

Most production setups are happy with 3 or 5 nodes.
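The majority math above is simple to verify yourself: a quorum is floor(n/2) + 1, so the cluster tolerates floor((n-1)/2) failures. A quick sketch:

```shell
# Quorum size and fault tolerance for common cluster sizes
for n in 3 5 7; do
  echo "$n nodes: quorum $((n / 2 + 1)), tolerates $(( (n - 1) / 2 )) failure(s)"
done
# 3 nodes: quorum 2, tolerates 1 failure(s)
# 5 nodes: quorum 3, tolerates 2 failure(s)
# 7 nodes: quorum 4, tolerates 3 failure(s)
```

Note that 4 nodes also only tolerate 1 failure — that is why even-sized clusters buy you nothing.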

For this guide, we’ll set up a 3-node etcd cluster on Ubuntu 22.04 as an external etcd setup (the production-grade way, separate from Kubernetes).

Each node:

Node     Hostname   IP
Node 1   etcd-01    192.168.1.11
Node 2   etcd-02    192.168.1.12
Node 3   etcd-03    192.168.1.13

Ports needed: 2379 (client) and 2380 (peer) open between all nodes.


Step 1: Set Hostnames and Update /etc/hosts

Do this on all 3 nodes:

# On each node, set the respective hostname
hostnamectl set-hostname etcd-01 # change to etcd-02, etcd-03 on other nodes

Then add to /etc/hosts on all nodes:

192.168.1.11 etcd-01
192.168.1.12 etcd-02
192.168.1.13 etcd-03
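If you prefer to script this, a small sketch (assuming the IPs above; the actual append to /etc/hosts needs root, so it is left as a commented step):

```shell
# Build the host entries once in a staging file
cat > hosts-etcd.txt <<'EOF'
192.168.1.11 etcd-01
192.168.1.12 etcd-02
192.168.1.13 etcd-03
EOF

# Then on each node (requires root):
# cat hosts-etcd.txt | sudo tee -a /etc/hosts
```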

Step 2: Download etcd Binaries

Run on all 3 nodes:

ETCD_VER=v3.5.12

curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
-o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

mkdir -p /tmp/etcd-download
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz \
-C /tmp/etcd-download --strip-components=1

sudo mv /tmp/etcd-download/etcd /usr/local/bin/
sudo mv /tmp/etcd-download/etcdctl /usr/local/bin/
sudo mv /tmp/etcd-download/etcdutl /usr/local/bin/

# Verify
etcd --version
etcdctl version

Step 3: Create etcd User and Directories

Never run etcd as root. Create a dedicated system user:

sudo useradd --system --no-create-home --shell /bin/false etcd

sudo mkdir -p /var/lib/etcd
sudo mkdir -p /etc/etcd

sudo chown -R etcd:etcd /var/lib/etcd
sudo chown -R etcd:etcd /etc/etcd

Step 4: Configure etcd on Each Node

Create /etc/etcd/etcd.conf on etcd-01:

name: etcd-01
data-dir: /var/lib/etcd

listen-client-urls: http://192.168.1.11:2379,http://127.0.0.1:2379
advertise-client-urls: http://192.168.1.11:2379

listen-peer-urls: http://192.168.1.11:2380
initial-advertise-peer-urls: http://192.168.1.11:2380

initial-cluster: etcd-01=http://192.168.1.11:2380,etcd-02=http://192.168.1.12:2380,etcd-03=http://192.168.1.13:2380
initial-cluster-state: new
initial-cluster-token: etcd-cluster-prod-01

log-level: info

On etcd-02 and etcd-03, use the same file but change name and the node-specific IP fields accordingly. The initial-cluster field stays the same on all 3 nodes.
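Since only two values differ per node, a small hypothetical helper script can template the file and avoid copy-paste mistakes. This is a sketch — the final sudo cp into place is left commented:

```shell
# Set these per node before running
NODE_NAME=etcd-01          # etcd-02 / etcd-03 on the other nodes
NODE_IP=192.168.1.11       # that node's own IP

# Generate the config locally, then copy it into place as root
cat > "${NODE_NAME}.conf" <<EOF
name: ${NODE_NAME}
data-dir: /var/lib/etcd

listen-client-urls: http://${NODE_IP}:2379,http://127.0.0.1:2379
advertise-client-urls: http://${NODE_IP}:2379

listen-peer-urls: http://${NODE_IP}:2380
initial-advertise-peer-urls: http://${NODE_IP}:2380

initial-cluster: etcd-01=http://192.168.1.11:2380,etcd-02=http://192.168.1.12:2380,etcd-03=http://192.168.1.13:2380
initial-cluster-state: new
initial-cluster-token: etcd-cluster-prod-01

log-level: info
EOF

# sudo cp "${NODE_NAME}.conf" /etc/etcd/etcd.conf
```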

Important: initial-cluster-state: new is only for the first bootstrap. Once the cluster is running, this should be existing on any node restart. Getting it wrong can stop a restarted node from rejoining the cluster or, worse, make it bootstrap a second, divergent cluster.


Step 5: Create a systemd Service File

On all 3 nodes, create /etc/systemd/system/etcd.service:

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd --config-file /etc/etcd/etcd.conf
Restart=on-failure
RestartSec=5s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

Start it on all 3 nodes within a short time of each other (they need to reach quorum together):

sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd

Step 6: Verify the Cluster is Healthy

From any node:

etcdctl --endpoints=http://192.168.1.11:2379,http://192.168.1.12:2379,http://192.168.1.13:2379 \
endpoint health

Expected output:

http://192.168.1.11:2379 is healthy: successfully committed proposal: took = 3.456ms
http://192.168.1.12:2379 is healthy: successfully committed proposal: took = 4.123ms
http://192.168.1.13:2379 is healthy: successfully committed proposal: took = 3.891ms

Check the current leader:

etcdctl --endpoints=http://192.168.1.11:2379,http://192.168.1.12:2379,http://192.168.1.13:2379 \
endpoint status --write-out=table

Step 7: Test Data Replication

# Write to etcd-01
etcdctl --endpoints=http://192.168.1.11:2379 put /test/hello "devops-world"

# Read from etcd-02
etcdctl --endpoints=http://192.168.1.12:2379 get /test/hello

If you see devops-world back from a different node — your cluster is replicating correctly.


Step 8: Enable TLS (A Must in Production)

The plain HTTP setup is fine for a lab. In production, always use TLS for both client and peer communication. At a high level:

  1. Generate a CA using cfssl or openssl
  2. Create server certs per node (with node IP as SAN)
  3. Create a client cert for etcdctl and the Kubernetes API server
  4. Update config to use https:// and point to cert files

client-transport-security:
  cert-file: /etc/etcd/certs/server.crt
  key-file: /etc/etcd/certs/server.key
  trusted-ca-file: /etc/etcd/certs/ca.crt
  client-cert-auth: true

peer-transport-security:
  cert-file: /etc/etcd/certs/peer.crt
  key-file: /etc/etcd/certs/peer.key
  trusted-ca-file: /etc/etcd/certs/ca.crt
  peer-client-cert-auth: true

Full TLS walkthrough using cfssl is coming in the next post.


Backup etcd — Do This From Day One

# Take a snapshot
etcdctl --endpoints=http://192.168.1.11:2379 \
snapshot save /backup/etcd-snapshot-$(date +%Y%m%d%H%M).db

# Verify it
etcdutl snapshot status /backup/etcd-snapshot-<timestamp>.db --write-out=table

Automate with cron and ship to S3 or any object storage. If etcd goes down with no snapshot, the entire Kubernetes cluster state is gone. No second chances.
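One way to automate this is a small wrapper script dropped into cron. This is a sketch with assumed paths (/backup, a 7-snapshot retention, an hourly schedule) that you would adapt to your environment:

```shell
# Write the backup script locally, then install it as root
cat > etcd-backup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

BACKUP_DIR=/backup                       # assumed backup location
ENDPOINT=http://192.168.1.11:2379        # one healthy endpoint is enough
SNAP="${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d%H%M).db"

# Take the snapshot and verify it immediately
etcdctl --endpoints="${ENDPOINT}" snapshot save "${SNAP}"
etcdutl snapshot status "${SNAP}" --write-out=table

# Keep only the 7 most recent snapshots
ls -1t "${BACKUP_DIR}"/etcd-snapshot-*.db | tail -n +8 | xargs -r rm -f
EOF
chmod +x etcd-backup.sh

# sudo mv etcd-backup.sh /usr/local/bin/
# Hourly cron entry, for example:
# 0 * * * * /usr/local/bin/etcd-backup.sh
```

From there, add an aws s3 cp (or equivalent) line to ship each snapshot off the node.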

Common Issues and How to Fix Them

These are real problems you will hit at some point, not hypothetical ones.

Cluster not forming quorum after bootstrap
This is almost always a firewall issue. The nodes can’t reach each other on port 2380. Check it like this:

nc -zv 192.168.1.12 2380
nc -zv 192.168.1.13 2380

If it times out, open the ports in your firewall or security group and try again.

One node keeps crashing and restarting
Run journalctl -u etcd -f and read the logs carefully. The most common causes are:

  • Wrong IP in listen-peer-urls or initial-advertise-peer-urls
  • Leftover data in /var/lib/etcd from a previous failed bootstrap attempt
  • Hostname not resolving properly in /etc/hosts

If you’re starting fresh, it’s okay to delete the data dir and re-bootstrap — but never do this on a running cluster unless you know what you’re doing.

# Only on a node you are wiping and re-bootstrapping
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/*
sudo systemctl start etcd

Leader election keeps flapping
etcd uses disk writes to commit log entries. If your disk is slow (spinning HDD, overloaded node), the heartbeat timeouts start triggering and the cluster keeps re-electing a leader. Always use SSDs for etcd nodes in production. You can also tune the heartbeat and election timeout:

heartbeat-interval: 250    # in milliseconds (default: 100)
election-timeout: 1250     # in milliseconds (default: 1000)

Increase these if your nodes are geographically spread out or on high-latency networks.

NOSPACE alarm triggered
etcd has a default backend quota of 2 GB. Once you hit it, etcd goes into read-only mode and starts throwing etcdserver: mvcc: database space exceeded errors. Kubernetes will effectively freeze.

Check if an alarm is active:

etcdctl --endpoints=http://192.168.1.11:2379 alarm list

If you see NOSPACE, compact and defragment:

# Get the current revision
rev=$(etcdctl --endpoints=http://192.168.1.11:2379 endpoint status --write-out=json | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['Status']['header']['revision'])")

# Compact
etcdctl --endpoints=http://192.168.1.11:2379 compact $rev

# Defragment all nodes
etcdctl --endpoints=http://192.168.1.11:2379,http://192.168.1.12:2379,http://192.168.1.13:2379 defrag

# Disarm the alarm
etcdctl --endpoints=http://192.168.1.11:2379 alarm disarm

You can also increase the quota upfront in your config:

quota-backend-bytes: 8589934592   # 8 GB
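A related pair of config knobs worth setting at the same time are etcd's auto-compaction options, so old revisions are pruned continuously instead of through manual compact runs. A common starting point:

```yaml
auto-compaction-mode: periodic    # compact on a time interval
auto-compaction-retention: 1h     # keep the last hour of revisions
```

With periodic compaction in place, defragmentation is usually the only manual maintenance left.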

Quick Reference: Key etcdctl Commands

Command                              What it does
endpoint health                      Check health of all endpoints
endpoint status --write-out=table    See leader, Raft term, DB size
put <key> <value>                    Write a key
get <key>                            Read a key
del <key>                            Delete a key
snapshot save <file>                 Take a backup
snapshot restore <file>              Restore from a backup
member list                          List all cluster members
alarm list                           Check for active alarms
defrag                               Defragment the backend DB
compact <rev>                        Compact old revisions

How etcd Fits Into Your Kubernetes Control Plane

Just to close the loop on why all of this matters: when you run kubeadm init or set up Kubernetes manually, the API server is configured to point to etcd using the --etcd-servers flag. Every object in Kubernetes — pods, services, namespaces, secrets, configmaps, RBAC rules — is stored as a key in etcd.

The keys look something like this:

/registry/pods/default/my-nginx-pod
/registry/secrets/kube-system/bootstrap-token-abcdef
/registry/deployments/production/api-deployment

When you do kubectl get pods, the API server reads from etcd. When you create a deployment, the API server writes to etcd. When the scheduler and controllers take action, they watch for changes through the API server’s watch API, which is backed by etcd watches.

This is why etcd performance and availability directly translate into Kubernetes performance and availability. A slow etcd means a slow API server. A dead etcd means a dead cluster.


Monitoring etcd in Production

Once your cluster is up, don’t leave it unmonitored. Out of the box, etcd serves Prometheus metrics at the /metrics path on its client port (2379). In production it’s common to add a dedicated metrics listener on a separate port (2381 by convention) so monitoring traffic stays off the client endpoint:

# Add to your etcd config
listen-metrics-urls: http://0.0.0.0:2381

Key metrics to watch:

Metric                                       Why it matters
etcd_server_is_leader                        Know which node is the current leader
etcd_server_proposals_failed_total           Failed Raft proposals — spikes mean trouble
etcd_disk_wal_fsync_duration_seconds         Disk write latency — keep p99 under 10ms
etcd_disk_backend_commit_duration_seconds    DB commit latency
etcd_network_peer_round_trip_time_seconds    Peer-to-peer latency
etcd_mvcc_db_total_size_in_bytes             DB size — watch before hitting quota

You can import the official etcd Grafana dashboard (ID: 3070) into your Grafana instance and connect it to your Prometheus scrape config. Takes about 5 minutes and gives you a solid production dashboard immediately.


Final Thoughts

etcd is not complicated once you understand two things: it uses Raft for consensus, and it is the single source of truth for your entire Kubernetes cluster. Set it up carelessly and you will pay for it sooner or later.

Start with 3 nodes, always use odd numbers, enable TLS before you go to production, set up automated backups from day one, and put monitoring in place before something breaks — not after.

Once your etcd cluster is running healthy, you’ll actually feel a lot more confident about your Kubernetes infrastructure overall. Because now you understand the foundation everything is built on. Most engineers operate at the kubectl layer and never look deeper. The ones who understand etcd, the API server, and the control plane deeply are the ones who can debug anything and build truly reliable systems.

The ops work doesn’t start at the Kubernetes level — it starts with etcd.


Found this useful? Share it with your team. Next post: Setting up etcd with full TLS using cfssl — the production-ready way — and how to connect it to a Kubernetes cluster as an external etcd setup.
