Demystifying Raft Consensus Algorithm
An intuitive explanation of the Raft consensus algorithm. Learn how distributed systems achieve agreement even when nodes fail.
Table of Contents
Consensus algorithms are fundamental to distributed systems. Raft was designed to be understandable - and it is, once you break it down.
The Problem #
In a distributed system, we need multiple nodes to agree on a value, even when some nodes fail. This is harder than it sounds:
- Nodes can crash at any time
- Network messages can be delayed or lost
- We need the system to keep working
Raft’s Approach #
Raft breaks consensus into three sub-problems:
- Leader Election: Choosing a single leader
- Log Replication: Leader accepts commands and replicates to followers
- Safety: Ensuring all nodes agree on the same log
Leader Election #
Raft uses a term-based system. Each term has at most one leader.
Term 1: Node A is leader
Term 2: Node A crashes, Node B becomes leader
Term 3: Node B continues as leader
Election Process #
- A follower times out waiting for heartbeat
- It becomes a candidate, increments term, votes for itself
- Requests votes from other nodes
- Wins if it gets majority of votes
type RequestVoteArgs struct {
Term int
CandidateID int
LastLogIndex int
LastLogTerm int
}
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
rf.mu.Lock()
defer rf.mu.Unlock()
// Reject if candidate's term is old
if args.Term < rf.currentTerm {
reply.VoteGranted = false
reply.Term = rf.currentTerm
return
}
// Grant vote if we haven't voted and candidate's log is up-to-date
if rf.votedFor == -1 || rf.votedFor == args.CandidateID {
if rf.isLogUpToDate(args.LastLogIndex, args.LastLogTerm) {
reply.VoteGranted = true
rf.votedFor = args.CandidateID
}
}
}
Log Replication #
Once elected, the leader:
- Accepts client commands
- Appends to its log
- Sends AppendEntries RPCs to followers
- Commits entry when majority acknowledge
type LogEntry struct {
Term int
Command interface{}
}
type AppendEntriesArgs struct {
Term int
LeaderID int
PrevLogIndex int
PrevLogTerm int
Entries []LogEntry
LeaderCommit int
}
The Commit Rule #
A log entry is committed when:
- It’s stored on a majority of servers
- At least one entry from the leader’s current term is committed
This ensures safety - once committed, an entry won’t be lost.
Handling Failures #
Leader Crash #
- Followers detect missing heartbeats
- New election begins
- New leader has all committed entries (guaranteed by vote restriction)
Follower Crash #
- Leader retries AppendEntries
- When follower recovers, it catches up
Network Partition #
- Minority partition can’t elect leader (needs majority)
- Majority partition continues operating
- After healing, minority catches up
Why Raft Works #
The key invariants:
- Election Safety: At most one leader per term
- Leader Append-Only: Leader never overwrites entries
- Log Matching: If two logs have entry with same index and term, all preceding entries are identical
- Leader Completeness: Committed entries appear in all future leaders’ logs
Real-World Implementations #
Raft is used in:
- etcd: Kubernetes’ backing store
- CockroachDB: Distributed SQL database
- Consul: Service discovery
- TiKV: Distributed key-value store
Key Takeaways #
- Raft prioritizes understandability
- Strong leader simplifies replication
- Randomized timeouts prevent split votes
- Safety is never compromised - only liveness during partitions
Understanding Raft gives you intuition for how modern distributed databases achieve consistency.