Demystifying Raft Consensus Algorithm

Table of Contents

Consensus algorithms are fundamental to distributed systems. Raft was designed to be understandable - and it is, once you break it down.

The Problem #

In a distributed system, we need multiple nodes to agree on a value, even when some nodes fail. This is harder than it sounds:

Nodes can crash at any time
Network messages can be delayed or lost
We need the system to keep working

Raft’s Approach #

Raft breaks consensus into three sub-problems:

Leader Election: Choosing a single leader
Log Replication: Leader accepts commands and replicates to followers
Safety: Ensuring all nodes agree on the same log

Leader Election #

Raft uses a term-based system. Each term has at most one leader.

Term 1: Node A is leader
Term 2: Node A crashes, Node B becomes leader
Term 3: Node B continues as leader

Election Process #

A follower times out waiting for heartbeat
It becomes a candidate, increments term, votes for itself
Requests votes from other nodes
Wins if it gets majority of votes

type RequestVoteArgs struct {
    Term         int
    CandidateID  int
    LastLogIndex int
    LastLogTerm  int
}

func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    
    // Reject if candidate's term is old
    if args.Term < rf.currentTerm {
        reply.VoteGranted = false
        reply.Term = rf.currentTerm
        return
    }
    
    // Grant vote if we haven't voted and candidate's log is up-to-date
    if rf.votedFor == -1 || rf.votedFor == args.CandidateID {
        if rf.isLogUpToDate(args.LastLogIndex, args.LastLogTerm) {
            reply.VoteGranted = true
            rf.votedFor = args.CandidateID
        }
    }
}

Log Replication #

Once elected, the leader:

Accepts client commands
Appends to its log
Sends AppendEntries RPCs to followers
Commits entry when majority acknowledge

type LogEntry struct {
    Term    int
    Command interface{}
}

type AppendEntriesArgs struct {
    Term         int
    LeaderID     int
    PrevLogIndex int
    PrevLogTerm  int
    Entries      []LogEntry
    LeaderCommit int
}

The Commit Rule #

A log entry is committed when:

It’s stored on a majority of servers
At least one entry from the leader’s current term is committed

This ensures safety - once committed, an entry won’t be lost.

Handling Failures #

Leader Crash #

Followers detect missing heartbeats
New election begins
New leader has all committed entries (guaranteed by vote restriction)

Follower Crash #

Leader retries AppendEntries
When follower recovers, it catches up

Network Partition #

Minority partition can’t elect leader (needs majority)
Majority partition continues operating
After healing, minority catches up

Why Raft Works #

The key invariants:

Election Safety: At most one leader per term
Leader Append-Only: Leader never overwrites entries
Log Matching: If two logs have entry with same index and term, all preceding entries are identical
Leader Completeness: Committed entries appear in all future leaders’ logs

Real-World Implementations #

Raft is used in:

etcd: Kubernetes’ backing store
CockroachDB: Distributed SQL database
Consul: Service discovery
TiKV: Distributed key-value store

Key Takeaways #

Raft prioritizes understandability
Strong leader simplifies replication
Randomized timeouts prevent split votes
Safety is never compromised - only liveness during partitions

Understanding Raft gives you intuition for how modern distributed databases achieve consistency.