Skip to main content

Demystifying Raft Consensus Algorithm

An intuitive explanation of the Raft consensus algorithm. Learn how distributed systems achieve agreement even when nodes fail.

Consensus algorithms are fundamental to distributed systems. Raft was designed to be understandable - and it is, once you break it down.

The Problem #

In a distributed system, we need multiple nodes to agree on a value, even when some nodes fail. This is harder than it sounds:

  • Nodes can crash at any time
  • Network messages can be delayed or lost
  • We need the system to keep working

Raft’s Approach #

Raft breaks consensus into three sub-problems:

  1. Leader Election: Choosing a single leader
  2. Log Replication: Leader accepts commands and replicates to followers
  3. Safety: Ensuring all nodes agree on the same log

Leader Election #

Raft uses a term-based system. Each term has at most one leader.

Term 1: Node A is leader
Term 2: Node A crashes, Node B becomes leader
Term 3: Node B continues as leader

Election Process #

  1. A follower times out waiting for heartbeat
  2. It becomes a candidate, increments term, votes for itself
  3. Requests votes from other nodes
  4. Wins if it gets majority of votes
type RequestVoteArgs struct {
    Term         int
    CandidateID  int
    LastLogIndex int
    LastLogTerm  int
}

func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    
    // Reject if candidate's term is old
    if args.Term < rf.currentTerm {
        reply.VoteGranted = false
        reply.Term = rf.currentTerm
        return
    }
    
    // Grant vote if we haven't voted and candidate's log is up-to-date
    if rf.votedFor == -1 || rf.votedFor == args.CandidateID {
        if rf.isLogUpToDate(args.LastLogIndex, args.LastLogTerm) {
            reply.VoteGranted = true
            rf.votedFor = args.CandidateID
        }
    }
}

Log Replication #

Once elected, the leader:

  1. Accepts client commands
  2. Appends to its log
  3. Sends AppendEntries RPCs to followers
  4. Commits entry when majority acknowledge
type LogEntry struct {
    Term    int
    Command interface{}
}

type AppendEntriesArgs struct {
    Term         int
    LeaderID     int
    PrevLogIndex int
    PrevLogTerm  int
    Entries      []LogEntry
    LeaderCommit int
}

The Commit Rule #

A log entry is committed when:

  1. It’s stored on a majority of servers
  2. At least one entry from the leader’s current term is committed

This ensures safety - once committed, an entry won’t be lost.

Handling Failures #

Leader Crash #

  • Followers detect missing heartbeats
  • New election begins
  • New leader has all committed entries (guaranteed by vote restriction)

Follower Crash #

  • Leader retries AppendEntries
  • When follower recovers, it catches up

Network Partition #

  • Minority partition can’t elect leader (needs majority)
  • Majority partition continues operating
  • After healing, minority catches up

Why Raft Works #

The key invariants:

  1. Election Safety: At most one leader per term
  2. Leader Append-Only: Leader never overwrites entries
  3. Log Matching: If two logs have entry with same index and term, all preceding entries are identical
  4. Leader Completeness: Committed entries appear in all future leaders’ logs

Real-World Implementations #

Raft is used in:

  • etcd: Kubernetes’ backing store
  • CockroachDB: Distributed SQL database
  • Consul: Service discovery
  • TiKV: Distributed key-value store

Key Takeaways #

  1. Raft prioritizes understandability
  2. Strong leader simplifies replication
  3. Randomized timeouts prevent split votes
  4. Safety is never compromised - only liveness during partitions

Understanding Raft gives you intuition for how modern distributed databases achieve consistency.