The interplay of Nakamoto Consensus and Byzantine Fault Tolerance

Trying to understand the interplay of BFT and NC is not easy. Reading the original papers without a guide is challenging because BFT was invented prior to NC, and, the authors of NC were likely unaware of BFT research. To my knowledge, there is not a rigorous comparison of the design tradeoffs.

This basic primer should help the reader understand the design trade-offs between NC and BFT algorithms. Reading this first will help provide context before tackling the BFT and NC papers listed at the end of this article.

Looking forward, it’s important to understand these trade-offs because most next generation blockchains are moving away from NC towards new BFT-based consensus algorithms.

SIMILARITIES AND DIFFERENCES

BFT and NC are not directly comparable. BFT describes a family of scenarios, while NC is a specific algorithm. The scenario NC is built for had not been previously posed by BFT related research, and in fact, the Bitcoin whitepaper does not reference any BFT research among its many citations. That said, it’s possible to consider NC in the context of BFT by looking at some of the similarities and differences.

Similarities:

  • Both replicate state machines to increase reliability and availability, and accept inefficiency as a trade-off.
  • Byzantine Behavior: Both assume nodes can attempt to say different things to different peer nodes, and have mechanisms to detect this behavior.

Differences:

  • Network: NC assumes a synchronous network. BFT considers both synchronous and asynchronous networks, but usually focuses on the harder problem of asynchronous networks.
  • Permissioning: NC does not assume nodes are known in advance, and allows any node to join without permission (i.e. permissionless). BFT does assume nodes are known in advance, requiring some kind of permissioning.
  • Consistency: NC allows temporary inconsistency about the state of the system. BFT algorithms are always consistent (as long as assumptions are met).
  • Durability: NC allows historical data to be rewritten under a certain attack called a reorganization. In BFT, historical data does not change.
  • Voting: NC relies on basic majority rules: at least half of the nodes (or hash power or stake) being honest. BFT relies on a super majority of at least two thirds of the nodes being honest.

Let’s unpack this a bit.

REPLICATION FOR RELIABILITY AND AVAILABILITY:

A common method to increase reliability is to rely on redundancy. A car normally has four brakes, for example. Even if one brake fails, the other three can safely stop the car.

Similarly, availability is also boosted by redundancy. A traffic intersection typically has at least two traffic lights, in case one is obscured.

When you access a web service like gmail, there is typically around 9x redundancy to provide high levels of reliability and availability.

Redundancy is by definition inefficient. If a system only needs 1 machine but keeps 99 spares around, then it’s only 1% efficient.

Both NC and BFT depend on replication to increase reliability and availability, and accept the resulting inefficiency. NC did so because it wanted to secure a very valuable asset, Bitcoin, and BFT’s original research accepted inefficiency because it was designed for scenarios where mistakes were unacceptable, such as weapon systems.

BYZANTINE BEHAVIOUR:

NC and BFT both assume that among the many replicated nodes, some could malfunction by either stopping, or attempting to say different things to different nodes. The former is called stop-fail behavior, the latter is called byzantine behavior.

Both BFT and NC deal with byzantine nodes.

  • Most algorithms in the BFT family uses signed messages to detect byzantine nodes. If a node tries to say different things to different nodes, those nodes can compare notes to identify the faulty node.
  • Nakamoto Consensus creates a random delay so that any one node can effectively only say one thing at a time. Nodes are effectively prevented from exhibiting byzantine behavior. This delay is created by the well known Proof of Work.

NETWORK:

A network may be able to deliver all messages between nodes within a fixed time (synchronous), or not (asynchronous).

  • The former is comparable to being in the same room with someone. Within the bounds of the speed of sound, if someone talks, you’ll hear it.
  • The latter is like snail mail. It’s hard to predict when a letter will be received. The Internet and most computer systems are usually synchronous, but, scenarios can pop up that resemble asynchronous behavior.

The problem with an asynchronous is that not hearing from a node could mean two things:

  1. Either the node isn’t talking, or,
  2. The node is talking but you just haven’t heard it yet.

NC does not specifically talk about network synchrony, but, it is clear that the author assumed that data would travel throughout the network quickly, relative to block production time (10 minutes). In other words, NC assumes a synchronous network.

Most algorithms in the BFT family assume an asynchronous network, including, OM and SM (the first BFT algorithms), PBFT (used by Cosmos), Algorand, Honeybadger, and HotStuff (used by Facebook Libra).

PERMISSIONING AND CONSISTENCY

A system can either have a fixed number of nodes (e.g. 4 brakes in your car) or allow any node to join (e.g. connecting your computer to the Internet).

NC assumes any node can join the system at any time without telling anyone. This is permissionless because no one has the authority to grant permission to join the system.

BFT thinking, on the other hand, assumes nodes are known in advance. After NC’s invention, known nodes is called either permissioned if the nodes are allowed in by a central authority, or, another name like delegated proof of stake if the nodes are voted in.

Permissioning is intimately related to consistency.

Consistency means that data doesn’t change unless it’s supposed to.

A known set of nodes running a good BFT algorithm within defined parameters can ensure consistency. NC, on the other hand, does allow the possibility that the latest data you see may soon change. This is called a temporary fork. When two miners generate two valid blocks at the same height, only one can be valid. In NC, one of these blocks will eventually be orphaned. If you were relying on that data in the orphaned block, you’d be in trouble. And there’s an even deeper attack, the reorg, which allows very old data to be rewritten.

NC’s permissionless model introduces another attack vector: the Sybil attack. In a Sybil attack, a single adversary creates multiple fake accounts to stuff the ballot box. Solving for this is the second use of PoW. Not only does PoW prevent two nodes from talking at the same time, it also prevent a single node from easily saying two things at once. To say two things, an attacker would need to spend two times the money on hardware and electricity to perform work. It’s very elegant that PoW solves both Sybil Atacks and Byzantine Behavior.

By comparison, BFT does not suffer from Sybil attacks because the problem of identifying the permissioned nodes is solved outside the system, such as by a central authority.

CONSENSUS: MAJORITY / SUPERMAJORITY

In both BFT and NC, a majority or supermajority is required to achieve consensus.

BFT has a classic result that more than 2/3 of the nodes must be honest for the system to continue working. This is often cited. However, the 2/3 result is actually correct for two different scenarios.

First, oral messages (OM), which are forgeable and allow a node to talk out of two sides of its mouth, within a synchronous network, requires more than 2/3 honest nodes. The result here is actually quite unintuitive and therefore valuable — we generally think of a simple majority as enough to achieve consensus, not a supermajority. The OM scenario is not considered much today because good public key encryption exists.

Second, signed messages (SM) within an asynchronous network, requires more than 2/3 honest nodes. Signed messages use public key encryption to prevent forgeries by making it easy to detect when a node is talking out of two sides of its mouth. But the fact that not hearing from a node could mean they aren’t talking, or could also mean the are talking but it’s taking a while, makes this scenario difficult to solve. This SM scenario more closely resembles the actual conditions of the Internet today, so this is where research focuses.

There’s actually a third scenario: signed messages within a synchronous network. This is a much easier problem and reverts to simple majorities. There’s been recent research in this area as well.

FINAL THOUGHTS

NC made very different assumptions from BFT thinking and ultimately led to a new wave of innovation. It’s reasonable to call the problem NC solves a part of the BFT family, although it is at best a long lost cousin, rather than direct descendant.

Most new blockchains are applying BFT scenarios that are probably stronger than necessary. Bitcoin and Ethereum have worked successfully assuming a synchronous network, allowing a simple majority, whereas most next-gen blockchains are robust under asynchronous networks, with the tradeoff that they require more than 2/3 of the nodes to be honest, as well as a more complex consensus algorithm.

Don’t confuse consensus with performance. Replication by nature is inefficient. When a blockchain advertises higher transactions per second (TPS), it’s almost always based less on the core consensus algorithm, and more on design choices like the number of nodes, network speed, and storage capabilities.

Thanks to Evan Mair for review.

The original Byzantine Fault Tolerance paper (1982): https://people.eecs.berkeley.edu/~luca/cs174/byzantine.pdf

The original Bitcoin / Nakamoto Consensus paper (2008): https://bitcoin.org/bitcoin.pdf

The original Practical Byzantine Fault Tolerance paper, exploring asynchronous networks (1999): http://pmg.csail.mit.edu/papers/osdi99.pdf

The original Sybil Attack paper (2002): https://www.freehaven.net/anonbib/cache/sybil.pdf

Recent research on signed messages within a synchronous network: https://eprint.iacr.org/2018/1028.pdf

A great article about the history of BFT and NC: https://medium.com/s/story/lets-take-a-crack-at-understanding-distributed-consensus-dad23d0dc95

Source: Crypto New Media

Close

Request For My Information

 
Close

Request For Account Deletion

Close

Request For Information Deletion

Close

General Request / Query To DPO