The challenge of consensus in distributed systems

The issue of consensus is one of the biggest challenges associated with distributed systems. To understand why, and how to meet this challenge, you first need to understand the nature of distributed systems.

A distributed ledger – also called a shared ledger, or distributed ledger technology (DLT) – is a series of replicated, shared and synchronised digital data, geographically spread across multiple sites, countries or institutions. In other words, there is no central administrator or centralised data storage. A peer-to-peer network is required to ensure replication across nodes (a network of computers). One form of distributed ledger design is the blockchain system.

Key to the operation of this distributed database is a mechanism to ensure the nodes verify the transactions, and agree on their order and existence on the ledger. This mechanism is called consensus. In the case of applications like a cryptocurrency, this process is critical to prevent double spending or other invalid data being written to the underlying ledger. So, why is consensus a challenge?
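The double-spending case can be sketched in a few lines: before a transaction is written to the ledger, the node checks that the coin it spends has not already been spent. (The coin identifiers and the `try_append` helper below are illustrative assumptions, not part of any real protocol.)

```python
# Minimal sketch of validation before writing: once a coin is spent,
# a second spend of the same coin must be rejected.

def try_append(ledger, tx):
    """Append tx = (coin_id, new_owner) only if coin_id is still unspent."""
    spent = {coin for coin, _ in ledger}
    if tx[0] in spent:
        return False  # double spend rejected
    ledger.append(tx)
    return True

ledger = []
try_append(ledger, ("coin-1", "alice"))        # first spend is recorded
ok = try_append(ledger, ("coin-1", "bob"))     # double spend is rejected
```

In a distributed system, the hard part is that every node must apply this check and end up with the same ledger, which is exactly what consensus provides.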

Rise of the multi-master database

When you have a single database, it’s called a master database. This is the original form of the database. But in a bid to address disaster recovery, failover and redundancy, we saw the rise of a second database – a copy of the master – whereby every write goes to the master and is then replicated to the slave. If something goes wrong with the master, the slave can be ‘elevated’ to master status.

Following this, it became possible for applications to write to both databases in the network. This is known as a multi-master database. While this was a significant step forward, it introduced other problems. Unfortunately, when multiple applications write across multiple databases, they will occasionally write conflicting values to the same location in different databases.

To ensure the databases stay in sync, this conflict must be resolved. But as you add more masters and more users to the database, these conflicts become more likely. Ultimately, it becomes increasingly difficult to remain in sync and ensure consensus.
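The conflict described above can be sketched as two masters that accept concurrent writes to the same key and then need a deterministic rule to converge. The timestamps and the ‘last writer wins’ rule below are illustrative assumptions; real systems use a variety of resolution strategies.

```python
# Sketch of a multi-master write conflict: two masters accept writes to the
# same key concurrently, and syncing requires a deterministic resolution rule.

def merge(a, b):
    """Resolve per-key conflicts by keeping the write with the later timestamp."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

master_a = {"balance": (1, 100)}  # write accepted by master A at t=1
master_b = {"balance": (2, 250)}  # conflicting write accepted by master B at t=2

synced = merge(master_a, master_b)  # both masters converge on the t=2 write
```

The key property is that the rule is deterministic: applied in either direction, both masters reach the same state, which is the essence of what the next sections generalise into consensus.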

Given the challenge of consensus, why move to a distributed system?

Today, most of the world’s information systems are centralised, and we have been forced to trust the companies that control them. Because large, centralised systems do not have to deal with the issue of consensus, they are efficient and scale well. On the other hand, a decentralised system has appeal as it removes the necessity of a single organisation controlling the data. A properly executed DLT removes the need for users of a system to trust a third party or each other, which is why they are often described as ‘trustless’.

To make this trustless system work, we’ve seen the introduction of many different consensus mechanisms, each with their pros and cons. They generally serve the same core purpose as described above, but differ in methodology. The primary difference between consensus mechanisms is the way in which they delegate and reward the verification of transactions.

The most popular blockchain consensus mechanisms are the Proof of Work (PoW) and Proof of Stake (PoS) systems. PoW systems are based on solving computationally intensive puzzles to validate transactions and create new blocks. A key feature of PoW systems is their asymmetry; the work must be moderately hard (but feasible) on the requester side, but easy to check for the service provider. PoS, on the other hand, is a type of algorithm whereby the creator of a new block is chosen via various combinations of random selection and wealth or age (i.e. the stake).
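The asymmetry of PoW can be shown with a toy puzzle: finding a nonce whose hash has a given prefix takes many attempts, but checking a claimed nonce takes a single hash. The two-zero difficulty below is an illustrative assumption, vastly easier than any real network’s target.

```python
import hashlib

# Toy Proof of Work: solving requires many hash attempts (hard for the
# requester), while verifying a claimed solution takes one hash (easy for
# the checker).

def solve(block_data, difficulty="00"):
    """Search for a nonce whose SHA-256 digest starts with the difficulty prefix."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(difficulty):
            return nonce, digest
        nonce += 1

def verify(block_data, nonce, difficulty="00"):
    """Check a claimed nonce with a single hash computation."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith(difficulty)

nonce, digest = solve("block-1")   # many attempts on average
assert verify("block-1", nonce)    # one hash to confirm
```

Real PoW systems work the same way in principle, but with a difficulty target that forces enormous amounts of hashing, which is what makes rewriting history prohibitively expensive.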

The Byzantine Generals’ Problem: A story of consensus

Imagine divisions of a Byzantine army, attacking a completely encircled city. To proceed, the generals of each division – who are dispersed around the city’s periphery – must agree on a battle plan. However, while some generals want to attack, others may want to retreat.

In order to achieve consensus, the commanding general and every lieutenant must agree on the same decision. To complicate matters, the generals are so far apart from each other that messengers are required in order for the generals to communicate. Also, one or more lieutenants may be a traitor, intending to sabotage the situation. In light of all this, is it possible for the army to carry out a strategy? The solution relies on an algorithm that can guarantee:

  1. All loyal lieutenants decide upon the same plan of action.

  2. A small number of traitors cannot cause the loyal lieutenants to adopt a bad plan.

The loyal lieutenants will all do what the algorithm says they should, but the traitors may do anything they wish. The algorithm must guarantee the first condition regardless of what the traitors do. The loyal lieutenants should not only reach an agreement but should agree upon a reasonable plan.
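The second condition above can be illustrated with a simple majority vote: as long as the traitors are a minority, they cannot flip the plan the loyal lieutenants agree on. (Lamport’s full oral-messages algorithm is considerably more involved; this sketch only shows the majority step, with hypothetical vote values.)

```python
from collections import Counter

# Loyal lieutenants decide by majority vote over the messages they receive,
# so a small number of traitors cannot force a bad plan.

def decide(messages):
    """Return the plan with the most votes among the received messages."""
    return Counter(messages).most_common(1)[0][0]

loyal = ["attack"] * 5      # five loyal lieutenants relay the true order
traitors = ["retreat"] * 2  # two traitors lie about the order

plan = decide(loyal + traitors)  # the majority still yields "attack"
```

Once the traitors approach half the votes, this simple rule breaks down, which is why real Byzantine fault-tolerant algorithms bound the number of faulty nodes they can survive.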

Why am I telling you this? Because the Byzantine Generals’ Problem is the analogy most often used to illustrate the requirement for consensus for distributed ledger technology. The nodes in the distributed system must all agree on a certain set of rules, and be able to move forward by agreeing on a particular assessment of a transaction before it is added to the database.

This is not easy, especially where thousands of nodes exist. In addition to that, each one must agree on the validity of new information to be added, thus preventing bad actors from sabotaging the ledger and rewriting history. A specific type of consensus algorithm must be adopted to achieve this, enabling the nodes to work together to update the ledger securely.

Consensus protocols and algorithms

Now that you have a general understanding of consensus, let me define two further concepts related to consensus – the protocol and the algorithm. These two concepts will help you understand how consensus is achieved, and the key components of any DLT implementation.

A protocol is a set of rules that govern how a system operates. The rules establish the basic functioning of the different parts, how they interact with each other, and what conditions are necessary for a robust implementation. The different parts of a protocol are not sensitive to order or chronology — it doesn’t matter which part goes first. A protocol also doesn’t tell the system how to produce a result. It doesn’t have an objective and doesn’t produce an output. In this sense, it works in the same way as a car engine.

Consensus algorithms, on the other hand, relate to the rules (mathematics) that each node follows to achieve consensus. These algorithms describe the steps that will need to take place. Unlike a consensus protocol, which is a set of rules that determine how the system achieves consensus, an algorithm is a set of instructions that produce an output or a result. It can be a simple script or a complicated program.

The order of the instructions is essential, and the algorithm specifies what that order is. It tells the system what to do in order to achieve the desired result. It may not know what the result is beforehand, but it knows that it wants one. If a consensus protocol can be likened to a car engine, then a consensus algorithm can be likened to the actions of the driver of the car. The protocol is. The algorithm does.

The four-step process of consensus

Consensus is a process. Broadly, it is designed to ensure that transactions written to nodes across a network remain in sync and immutable, and to protect the network from attack. To achieve this, the process of consensus follows four steps:

  1. Each node creates the transactions it wants to record.

  2. The data is shared between the nodes.

  3. Consensus is established on the order of valid transactions.

  4. Nodes update their transactions to reflect the consensus result.

The goal is to reach step four as quickly as possible without breaking consensus.
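The four steps above can be sketched end to end, with a deterministic ordering rule standing in for a real consensus algorithm. (The sorting rule and the node structure are illustrative assumptions, not an actual protocol.)

```python
# The four-step process, simulated for a handful of nodes.

def run_round(nodes):
    # Step 2: every node shares its pending transactions with the others.
    pool = set()
    for node in nodes:
        pool |= node["pending"]
    # Step 3: all nodes apply the same rule to agree on the order.
    agreed_order = sorted(pool)
    # Step 4: each node updates its ledger to reflect the consensus result.
    for node in nodes:
        node["ledger"].extend(agreed_order)
        node["pending"].clear()

# Step 1: each node creates the transactions it wants to record.
nodes = [
    {"pending": {"tx-b"}, "ledger": []},
    {"pending": {"tx-a"}, "ledger": []},
]
run_round(nodes)  # every node now holds an identical ledger
```

The difficulty in practice lies entirely in step three: the sorting rule here is trivially deterministic, whereas real networks of untrusted nodes need PoW, PoS or a Byzantine fault-tolerant algorithm to agree on the order.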

People also often claim that one kind of consensus is better than another. But there are different solutions for different situations. Overall, consensus is a process that facilitates synchronisation across a distributed network of untrusted nodes. In the future, this will allow us to build decentralised applications – either privately (for an enterprise) or publicly.

About the Author

Anthony Stevens is the founder and CEO of Digital Asset Ventures, a digital strategy and software development company. Digital Asset Ventures’ technology expertise is concentrated in three key areas: distributed ledger technology, artificial intelligence, and big data and data networks. Anthony is also the co-author of Chasing Digital: A Playbook for the New Economy (Wiley).


©2018-2019 by The GRC Institute - Governance, Risk & Compliance.  ABN: 42862119377