Flare Consensus Protocol Explainer

The Flare Consensus Protocol (FCP) is a new construction of Federated Byzantine Agreement (FBA) consensus. It is both leaderless and totally-ordered, making it excessively difficult for an attacker to influence which of two transactions will be ordered first in a transaction set. FCP is also asynchronous and much simpler than previous FBA constructions due to leveraging a novel federated consensus mechanism called Federated Virtual Voting (FVV). The aim of this post is to break down the Flare Consensus Protocol (FCP), as introduced in the whitepaper.

Consider a number of participants, who wish to perform transactions with one another. For instance, these could be financial transactions. Crucially, in order to maintain network integrity and prevent issues such as double spending, these transactions must be ordered. Traditionally, a centralised authority such as a bank keeps track of the transactions via a ledger. But, the participants want greater transparency, fairness and security, and thus wish to achieve transaction ordering in a distributed manner, without electing any central authority. Can this be achieved in an efficient and safe manner? The FCP tells us that yes, this can be achieved.

This is the question of reaching consensus in a distributed system. Two important properties are that of safety and liveness. Informally, safety guarantees that the participants reach a consistency in their decision making process, whilst liveness guarantees that this decision making process will terminate.

The protocol consists of the following steps. First, the participants select their quorum slices. Second, participants transact and sync via the sending and receiving of  messages. As transactions are received, each participant builds some knowledge of the transactions occurring, represented by a local Directed Acyclic Graph (DAG). Finally, Federated Virtual Voting is implemented on the local DAG. First, some background on FBA systems and the importance of quorums.

Federated Byzantine Agreement

Quorum slices

The FCP exists within the framework of Federated Byzantine Agreement (FBA). In Byzantine agreement, the goal is for the participants to achieve consensus in the presence of byzantines nodes i.e. participants behaving arbitrarily or maliciously. In particular, all the participants are required in the decision making process. In contrast, in FBA,  participants will make their decisions based on groups of people they trust. We think this is a great property because it reflects how people and businesses make decisions in the real world.

Each participant will thus have a list of different groups of participants they trust, known as quorum slices. Note that each participant furthermore belongs to each of their own quorum slices. More precisely, an FBA system is defined by a set of participants \(\mathbf{V}\) and a quorum slice function \(\mathbf{Q}\). This is a list of quorum slices for each participant.

Let's consider an example. Say we have \(4\) participants: Alice (\(A\)), Bob (\(B\)), Charlie (\(C\)) and Eve (\(E\)) forming an FBAS.

The list of participants \(\mathbf{V}\) is given by (\(A, B, C, E\)), and a list of quorum slices has to be specified for each participant. For example, if Bob deems Alice to be very trustworthy, then \((A,B)\) is one of Bob's quorum slices. If he trusts Charlie and Eve, needing both of them in order to be convinced, then \((B, C, E)\) would be another quorum slice for Bob.

Quorums

Now, consider a set of participants. If this set contains a slice for each of these participants, then it is said to be a quorum. This set is then sufficient for each member to be convinced by its output. This is termed a sufficient set.

In the following, the arrows represent a participant including another participant that it points to in its quorum slice function \(\mathbf{Q}(v)\) for \( v \in \{A, B, C, E \}\).

Participants Bob, Charlie and Eve each have a quorum consisting of participants \(\{B, C, E\}\). On the other hand, Alice only includes Bob and Charlie in her quorum slice, but they both rely on Eve in their quorum slice, and thus \( \mathbf{Q}(A)=\{A,B,C, E\} \).

What would happen if a participant trusted a malevolent actor? This could result in the network integrity being compromised. This can be captured via the notion of quorum intersection.

Quorum intersection

Two quorums are said to intersect if they have one (or more) participants in common. Intuitively, if there are no overlaps between quorums, then these distinct groups could agree on different statements, which is called a network divergence. Thus, an overlap can guarantee cohesiveness and ensure system safety. At the same time, if the overlap occurs at a malicious party, then they could send contradictory statements to different quorums. To resolve this, dispensable sets are introduced.

For example, in the case of three participants Alice, Bob and Charlie, with quorum slices given by the figure below, Bob is the sole participant belonging to both Alice's and Charlie's quorum slices. If Bob were malicious, he could simultaneously convince Alice of one statement and  Charlie of another. On the other hand, in the previous example of four participants, whether Alice is malicious or not does not affect the others, as she does not belong to their quorum slices nor their intersection.

Dispensable sets

Thus, ideally, once the set of malicious actors was deleted from the network, any two quorums would still have quorum intersection. More generally, such malicious sets are known as dispensable sets, or DSets.

v-blocking

A group of participants might all overlap with each of participant \(v\)'s slices. In this case, this set is said to be \(v\)-blocking. For example, consider the following FBAS, where only Bob's quorum slices are shown. Charlie belongs to each of Bob's quorum slices, and thus is said to be Bob-blocking.  Such \(v\)-blocking sets potentially threaten the liveness of the system. Indeed, if these behave maliciously, then Bob will be unable to come to a valid decision, and the system would then remain blocked, unable to achieve consensus.

Now that the participants have chosen their quorums, they are free to transact and communicate these via a gossip protocol.

The Gossip Protocol

In a gossip protocol, information is spread across a system via peer-to-peer communication. In a process analogous to the spread of epidemics, information is progressively disseminated across the entire system.

For instance, say Bob wants to send Alice 10USD. He creates a message, in which he writes down  the amount he wishes to send to Alice, as well as some additional information.

In a gossip protocol, each node will randomly sync with other nodes. Bob then randomly picks another participant in the network and sends this message to them. As other participants wish to make transactions they too will create messages and randomly send them out. In turn, this means participants receive messages. It is transaction time!

Initial Messages

More precisely, the first message created by each participant at the start of the protocol is called the initial message. In the case of Bob sending 10USD to Alice, the initial message contains three fields: the transaction amount and recipient (10USD to Alice), a hash of Bob's quorum slice (Bob, Charlie, Eve) as well as a Bob's signature. A hash is a way of authenticating information. Bob's initial message is referred to as \( m_B^{(1)}\), where the subscript indicated the creator (Bob) and the superscript the message (first).

As the gossip protocol is executed, members will send and receive messages, and then continue to create further messages to send and receive. These will contain additional fields containing information about the history, that is, previous messages received. So the question is… what other information is contained in the message which allows this? The answer is:  knowledge that Bob has received in the past about other transactions.

Messages

Bob now randomly picks another participant and sends his message to them. Say, he chooses Eve. Thus, Eve receives the message sent by Bob. Upon reception of this message, she must create a new message containing information about the message received, as well as information about the last message she herself created. The creation of this new message is also her chance to attach a transaction to this message load, which she then propagates via the gossip protocol.

More precisely, this new message, containing as usual the transaction amount and recipient, a hash of her quorum slice and her digital signature. But, as this is not the initial message she sends, it will furthermore contain two extra fields: a hash of the previous message she sent as well as a hash of Bob's message she just received. As this is the second message she sends, it will be written:  \(m_E^{(2)}\).


More formally, the message is a data structure labelled with five fields: a hash of each of its two parents,  a transaction, a hash of its creator's quorum as well as a digital signature. These five fields uniquely determine the message.

Global DAGs

Someone watching the participants would see them send and receive messages, each containing information about the sender's prior information. The creation and propagation of messages between participants is represented by a Directed Acyclic Graph (DAG).

What is a graph?

A directed graph consists of a set of vertices and a set of edges, which are arrows between two vertices. For example, this is an undirected graph with \(4\) vertices:

If the edges have a direction, then the graph is said to be directed:

If, furthermore, there are no cycles in the graph i.e. no loops, then it is a Directed Acyclic Graph (DAG):

Global DAG

The  global DAG represents the history of the messages communicated, received and created by each participant. This global graph tracks the spread of information between the participants over time. To go back to our example, we had four participants (Alice, Bob, Charlie and Eve). At the start, each creates their own initial message: \( m_A^{(1)}, m_B^{(1)}, m_C^{(1)}, m_E^{(1)}\). Before any message has been sent, the global graph is:

A global DAG example

Next, say Eve sends her initial message to Charlie and Charlie sends his initial message to Bob. This will result in Charlie creating a second message \(m_C^{(2)} \), and Bob creating a second message \(m_B^{(2)}\). Next, say Bob sends his second message \(m_B^{(2)} \) to Alice. Upon receipt, she creates a new message $m_A^{(2)}$. This example corresponds to the following DAG:

Some graph terminology

In a DAG, a path between two messages means that it is possible to go from one vertex to another. For example, here there is a path from \( m_C^{(1)} \) to \(m_A^{(2)}\), as highlighted in yellow:

Given a message, its ancestors are all messages in the DAG which have a path to it. For example, messages \( m_B^{(1)} \) and \( m_C^{(1)} \) are ancestors of \(m_B^{(2)}\).

Local DAGs

Until now, we have been watching as Alice, Bob, Charlie and Eve exchange messages. But, what information would an individual participant be aware of? The answer is: only information contained within the messages they have received. As say Eve receives and sends messages, she keeps track of the information contained in these. She can represent this using a DAG which reflects her  current state of knowledge. This is called the local DAG.

For example, say Bob sends his initial message to Eve and Charlie sends his initial message to Alice. This results in Eve and Alice creating a second message \(m_E^{(2)}\) and \(m_A^{(2)}\). This corresponds to the following global DAG

But, upon creation of her second message, Alice only has knowledge of her prior message \(m_A^{(1)}\), as well as the information contained within \(m_C^{(1)}\). At this point, Alice's knowledge corresponds to the local DAG:

Similarly, Eve's state of knowledge is represented by:

Participant behaviour

Until now, we have assumed that the participants are well-meaning, obeying the rules and doing their best to make good decisions. In reality, this will not necessarily be the case. Indeed, a key problem with collaboration is the presence of malicious actors, wishing to undermine the process for their own reasons (such as an incapacity to follow the protocol or an attempt to take advantage and trick others for their own benefit).

An example of such behaviour is a fork. Here, upon receipt of a message, instead of creating a single new message, the participant could create two, neither of which contain information on the other. In one, he might say he is sending 200USD to Alice and in the other 300USD to Bob, which could lead to the malicious behaviour of double-spending. This scenario is represented by the DAG:

DAG Consistency

As the protocol proceeds, each participant builds their own local DAG representing their state of the knowledge. Given two local DAGs,  a message is said to be contained in both graphs if it has the same five fields. In this case, the two DAGs are said to be consistent if, for this message, both graphs contain all the same ancestors and edges between them.

DAG Connectivity

As time progresses, some messages - by virtue of being encoded in the history of the new message - will be more widely shared throughout the network. Once all participants are aware of these, they are global knowledge. This is captured by the idea of reachability.

Reachability

Let's consider the \(k\)th message of some participant \(x\) i.e. \(m_x^{(k)}\) and  the \(n\)th message of some participant \(y\) i.e. \(m_y^{(n)}\).

A message \(m_x^{(k)}\) is said to be reachable by a message \(m_y^{(n)}\) if there exists a path from \(m_y^{(n)}\) to \(m_x^{(k)}\) and none of the ancestors of \(m_x^{(k)}\) are forks of \(y\).

For example, in the following DAG, message \(m_E^{(4)}\) is reachable from \(m_E^{(1)}\) via the path highlighted in yellow.  

Strong reachability

By virtue of existing within the framework of FBA, the importance is placed on the message being shared not amongst any participant, but only the important ones, as captured by flexible trust and quorum slices. To capture this idea, strong reachability is introduced.

A message \(m_x^{(k)}\) is said to be strongly reachable by a message \(m_y^{(n)}\) if it is reachable from \(m_y^{(n)}\), and furthermore, there exists a quorum of \(x\) such that all the messages in this quorum can be reached by \(m_y^{(n)}\) and reach \(m_x^{(k)}\).

For instance, consider the following DAG. Message \(m_E^{(4) }\) is strongly reachable from message \(m_E^{(1) }\). Indeed, as previously seen, \(m_E^{(4) }\) is reachable from \(m_E^{(1) }\). Next, a quorum for Eve consists of herself, Bob and Charlie. First, consider message  \(m_E^{(2) }\), which is reachable from \(m_E^{(1) }\) and can reach \(m_E^{(4) }\). Next, consider message  \(m_B^{(2) }\),  which is reachable from \(m_E^{(1) }\) and can reach \(m_E^{(4) }\), as highlighted in green. Finally, consider message  \(m_C^{(3) }\),  which is reachable from \(m_E^{(1) }\) and can reach \(m_E^{(4) }\), as once again highlighted in green. Thus, all the conditions are satisfied for strong reachability.

Slot numbers and core messages

To recap: the gossip protocol was introduced, and each participant now has a local DAG representing their knowledge of messages which have been previously sent and received. We now return to the original goal: ordering the messages contained in these local DAGs. It is crucial that each participant ends up with the same ordering of messages. This is referred to as consensus. To achieve this, participants first compute the slot number of the messages contained in their DAG, and then determine which of the messages are core messages.

Slot number

The first step is for each participant \(x\) to determine the slot number of each message  \(m\), which is written \( \sigma_x[m]\). For now, we drop the \(x\) notation specifying the user. The first messages created have slot number one i.e. \(\sigma[m_z^{(1)}]=1\) for any participant \(z\). But, what will be the slot number of new messages?

As new messages are created, these are by default given the maximum slot number of its ancestors. For example, let's consider some message \(m_z^{(k)}\) and say that all of its ancestors have slot number one. So, the maximum slot number of its ancestors is one.

Now, consider the participants in one of \(z\)'s quorums. If each of their initial messages can strongly reach \(m_z^{(k)}\), then its slot number increases by one i.e.  \(\sigma[m_z^{(k)}]=2\). There must exist at least one quorum where this is the case for the slot number to increase. More generally, a message's slot number increases by one if it is strongly reachable from a quorum of messages in slot \(s\), where \(s\) is the maximum slot number of its ancestors. A message's slot number thus reflects how widely it has been shared throughout the network and is common knowledge. A slot corresponds to all messages with the same slot number.

Core message

The first messages of each slot are of particular importance, and are thus called core messages. If a message is a core message and is furthermore not a fork, it is said to be a unique core message.

Worked example

For example, consider the following DAG, and in particular, let's look at Eve's messages.

The initial message \(m_E^{(1)}\) has slot number \(1\) and is the very first core message. Next, we have to compute the slot number of  her second message \(m_E^{(2)}\). Its ancestors are \(m_E^{(1)}\)  and \(m_C^{(1)}\), both of which have slot number \(1\). Thus, the maximum slot number of its ancestors is \(1\). Now, to determine whether this should be incremented by \(1\), we need to check whether \(m_E^{(2)}\) is strongly reachable from a quorum of Eve's core messages in slot \(1\). Recall that Eve's quorum is given by Bob, Charlie and herself. Thus, \(m_E^{(2)}\) would have to be strongly reachable from \(m_B^{(1)}\), \(m_C^{(1)}\) and \(m_E^{(1)}\). As \(m_E^{(2)}\) is not even reachable from \(m_B^{(1)}\), it can not be strongly reachable and thus its slot number remains \(1\).

A similar argument applies to \(m_E^{(3)}\), and it is easy to figure out that its slot number is still \(1\). Now, consider  \(m_E^{(4)}\). Previously, we saw that  message \(m_E^{(4)}\) is strongly reachable from \(m_E^{(1)}\).

Next, consider core message \(m_C^{(1)}\). Message \(m_E^{(4)}\) is strongly reachable from \(m_C^{(1)}\).

Finally, consider message \(m_B^{(1)}\). Message \(m_E^{(4)}\) is strongly reachable from \(m_B^{(1)}\).

Thus, message \(m_E^{(4)}\) is strongly reachable from core messages from participants forming a quorum for Eve. As the maximum slot number of its ancestors is \(1\), then the slot number of \(m_E^{(4)}\) is now \(2\), and it is furthermore a core message.

The same process is applied to all messages in order to determine their slot numbers. In the following, messages with a dot are core messages.

Thus, now that each participant has created a local DAG and attributed a slot number to each message, it is time to implement federated virtual voting.

Federated Virtual Voting

Each participant has constructed their own local DAG, where each message has an associated slot number and where some messages have been determined to be core messages. Now, the procedure of federated virtual voting (FVV) is implemented.

Election

FVV consists of core messages 'voting' on the connectivity of previous core messages.  This is called an election. For instance, say some message \(m_y\) is voting regarding the connectivity of some message \(m_x\) from a previous slot. An election refers to the evaluation of the vote function i.e. the outcome of one core message casting a vote on a core message from a previous slot.

Voting

'Voting' is done by core messages evaluating a vote function on other core messages, which is henceforth referred to as voting. The outcome of the voting is binary i.e. yes  (\(1\)) or no (\(0\)). For example, let's consider the following DAG, where core messages are denoted by a dot.

Now, let's consider the first message created by Eve \(m_E^{(1)}\), and let's see how would other core messages vote on it.

Slot difference

The vote of some message regarding message \(m_E^{(1)}\) depends on the slot difference between the two messages. First, only core messages with slot number greater than \(m_E^{(1)}\) can vote on it.

For example, message \(m_E^{(4)}\) has slot number \(2\). The slot difference between the two messages, written as \(\Delta(m_E^{(1)}, m_E^{(4)} ) \), is  \(1\):  \(\Delta(m_E^{(1)}, m_E^{(4)} )= \sigma[m_E^{(4)}] - \sigma[m_E^{(1)}]=1\). On the other hand, core message \( m_E^{(7)}\) has slot number \(3\), and thus here the slot difference is \(2\): \(\Delta(m_E^{(1)}, m_E^{(7)} )= \sigma[m_E^{(7)}] - \sigma[m_E^{(1)}]=2\).

The slot difference is important as the vote depends on this.

Voting for a slot difference of 1

The voting rule for messages with a slot difference of \(1\) is as follows: if the two messages are in consecutive slots (e.g. \(\sigma[m_y]=s+1\) and \(\sigma[m_x]=s\) ), then \(m_y\) votes yes (\(1\)) if it is reachable from \(m_x\) and no (\(0\)) if it is not.

For example, let's consider core message \(m_B^{(4)}\) from slot \(2\). It is easy to see that message \(m_B^{(4)}\) is reachable from \(m_E^{(1)}\), and thus message \(m_B^{(4)}\) would vote yes. This is written as \(v(m_E^{(1)}, m_B^{(4)})=1\). In this instance, all of the core messages in slot \(2\) would vote yes on \(m_E^{(1)}\).

Voting for a slot difference greater than 1

Now, let's consider a core messages in slot \(3\), say message \(m_E^{(7)}\), voting on \(m_E^{(1)}\). Here, the slot difference is \(2\): \(\Delta(m_E^{(1)}, m_E^{(7)} )= \sigma[m_E^{(7)}] - \sigma[m_E^{(1)}]=2 \). In this case, votes from the previous slot, in this case in slot \(2\), are tallied up.

To do so, the voter (Eve) considers core messages from one of its quorums in the previous slot which can strongly reach it. The message \(m_E^{(7)}\)  considers how these messages have voted regarding message \(m_E^{(1)}\). This set of messages is called the voter set for \(m_E^{(7)}\).

Voter set

For example, here the voter set for message \(m_E^{(7)}\) is: \(m_B^{(4)}\), \(m_C^{(4)}\) and  \(m_E^{(4)}\). This voter set is now partitioned into those who voted yes regarding \(m_E^{(1)}\) and those who voted no. In this case, they all voted yes.  

If some had voted yes and others no, then the voter set would be spit into two: those who voted yes and those who voted no. Let's call those who voted no \(V_{\text{no}}\) and those who votes yes \(V_{\text{yes}}\). Next, we consider the creators of these messages. Say  \(P_{\text{no}}\) are the set of creators of messages in \(V_{\text{no}}\), and \(P_{\text{yes}}\) for \(V_{\text{yes}}\). In our example, we have that  \( V_{\text{yes}}=\{ m_B^{(4)}, m_C^{(4)}, m_E^{(4)}\}\) and so \(P_{\text{yes}}={B, C, E}\).

Tallying up votes and pivots

Finally, we can tally up the votes regarding message \(m_E^{(1)}\) as follows. First, we check whether \(P_{\text{yes}}\) is a quorum for Eve. If it is, then the votes are tallied up to a yes and thus message \(m_E^{(7)}\) votes yes regarding \(m_E^{(1)}\). Furthermore the message \(m_E^{(1)}\) is determined to be a pivot.

If this had not been the case, then we would have considered whether \(P_{\text{no}}\) was a quorum for Eve. If it was the case, then the votes would have been tallied up to a no, and the message \(m_E^{(1)}\) is determined to not be a pivot.

Finally, note that every \(c^{th}\) slot of deliberation is random, that is, the votes are made randomly.

Order Number

Now, everything has been set up in order for messages i.e. transactions in this example, to be ordered. This is achieved by computing the message order number, which utilises the previously computed pivots, i.e. messages which have been widely circulated to trusted sets. The above pivot construction can be shown to guarantee that all the  participants will come to the same conclusion regarding whether a given message is a pivot or not.

The order number of a message can be computed only when a message satisfies certain conditions. Say message \(m_x\) in slot \(s\) is considered. Pick the earliest slot where \(m_x\)'s descendent core messages have been decided as pivots or not, i.e. say slot \(s'\). Then the order number of message \(m_x\) is said to be \(s'\).

Consensus


Finally, a causal order is attributed via a deterministic process. This allows for messages to be sorted first by order number, then causal order and finally by hash if necessary.

Subscribe to the Flare Blog

* indicates required