Sui Temporary Total Network Shutdown Bugfix Review

On July 15th, the whitehat @F4lt responsibly disclosed a high-severity vulnerability to Sui’s bug bounty program through Immunefi. The vulnerability would have let an attacker crash the validator nodes of the Sui blockchain, resulting in a temporary total network shutdown.

Fortunately, thanks to the whitehat’s swift discovery and report via Immunefi, the Sui team was able to quickly remediate the issue.

Thanks to the quick actions of both the whitehat and the protocol, the network was never interrupted. Whitehat @F4lt was awarded a bounty of $50,000 in SUI for their discovery.

What is Sui?

Sui is a blockchain primarily focused on achieving instant transaction certainty, reducing delays in smart contract deployment, and enhancing overall transaction speed. To accomplish these objectives, Sui uses its own variant of Move, a Rust-based programming language for writing smart contracts. The blockchain runs on a delegated proof of stake (DPoS) system.

Sui validators play a role similar to miners in other blockchain ecosystems. However, Sui aims to stand out by enabling parallel processing of transactions. This parallel processing approach is expected to increase the network’s throughput, reduce latency, and enhance scalability by using Narwhal and Bullshark.

What are Narwhal & Bullshark?

Sui uses Narwhal as the mempool and Bullshark as the consensus engine by default to sequence transactions that require a total ordering, synchronize transactions between validators, and periodically checkpoint the network’s state.

Narwhal enables the parallel ordering of transactions into batches that are collected into concurrently proposed blocks, and Bullshark defines an algorithm for executing the DAG that these blocks form.

Validators on the Sui blockchain rely on Byzantine Fault Tolerant (BFT) consensus, where a quorum is a set of validators whose combined voting power is >2/3 of the total during a particular epoch.

The quorum size of >2/3 is chosen to ensure Byzantine fault tolerance: a validator will commit a transaction (i.e., durably store the transaction and update its internal state with the effects of the transaction) only if it is accompanied by cryptographic signatures from a quorum. Sui calls the combination of the transaction and the quorum signatures on its bytes a certificate.

The policy of committing only certificates ensures Byzantine fault tolerance: if >2/3 of the validators faithfully follow the protocol, they are guaranteed to eventually agree on both the set of committed certificates and their effects.
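The >2/3 rule is easy to get wrong at the boundary, so here is a minimal sketch of the check. The function name and the four-validator committee are hypothetical and are not taken from the Sui codebase; the only thing borrowed from the text above is the strict >2/3 threshold on voting power.

```python
def has_quorum(signed_stake: int, total_stake: int) -> bool:
    """Return True when signed_stake is strictly greater than 2/3 of total_stake."""
    # Cross-multiplying keeps the comparison in integers, so there is no
    # floating-point rounding at the exact 2/3 boundary.
    return 3 * signed_stake > 2 * total_stake


# Hypothetical committee: four validators with equal voting power.
committee = {"v1": 2500, "v2": 2500, "v3": 2500, "v4": 2500}
total = sum(committee.values())

print(has_quorum(committee["v1"] + committee["v2"], total))                    # False
print(has_quorum(committee["v1"] + committee["v2"] + committee["v3"], total))  # True
```

With four equal validators, two signatures are not enough and three are, which matches the usual BFT picture of tolerating one faulty validator out of four.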

A validator can handle two types of write requests: transactions and certificates. At a high level, a client:

  1. Communicates a transaction to a quorum of validators to collect the signatures required to form a certificate.
  2. Submits a certificate to a validator to commit state changes on that validator.

Let’s Understand How Validators Work

The Sui network is operated by a set of independent validators, each running its own instance of the Sui software on a separate machine (or a cluster of machines operated by the same entity).

Submitting a Sui transaction involves the following steps. These are transparent to the transaction sender, but it’s worth understanding what happens behind the scenes:

  1. Users send transactions to a quorum driver, such as a Full node, that broadcasts the transactions to a set of validators.
  2. Each Sui validator performs validity checks on transactions and adds a signature to valid ones. Each signature is weighted proportionally to the amount staked with the validator.
  3. The quorum driver collects signatures with a combined weight greater than 2/3 of the total stake (quorum of stake) into a certificate and broadcasts it to all Sui validators.
  4. When a validator receives a certificate, the validator verifies the certificate. If the certificate is valid, the validator then executes the embedded transaction and returns signed transaction effects to the quorum driver. A transaction becomes final after a quorum of validators receive and execute the corresponding certificate.
  5. Optionally, the quorum driver can collect an effects certificate based on the previous step and return it to the sender as proof of finality.

The client repeats this process with multiple validators until it has collected signatures on its transaction from a quorum, thereby forming a certificate.

The client can simultaneously multicast transactions/certificates to an arbitrary number of validators.

Once the client forms a certificate, it submits it to the validators, which will perform certificate validity checks (e.g., ensuring the signers are validators in the current epoch, and the signatures are cryptographically valid). Regardless of whether the transaction succeeds or is aborted, the validator will durably store the certificate indexed by the hash of its inner transaction.
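The flow described above can be sketched in a few lines. This is a simplified model, not Sui's implementation: all names are hypothetical, real signatures are replaced by a plain approve/reject flag, and SHA-256 stands in for whatever hash the node actually uses to index certificates.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    transaction: bytes
    signers: tuple  # validator names; real signatures are omitted in this sketch


def is_quorum(signers, committee):
    """Stake behind `signers` is strictly greater than 2/3 of the total stake."""
    signed = sum(committee[name] for name in set(signers))
    return 3 * signed > 2 * sum(committee.values())


def form_certificate(transaction, committee, responses):
    """Client side: gather approvals in arrival order until a quorum is reached."""
    signers = []
    for name, approved in responses:
        if approved and name in committee and name not in signers:
            signers.append(name)
            if is_quorum(signers, committee):
                return Certificate(transaction, tuple(signers))
    return None  # not enough stake signed the transaction


def handle_certificate(cert, committee, store):
    """Validator side: check the certificate, then store it by transaction hash."""
    known = all(name in committee for name in cert.signers)
    unique = len(set(cert.signers)) == len(cert.signers)
    if not (known and unique and is_quorum(cert.signers, committee)):
        return None
    key = hashlib.sha256(cert.transaction).hexdigest()  # stand-in hash function
    store[key] = cert
    return key


# Hypothetical committee for the current epoch: validator name -> stake.
committee = {"v1": 10, "v2": 20, "v3": 30, "v4": 40}
responses = [("v1", True), ("v2", False), ("v4", True), ("v3", True)]

cert = form_certificate(b"transfer 5 SUI", committee, responses)
store = {}
key = handle_certificate(cert, committee, store)
print(cert.signers, key in store)  # ('v1', 'v4', 'v3') True
```

In this run the certificate forms only once v3 responds, because v1 and v4 together hold exactly half of the stake. A certificate naming a signer outside the committee is rejected before the stake is even counted.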

Vulnerability Analysis

The whitehat reported a vulnerability in the logic that handles incoming certificate requests from a client, in Narwhal’s get_certificates(…) function. The function neither limited the number of digests that a single request could query nor enforced a timeout while reading them. This allowed an attacker to force Narwhal to query an extensive number of digests within a single connection, leading to an Out Of Memory (OOM) exception on a validator node.

Source: https://github.com/MystenLabs/sui/blob/testnet/narwhal/primary/src/primary.rs

At the time of the report, this function retrieved the certificate data for the requested digests. It extracts the digest field from the request object and returns a GetCertificatesResponse that includes all the corresponding certificates, flattened inside a vector. This is where the memory amplification occurs.
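To make the pattern concrete, here is a minimal Python sketch of a handler with the same shape. It is not the Narwhal code (which is written in Rust); the function name, the in-memory store, and the sizes are all assumptions chosen for illustration.

```python
def get_certificates_unbounded(requested_digests, certificate_store):
    """Look up every requested digest and flatten the matches into one list."""
    certificates = []
    for digest in requested_digests:          # no cap on the number of digests
        cert = certificate_store.get(digest)  # no deadline on the reads
        if cert is not None:
            certificates.append(cert)         # every match goes into one response
    return certificates


# Hypothetical store: 100 certificates of 41,000 bytes each, keyed by 32-byte digests.
store = {n.to_bytes(32, "big"): bytes(41_000) for n in range(100)}

response = get_certificates_unbounded(list(store), store)
held = sum(len(cert) for cert in response)
print(len(response), held)  # 100 4100000
```

The size of the response is decided entirely by the caller: each small digest in the request pulls a much larger certificate into the response.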

The memory impact of these operations is amplified significantly by the size of the returned certificates. The whitehat’s demonstration indicated that a valid certificate in an epoch could span up to 41KB when serialized into a JSON string.

By injecting such large certificates into the Sui node, an attacker could, with just a single request featuring a 37MB payload containing 1,200,000 digests, trigger an Out Of Memory (OOM) exception on a validator node with 64GB RAM.
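The figures above can be checked with some quick arithmetic. The 32-byte digest size is an assumption (it is consistent with a 37MB payload for 1,200,000 digests); the other numbers come from the whitehat’s report.

```python
DIGEST_COUNT = 1_200_000  # digests in the single malicious request
DIGEST_SIZE = 32          # bytes per digest (assumed)
CERT_SIZE_KB = 41         # size of one large certificate as a JSON string

# Size of the request on the wire, in MiB.
request_mib = DIGEST_COUNT * DIGEST_SIZE / 2**20

# Size of the certificates the node would have to load, in GB.
response_gb = DIGEST_COUNT * CERT_SIZE_KB * 1000 / 10**9

print(round(request_mib, 1))  # 36.6
print(round(response_gb, 1))  # 49.2
```

Under these assumptions, a request of roughly 37MB asks the node to load about 49GB of certificate data, before counting any copies made while building the response or the memory the node already uses. That is the same order of magnitude as the 64GB of RAM on the validator.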

Proof of Concept (PoC):

The whitehat demonstrated the PoC by creating a batch of malicious certificates and sending it to a validator on a local testnet.

Setting up the nodes (one full node and two validators):

192.168.155.73
sui-node --config-path ~/.sui/sui_config/fullnode.yaml
192.168.155.74
sui-node --config-path ~/.sui/sui_config/validator-config-0.yaml
192.168.155.75
sui-node --config-path ~/.sui/sui_config/validator-config-1.yaml

The validator ended up querying the malicious certificates, and the connection eventually failed with the following error:

data Err(Status { status: Unknown, headers: {}, message: Some("unknown error: connection lost"), peer_id: None, source: Some(connection lost)

As it loads the certificate, the memory of the validator continues to grow until an OOM exception is thrown in the validator.

Vulnerability Fix

Sui fixed the vulnerability with a commit that removes the GetCertificates and GetPayloadAvailability handlers from Narwhal.
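Removing the handlers closes the attack path entirely. For comparison only, the sketch below shows what the two missing guards named in the analysis (a cap on the number of digests and a time budget for the reads) could look like. It is not Sui’s fix: the function name, the exception, and both limits are hypothetical values picked for illustration.

```python
import time

MAX_DIGESTS_PER_REQUEST = 1_000  # hypothetical cap
READ_BUDGET_SECONDS = 2.0        # hypothetical time budget


class RequestRejected(Exception):
    """Raised when a request asks for more digests than the cap allows."""


def get_certificates_bounded(requested_digests, certificate_store,
                             max_digests=MAX_DIGESTS_PER_REQUEST,
                             budget=READ_BUDGET_SECONDS,
                             clock=time.monotonic):
    # Guard 1: refuse oversized requests before doing any work.
    if len(requested_digests) > max_digests:
        raise RequestRejected(
            f"too many digests: {len(requested_digests)} > {max_digests}")

    # Guard 2: stop reading once the time budget is spent.
    deadline = clock() + budget
    certificates = []
    for digest in requested_digests:
        if clock() > deadline:
            break  # return the partial result that was read in time
        cert = certificate_store.get(digest)
        if cert is not None:
            certificates.append(cert)
    return certificates


# Hypothetical store with three small certificates.
small_store = {n.to_bytes(32, "big"): b"cert" for n in range(3)}
print(len(get_certificates_bounded(list(small_store), small_store)))  # 3

try:
    get_certificates_bounded([b"x"] * 1_200_000, small_store)
except RequestRejected:
    print("rejected")  # the oversized request never touches the store
```

With a cap like this, the worst-case response size is bounded by the cap times the largest certificate, regardless of what the caller sends.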

Acknowledgements

We would like to thank the whitehat @F4lt for doing an amazing job in responsibly disclosing such an important bug. Big props also to the Sui team who quickly responded to the report and patched the issue.