Amazon Web Services released a document detailing the operations of its Simple Storage (S3) service and in doing so revealed that the software powering the service is “being rolled out gradually within our current service.”
Title Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3 [PDF], the document states that AWS is implementing “ShardStore” – a technology described as “a new key-value storage node implementation for the Amazon S3 cloud object storage service.”
The document also reveals that S3 currently holds “hundreds of petabytes of customer data.”
“At the heart of S3 are storage node servers that keep object data on hard drives,” the document explains. “These storage nodes are key value stores that contain fragments of object data, replicated by the control plane across multiple nodes for durability.
“Each storage node stores fragments of client objects, which are replicated across multiple nodes for durability, so storage nodes do not need to replicate their internally stored data,” he adds.
AWS also describes a concept called “crash consistency” that it uses to prevent data loss and achieve eleven new data durability, which means the service is designed to preserve 99.999999999% of the data.
Replicating data between nodes helps AWS achieve this reliability and means the loss of a node will not destroy the data.
“Recovering from a crash that loses all of a storage node’s data creates large amounts of network repair traffic and I / O load on the storage node fleet,” the document explains. “Failure consistency also ensures that the storage node returns to a safe state after failure and therefore does not exhibit unexpected behavior that may require manual operator intervention. “
ShardStore keeps track of all these objects. Its keys are fragment identifiers and the values are client object data fragments. The importance of ShardStore data means that it is evenly spread across different nodes and disks.
Nothing in the document indicates whether or not users will perceive a change when implementing ShardStore, but does mention that it is “API compatible with our existing storage node software, and therefore requests may be processed by ShardStore or our key value stores “. The Reg I can’t imagine the switch to ShardStore would confuse users – a downtime requirement would make AWS laugh out of the cloud.
The article describes how AWS used light formal methods – a technique for using automation to verify that software meets its specifications – to ensure that ShardStore is doing its job. So most of the word count is spent explaining how AWS tested the more than 40,000 Rust rows that make up ShardStore, and the many acts of deep storage the software does to keep S3 alive.
In conclusion, the authors report that AWS’s experience with Lean Formal Methods has been “positive, with a number of issues prevented from reaching production and substantial adoption by the ShardStore engineering team.”
The authors included several AWS staff as well as people from the University of Texas at Austin, the University of Washington, and the Swiss Public Research University ETH Zurich. ®