How to handle eventual consistency with S3

Update, 12/7/21: Amazon has apparently enabled strong consistency for S3 by default, meaning this article’s description of S3 as eventually consistent is no longer valid and the techniques described for handling it should no longer be necessary. However, feel free to read on if you want a better understanding of eventually consistent systems in general or if you want a better appreciation of what Amazon making S3 strongly consistent really means.

Background

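Eventual consistency is one of those computer science terms that I didn’t learn in college. My basic understanding is that it describes a distributed system where a write is not immediately visible everywhere: for a short time different parts of the system can disagree about the data, but they converge on the same state once the write has fully propagated.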
Think of a system where the database storage is replicated across multiple nodes to optimize performance (a common scenario). When you write to that database system, the write needs to propagate to all of the nodes before the system is consistent (assuming all nodes maintain a full copy of all of the data). Until the write has fully propagated, a read request that is not routed deterministically to a particular node could return any of several states of the data: no data at all (if the write is creating the record), a previous version of the record, or the latest version of the record. This is what we mean by eventual consistency: eventually the data is consistent across all nodes in the system, but there is some period of time where it is not.
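To make that scenario concrete, here is a toy model of a replicated store. It is only an illustration: the `ReplicatedStore` class, its methods, and the `report.json` key are made up for this sketch, and real systems replicate asynchronously rather than through an explicit `propagate()` call.

```typescript
// Toy model of a replicated key-value store. All names are hypothetical.
type Replica = Map<string, string>;

class ReplicatedStore {
  private replicas: Replica[];
  // Writes accepted by replica 0 but not yet copied to the other replicas.
  private pending: Array<[string, string]> = [];

  constructor(replicaCount: number) {
    this.replicas = Array.from(
      { length: replicaCount },
      () => new Map<string, string>()
    );
  }

  // A write is acknowledged as soon as one replica has stored it.
  write(key: string, value: string): void {
    this.replicas[0].set(key, value);
    this.pending.push([key, value]);
  }

  // Simulates replication catching up on every other replica.
  propagate(): void {
    for (const [key, value] of this.pending) {
      for (const replica of this.replicas.slice(1)) {
        replica.set(key, value);
      }
    }
    this.pending = [];
  }

  // A read can land on any replica; here the caller picks one explicitly
  // so the example stays deterministic.
  read(key: string, replicaIndex: number): string | undefined {
    return this.replicas[replicaIndex].get(key);
  }
}

const store = new ReplicatedStore(3);
store.write("report.json", "v1");
store.propagate();
store.write("report.json", "v2");

// Before propagation, the answer depends on which replica serves the read.
const fresh = store.read("report.json", 0);
const stale = store.read("report.json", 2);

// After propagation, every replica agrees.
store.propagate();
const converged = store.read("report.json", 2);
```

Between the second write and the second `propagate()` call, replica 0 holds the new value while replica 2 still holds the old one. That window is the inconsistent period described above.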

Consequences of S3’s eventual consistency

S3 is an eventually consistent system. In order to give you that great S3 availability, S3 internally spreads the storage of your data objects across a bunch of different nodes. And since reads are not routed to any particular node, an S3 response might not yet reflect a change you just made by creating, updating, or deleting an object. The AWS S3 documentation calls out a few potential consequences of their consistency model:

Updates to a single key are atomic. For example, if you PUT to an existing key, a subsequent read might return the old data or the updated data, but it never returns corrupted or partial data.

Amazon S3 achieves high availability by replicating data across multiple servers within AWS data centers. If a PUT request is successful, your data is safely stored. However, information about the changes must replicate across Amazon S3, which can take some time, and so you might observe the following behaviors:

— A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
— A process replaces an existing object and immediately tries to read it. Until the change is fully propagated, Amazon S3 might return the previous data.
— A process deletes an existing object and immediately tries to read it. Until the deletion is fully propagated, Amazon S3 might return the deleted data.
— A process deletes an existing object and immediately lists keys within its bucket. Until the deletion is fully propagated, Amazon S3 might list the deleted object.

We have seen these consequences crop up all over the place on Cumulus.

Handling S3’s eventual consistency

So the question is: what can we do about it? How can we avoid failures related to getting an S3 object response that we were not expecting? And the answer is: ETags.

An ETag, or entity tag, is an arbitrary identifier (usually a string/hash) representing the state of the requested resource. ETags have multiple uses, but for our situation the most important is the ability to make conditional requests. For HTTP requests, if you know the ETag for a resource, you can send the request with an `If-Match` header, which basically says “only return a successful response if the resource’s current ETag matches the one specified in the `If-Match` header”. If it does not match, the server responds with 412 (Precondition Failed).

As it turns out, the S3 API does return ETags for objects and supports an `If-Match` parameter for GET requests. PUT requests will also return an ETag.

Important note: the ETags returned by S3 are usually an MD5 checksum of the data, but not always (objects created by multipart upload are one exception). So don’t rely on the ETag being an MD5 checksum; treat it as an opaque identifier. See the official documentation.

Putting it all together, we can use the ETag from an S3 PUT response as the `If-Match` parameter for an S3 GET request. Importantly, doing this does not guarantee that the GET will succeed. It only guarantees that a successful response contains the expected data; if the object S3 finds does not match the ETag, the request fails with a 412 error response instead. So we’re still only halfway there: using ETags doesn’t prevent inconsistent responses, it just turns them into errors.
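A minimal sketch of that conditional read follows. To keep it self-contained it does not call S3: `GetObjectFn`, `readIfMatches`, and `fakeS3` are hypothetical names, and the fake simply applies the `If-Match` rule in memory. In the AWS SDK for JavaScript v3 the corresponding request parameter is `IfMatch` on `GetObjectCommand`, and a failed precondition surfaces as a thrown error rather than a returned status code, so real code would catch that error instead of checking `statusCode`.

```typescript
// Stand-in types for a conditional GET. These are assumptions for the sketch,
// not the shape of any real SDK response.
interface ConditionalGetRequest {
  bucket: string;
  key: string;
  ifMatch: string; // the ETag returned by the earlier PUT
}

interface ObjectResponse {
  statusCode: number;
  etag?: string;
  body?: string;
}

type GetObjectFn = (req: ConditionalGetRequest) => Promise<ObjectResponse>;

type ReadResult =
  | { consistent: true; body: string }
  | { consistent: false };

// Reads an object only if its current ETag matches the one we expect.
// A 412 is reported as "not consistent yet" rather than as a hard failure.
async function readIfMatches(
  getObject: GetObjectFn,
  bucket: string,
  key: string,
  expectedETag: string
): Promise<ReadResult> {
  const response = await getObject({ bucket, key, ifMatch: expectedETag });
  if (response.statusCode === 412) {
    return { consistent: false };
  }
  if (response.statusCode !== 200) {
    throw new Error(`Unexpected status ${response.statusCode}`);
  }
  return { consistent: true, body: response.body ?? "" };
}

// In-memory fake that applies the If-Match rule to a single stored object.
function fakeS3(current: { etag: string; body: string }): GetObjectFn {
  return async (req) =>
    req.ifMatch === current.etag
      ? { statusCode: 200, etag: current.etag, body: current.body }
      : { statusCode: 412 };
}
```

Returning a tagged result instead of throwing on 412 is a design choice for this sketch: it makes the "not yet consistent" case an ordinary value that the caller has to handle.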

The last piece of the puzzle is retry logic. Using your language/SDK of choice, you can implement a function that requests an object from S3 with a specified ETag and retries the request whenever S3 returns a 412 response. In essence, this function waits for S3 to become consistent and then returns your expected response. Here is an example of such a function from Cumulus (in TypeScript).
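The Cumulus function itself is not reproduced here. As a rough stand-in, the sketch below shows the general shape of such a retry loop. Everything in it is an assumption made for illustration: the `ConditionalGet` callback (which would wrap your SDK's conditional GET), the `getObjectWaitingForETag` name, and the retry count and backoff values.

```typescript
// Stand-in response type; an assumption for this sketch, not an SDK type.
interface ObjectResponse {
  statusCode: number;
  body?: string;
}

// Hypothetical callback that performs a GET with If-Match set to the given ETag.
type ConditionalGet = (ifMatch: string) => Promise<ObjectResponse>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries a conditional GET until the object matches the expected ETag,
// or gives up after the configured number of retries.
async function getObjectWaitingForETag(
  conditionalGet: ConditionalGet,
  expectedETag: string,
  options: { retries?: number; delayMs?: number } = {}
): Promise<string> {
  const { retries = 5, delayMs = 100 } = options;

  for (let attempt = 0; attempt <= retries; attempt += 1) {
    const response = await conditionalGet(expectedETag);

    if (response.statusCode === 200) {
      return response.body ?? "";
    }
    // Anything other than 412 is a real failure, so do not retry it.
    if (response.statusCode !== 412) {
      throw new Error(`Unexpected status ${response.statusCode}`);
    }
    // 412: the write has not become visible yet. Back off exponentially
    // before trying again, unless this was the last attempt.
    if (attempt < retries) {
      await sleep(delayMs * 2 ** attempt);
    }
  }

  throw new Error(
    `Object did not match ETag ${expectedETag} after ${retries + 1} attempts`
  );
}
```

Capping the number of retries matters: if the ETag never matches (for example, because another writer replaced the object), the function should fail with a clear error rather than wait forever.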

You should now have everything you need to handle bugs related to S3 eventual consistency. It’s important to note that S3’s eventual consistency is not itself the bug. What matters is how we factor that consistency model into our code.

For what it’s worth, DynamoDB reads are also eventually consistent by default. But for DynamoDB GetItem requests, you can opt in to “strongly consistent” reads (meaning they should always return the accurate state based on any previous writes) via the `ConsistentRead` request parameter.
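As a small illustration, a GetItem request that opts in to a strongly consistent read might be built like this. The `GetItemRequest` interface is a simplified stand-in covering only the fields used here, and the table name and `id` key are hypothetical. With the AWS SDK for JavaScript v3, an object of this shape would be passed to `GetItemCommand`.

```typescript
// Simplified stand-in for the GetItem request shape (string keys only).
interface GetItemRequest {
  TableName: string;
  Key: Record<string, { S: string }>;
  ConsistentRead?: boolean;
}

// Builds a GetItem request; the "id" key attribute is an assumption.
function buildGetItemRequest(
  tableName: string,
  id: string,
  stronglyConsistent: boolean
): GetItemRequest {
  return {
    TableName: tableName,
    Key: { id: { S: id } },
    // When true, the read reflects all writes acknowledged before it.
    ConsistentRead: stronglyConsistent,
  };
}

const request = buildGetItemRequest("example-table", "granule-123", true);
```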
