Deep Couch: Deterministic Revs for Idempotent PUTs

2009/09/22 23:02:21 +0000

Idempotent

HTTP traffic (really, many kinds of traffic, as this happens at the TCP level) is sometimes repeated by intermediaries with no knowledge of either the client or the server. This is why it is important to use idempotent verbs when possible, like PUT or GET, instead of POST.

When non-idempotent commands are repeated by intermediaries, the result can be unintended (and silent) side-effects. Sometimes form POSTs are repeated, leading to duplicate data stored in application state. The same is true for any kind of HTTP request, no matter the verb.

However, with an idempotent request, repeats due to intermediaries will not induce additional side effects. For instance, running GET twice is no big deal.

PUT is idempotent, so PUTing the same document to CouchDB twice (or more times), should not be any different from putting it once. And we see this is true. (at the end of the post where it won't bore you)

Because the _rev is changed by the first successful PUT, the additional PUTs have no effect. You can PUT the same document to CouchDB 1000 times and only cause one update on the server. All subsequent requests will fail due to the MVCC mismatch.

Lesson learned: use PUT whenever you care about not accidentally creating a bunch of duplicate data. For the reason CouchDB includes the _uuids API. Use it.

Now here's the fun part:

Maybe it's time to scale CouchDB up, or you want to provide high-availability and redundancy. So you get replication happening between 4 nodes, and you throw a load-balancer in front of them. To outside users, things look like a single Couch, as long as they are willing to live with eventual consistency. You set up the cluster well, so replication lag is less than 500 ms, and applications "just work". Good job, you've built a Big Couch.

Deep Science

OK it's not that deep. Remember what we said about PUT being idempotent? Well what happens if a PUT is repeated, but the load balancer spreads the repeats across all of your nodes? Of course, the write succeeds once on all 4 nodes. Great! You know it's stored securely. Yay idempotent!

Here's the code that makes it work.

new_revid(#doc{body=Body,revs={OldStart,OldRevs},
        atts=Atts,deleted=Deleted}) ->
    case [{N, T, M} || #att{name=N,type=T,md5=M} <- Atts, M /= <<>>] of
    Atts2 when length(Atts) /= length(Atts2) ->
        % We must have old style non-md5 attachments
        ?l2b(integer_to_list(couch_util:rand32()));
    Atts2 ->
        OldRev = case OldRevs of [] -> 0; [OldRev0|_] -> OldRev0 end,
        erlang:md5(term_to_binary([Deleted, OldStart, OldRev, Body, Atts2]))
    end.

In the old days (CouchDB 0.9.1) _revs were generated randomly on write. See couch_util:rand32() for the source of randomness. The first branch of the case statement is where this happens, to maintain backwards compatibility. This original implementation was a stub, as it doesn't make sense to implement this first, if it mostly works without it.

The random revids acted properly as MVCC tokens, but didn't preserve idempotence when run in a cluster. With random rev ids, the first PUT would succeed on each node (just like we want) but when those nodes replicate, the new revs generated there would all be (randomly) different. To CouchDB this looks like a conflict.

These spurious conflicts were fixed with CouchDB 0.10, which as you can see determines the _rev with erlang:md5(term_to_binary([Deleted, OldStart, OldRev, Body, Atts2])) - this means that in the case of repeating intermediaries, you'll always get the same revs on the cluster nodes. When replicated, the nodes know this change is already present, we don't get a conflict.

It's not so complicated, but you have to focus to make it to the end. You did it!

Example repeated PUT behavior

Create a database:

$  curl -X PUT http://localhost:5984/idempdb
{"ok":true}

Create a document with id = "foo":

$  curl -X PUT http://localhost:5984/idempdb/foo -d '{"_id":"foo"}'
{"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}

Update the document to ddd a field baz = "bam":

$  curl -X PUT http://localhost:5984/idempdb/foo -d '{"_id":"foo", "baz":"bam","_rev":"1-967a00dff5e02add41819138abb3284d"}'
{"ok":true,"id":"foo","rev":"2-384ea69ce1a73aa7d21d23b608d221d0"}

Repeat the update.

$  curl -X PUT http://localhost:5984/idempdb/foo -d '{"_id":"foo", "baz":"bam","_rev":"1-967a00dff5e02add41819138abb3284d"}'
{"error":"conflict","reason":"Document update conflict."}

No dice. "error":"conflict"

Comment on this post