Benchmarking CouchDB

2009/10/16 21:46:01 +0000

It's been too long since I've sat down to benchmark CouchDB. I'm working on the High Performance CouchDB chapter in the book, so I needed some numbers.

I've committed the scripts used in this blog post to the CouchDB svn repository.

Concurrency matters a lot under real-world load, but the new benchmark scripts don't run requests in parallel yet. One step at a time.

Anyway, a very interesting data point. I've got a JavaScript program that runs functions as many times as it can in 10 seconds. The functions I'm running are things like "create a CouchDB document with a random id".
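For anyone who wants to try the same approach outside couchjs, here is a minimal Python sketch of a time-boxed harness. It is not the committed JavaScript script. It assumes each benchmark function returns the number of documents it created, and the names `run_timed` and `fake_insert` are hypothetical. The single_doc_insert rate below appears to divide by measured elapsed time rather than a flat 10 s (2584 docs / 257.94 docs/sec ≈ 10.02 s), so the sketch does the same.

```python
import time
from typing import Callable, Dict


def run_timed(fn: Callable[[], int], duration_s: float = 10.0) -> Dict[str, float]:
    """Call fn back to back until duration_s has elapsed.

    Like the couchjs harness, this busy-loops on the clock instead of
    using timers. fn is assumed to return the number of docs it created.
    """
    loops = 0
    docs = 0
    start = time.monotonic()
    deadline = start + duration_s
    while time.monotonic() < deadline:
        docs += fn()
        loops += 1
    # The last call can finish after the deadline, so use real elapsed time.
    elapsed = time.monotonic() - start
    rate = docs / elapsed if elapsed > 0 else 0.0
    return {"loops": loops, "docs": docs, "elapsed_s": elapsed, "docs_per_sec": rate}


def fake_insert() -> int:
    """Hypothetical stand-in for "create a CouchDB document with a random id"."""
    return 1


if __name__ == "__main__":
    result = run_timed(fake_insert, duration_s=0.2)
    print(f"{result['loops']} loops, {result['docs']} docs, "
          f"{result['docs_per_sec']:.1f} docs/sec")
```

Timing an empty function, e.g. `run_timed(lambda: 0, 1.0)`, is a cheap way to check how much overhead the loop itself adds before trusting any numbers it reports.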

Here's a pretty normal benchmark run.

CouchDB Benchmarks
Host: 127.0.0.1:5984
Version: 0.11.0be3610fa4-git
Bench Duration: 10000ms

Basic Numbers:

We can write about 258 docs/second with a single serial writer. (That's pretty much the worst-case writer.)

single_doc_insert
2584 docs
257.9357157117189 docs/sec

This nearly doubles with batch=ok:

batch_ok_doc_insert
4784 docs
478.4 docs/sec

Bulk docs is of course way faster:

bulk_doc_100
14600 docs
1454.6179137192387 docs/sec

For reference, all the docs looked like this: {"foo":"bar"}
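The bulk_doc_* runs presumably POST arrays of that document to CouchDB's `_bulk_docs` endpoint, whose body has the shape `{"docs": [...]}`. Here is a minimal Python sketch that builds (but does not send) such a request with the standard library. It assumes the documents omit `_id` so CouchDB assigns one; the database name `bench` and the helper `bulk_docs_request` are hypothetical, and this is not the committed benchmark script.

```python
import json
import urllib.request


def bulk_docs_request(base_url: str, db: str, n: int) -> urllib.request.Request:
    """Build (without sending) a _bulk_docs POST carrying n {"foo": "bar"} docs."""
    # No _id fields: CouchDB generates an id for each document.
    docs = [{"foo": "bar"} for _ in range(n)]
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = bulk_docs_request("http://127.0.0.1:5984", "bench", 100)
    # urllib.request.urlopen(req) would send it to a running CouchDB.
    print(req.get_method(), req.full_url, len(req.data), "bytes")
```

The array length `n` is the knob that the 100, 1000, 5000, and 10000 runs turn: a larger array spreads the per-request HTTP cost over more documents.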

bulk_doc_1000
22000 docs
2146.3414634146343 docs/sec

Looks like about the best we got via this crazy JavaScript benchmark suite is 2.5k docs/sec.

bulk_doc_5000
30000 docs
2551.0204081632655 docs/sec

bulk_doc_10000
30000 docs
2342.2860712054967 docs/sec

Full Commit

Full commit means CouchDB writes data all the way to disk (fsync) before it responds to the client (except in batch=ok mode). It's a simpler code path, so it has less overhead at high throughput. However, for individual clients it can seem slow. Let's dig in:

I varied a single dimension, the X-Couch-Full-Commit header, across 10-second runs. (Loops counts the number of times each function ran; the functions accumulate a count of documents created.) It'd be nice to have setTimeout() support in couchjs, but for now I'm just iterating until the allotted time has passed. An empty function can run something like 600k times in a second, so I think the test harness is mostly OK.
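The header itself is ordinary HTTP. Here is a minimal Python sketch that builds a single-document insert with and without it, assuming a PUT to a random id as in single_doc_insert; `single_doc_request` and the `bench` database are hypothetical names. The sketch assumes a CouchDB that treats `X-Couch-Full-Commit: true` as "fsync before replying" and `false` as permission to delay the commit.

```python
import json
import urllib.request
import uuid


def single_doc_request(base_url: str, db: str, full_commit: bool) -> urllib.request.Request:
    """Build (without sending) a PUT of {"foo": "bar"} to a random doc id."""
    doc_id = uuid.uuid4().hex  # random id, like single_doc_insert
    headers = {
        "Content-Type": "application/json",
        # "true" asks for an fsync before the response; "false" allows delay.
        "X-Couch-Full-Commit": "true" if full_commit else "false",
    }
    return urllib.request.Request(
        f"{base_url}/{db}/{doc_id}",
        data=json.dumps({"foo": "bar"}).encode("utf-8"),
        headers=headers,
        method="PUT",
    )
```

Toggling only this one flag between runs is what keeps the two sets of numbers comparable.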

CouchDB Benchmarks
Host: 127.0.0.1:5984
Version: 0.11.0be3610fa4-git
Bench Duration: 10000ms

Look how slow single_doc_insert is with full commit enabled: 4 or 5 docs/sec. Wowsers! That's 100% a result of the fact that OS X has a real fsync, so be thankful! The story gets better as we move into bulk operations.

single_doc_insert
46 docs
4.583042741855135 docs/sec

With batch=ok, things are much better because clients see low latency: CouchDB returns HTTP 202 Accepted before the docs are actually written to disk. This is perfectly safe as long as you can accept that the last second or so of writes is volatile. Clients can also flush explicitly when they need fsync guarantees.

batch_ok_doc_insert
4851 docs
485.00299940011996 docs/sec
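Here is a minimal Python sketch of the two requests involved, assuming a database named `bench` (hypothetical, as are the helper names): a document POST with `?batch=ok`, which should come back 202 Accepted, and a POST to `_ensure_full_commit` for when a client needs the fsync guarantee.

```python
import json
import urllib.request

JSON_HEADERS = {"Content-Type": "application/json"}


def batch_ok_request(base_url: str, db: str, doc: dict) -> urllib.request.Request:
    """POST a doc with ?batch=ok; CouchDB may answer 202 before it hits disk."""
    return urllib.request.Request(
        f"{base_url}/{db}?batch=ok",
        data=json.dumps(doc).encode("utf-8"),
        headers=dict(JSON_HEADERS),
        method="POST",
    )


def flush_request(base_url: str, db: str) -> urllib.request.Request:
    """POST to _ensure_full_commit to force pending writes to disk."""
    return urllib.request.Request(
        f"{base_url}/{db}/_ensure_full_commit",
        data=b"",
        headers=dict(JSON_HEADERS),
        method="POST",
    )
```

Against a live server, `urllib.request.urlopen(batch_ok_request(...))` returns a response whose `.status` should be 202; send `flush_request(...)` after any writes you cannot afford to lose.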

Here are bulk document inserts at four granularities: arrays of 100, 1000, 5000, and 10,000 docs.

bulk_doc_100
4400 docs
437.37574552683895 docs/sec

bulk_doc_1000
17000 docs
1635.4016354016355 docs/sec

bulk_doc_5000
30000 docs
2508.1514923501377 docs/sec

bulk_doc_10000
30000 docs
2699.541078016737 docs/sec

Throughput keeps climbing all the way up to a bulk size of 10k (437, then 1,635, 2,508, and 2,700 docs/sec). At 10k we're actually doing better than with full commit off (about 2,700 vs. 2,342 docs/sec), which is a reminder that tuning for your application will always bring better results than following a cookbook.

Horse Races

OK, 2700 docs / second is fine, but we want more power! Next up we'll explore running bulk docs in parallel.

Using a different script (bash and curl), I'm inserting large batches of documents into CouchDB in parallel. With batches of 1000 docs, ten in flight at any given time, averaged over 10 rounds, I see about 3,650 docs/second on a MacBook Pro.
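The bash and curl script isn't reproduced in this post. As a rough Python stand-in (a swapped-in technique, not the author's script), here is a sketch that keeps a fixed number of bulk batches in flight with a thread pool and averages docs/sec over several rounds. The names `parallel_round`, `average_rate`, and `couch_poster` are hypothetical, and averaging per-round rates is one assumed reading of "averaged over 10 rounds".

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def parallel_round(post_batch: Callable[[int], int], batch_size: int = 1000,
                   concurrency: int = 10) -> float:
    """Run `concurrency` batches at the same time; return docs/sec for the round."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(post_batch, batch_size) for _ in range(concurrency)]
        docs = sum(f.result() for f in futures)
    elapsed = time.monotonic() - start
    return docs / elapsed if elapsed > 0 else float("inf")


def average_rate(post_batch: Callable[[int], int], rounds: int = 10,
                 **kwargs) -> float:
    """Average the per-round rate over `rounds` rounds (assumed methodology)."""
    rates = [parallel_round(post_batch, **kwargs) for _ in range(rounds)]
    return sum(rates) / len(rates)


def couch_poster(base_url: str, db: str) -> Callable[[int], int]:
    """Hypothetical helper: returns a function that POSTs n docs to _bulk_docs."""
    url = f"{base_url}/{db}/_bulk_docs"

    def post_batch(n: int) -> int:
        body = json.dumps({"docs": [{"foo": "bar"}] * n}).encode("utf-8")
        req = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            resp.read()  # read the full response body before counting the batch
        return n

    return post_batch
```

With CouchDB running locally, `average_rate(couch_poster("http://127.0.0.1:5984", "bench"))` approximates the 1000-doc, ten-at-a-time, 10-round setup. Threads are adequate here because each worker spends its time waiting on HTTP rather than on the CPU.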
