Simple Wins

2009/05/25 06:10:40 +0000

The biggest response I got to Toast, my realtime CouchDB chat server, was: "wtf why didn't you use XYZ technology?"

The point of developing chat in CouchDB is not to show how CouchDB is an ideal persisted chat server (even if it is). The point is to show how CouchDB's "databasey" features, because they are implemented using HTTP, can be leveraged to make powerful end-user experiences, with just a minimum of code.

Before I dig into how Toast works, let's talk about simplicity. Detractors of the experiment tend to fall into three camps. First are the script kiddies who are so gleeful that I didn't harden my HTML escape functions that all they can think to say is:

alert("fart!");

I was utterly surprised by how many of them there are. I guess it's a good thing that CouchDB got a lot of interest from the 15 and under set, but damn - kids smell.
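For the curious, hardening an escape function is not much work. Here is a minimal sketch, not Toast's actual code: the escapeHtml name and the sample input (including the script tags) are made up for illustration.

```javascript
// Hypothetical helper: escape the characters that let user text break out of
// an HTML text node or a quoted attribute value.
function escapeHtml(text) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  // A single pass, so an "&" produced by one replacement is never re-escaped.
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// Assumed sample of what a hostile chat message might look like.
const hostile = '<script>alert("fart!");</script>';
console.log(escapeHtml(hostile));
```

Run every user-supplied string through something like this before it is inserted into the page as markup.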

The second set of detractors say realtime applications shouldn't go to disk, because it's slow and a lot of data. To them I say "disk is free". Also, I'm not trying to build a stock exchange or a missile guidance system. Call me when performance matters. CouchDB is designed for highly concurrent applications, and while Toast's traffic doesn't count as super high, the upshot is that my old Mac Mini was able to sustain 4 hours at the top slot on Hacker News without load exceeding 0.3.

No technology can scale to Twitter-like sizes without application-specific problems, and building a large-scale realtime messaging system will always require a lot of tuning and encountering strange issues. I would definitely recommend anyone who's building a system like that to have a look at CouchDB.

The third set of detractors seem to think speed is everything. Um, huh? What matters is fast enough. As long as latency is acceptable, the only reason speed matters is if you are trying to cram a bunch of activity through a limited number of processes. If you have 10,000 users all updating once a second, you either need a k/v store that can handle 10k updates per second, or you need 10 key value stores that can each handle 1000 updates per second. With CouchDB the second thing is a lot easier to build than it is with most of the other options.
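To make the "10 stores at 1000 updates each" option concrete, here is a rough sketch of routing users to stores by hashing their id. Nothing here is CouchDB-specific, and the function names and the "user-N" ids are assumptions for illustration only.

```javascript
// Hypothetical sketch: spread users across N independent stores by hashing
// the user id. Any stable hash would do; this is a 32-bit FNV-1a style hash
// over the string's character codes.
function hashString(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Returns the index (0 .. shardCount - 1) of the store that owns this user.
function shardFor(userId, shardCount) {
  return hashString(String(userId)) % shardCount;
}

// 10,000 users over 10 stores: count how many land on each one.
const counts = new Array(10).fill(0);
for (let user = 0; user < 10000; user++) {
  counts[shardFor("user-" + user, 10)]++;
}
console.log(counts);
```

The same user always maps to the same store, so each store only ever sees its own slice of the update traffic.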

But what none of these detractors seem to understand (except for the script-kiddies) is that the important thing about this demo is the web. CouchDB is part of the web, and serves web applications natively.

Serving RESTful web applications natively means two things.

  • you already know how to use CouchDB.

  • your applications can be deployed anywhere there's a CouchDB. This makes transporting an application (and its data) from one node to another trivially simple. This is why I call it the p2p web.

Back to Simplicity

Of all the qualities a good program can have, simplicity is most often overlooked. When you see benchmarks ("look how many operations per second my hash table can handle") almost never do they brag about how easy it was to achieve. Even if you wanted to brag about simplicity, it's hard to find a quantitative measure. Certainly the Quine test isn't about true simplicity:

#!/usr/bin/perl
$_=<<'eof';eval $_;
print "#!/usr/bin/perl\n\$_=<<'eof';eval \$_;\n${_}eof\n"
eof

So if we can't measure simplicity, how can we value it? I'm not alone when I say the web stack (HTML and friends) is simpler than its predecessors in computing history. This is completely subjective, of course. No one would argue that modern web browsers are less complex than, for instance, Prodigy's old-school terminal, but I will argue that it's a lot easier to write robust applications for the web.

The kind of simplicity I'm talking about is what you see when you view-source on a basic HTML page. There's not much learning curve, and a beginner can probably get somewhere useful just by reorganizing the content they see in their editor.

The same reason explains why PHP and the other scripting languages beat out Java on the server. Maybe Java is more "correct", but PHP's dirtiness means you don't have to be an expert to build something using it.

CouchDB is simple in that same way, in that it hides much of the complexity from you as a developer. Actually, there isn't much complexity to hide as CouchDB is simple on the inside as well.

Making Toast

There have always been a few killer features on the CouchDB roadmap. They all play directly into CouchDB's dual nature: scaling up to thousands of nodes, as well as scaling down. CouchDB's paradigm use case is local deployments: users serve applications to their peers in human-sized groups. CouchDB is built to give you control of your data - in your pocket, on your laptop, and in the cloud.

One of these features has always been filtered replication. A top use case is for splitting shards in a partitioned cluster. The first step toward filtered replication is a realtime stream of updates, made available as events in the database sequence. HTTP access to the update stream of database changes is crucial for chat on CouchDB.

Oh, messages in sequence... That's another new one that Eric aka @thisfred was pretty happy about. It gives a lot of flexibility, but should only be used if you know what you are doing. Essentially you can order view results in your database by the local sequence number (assigned when the document was written to the server generating the views). However, this order is not preserved in replication, so tracking an ordered stream across multiple disconnected nodes has its own challenges.
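A sketch of what such a view could look like. It assumes the local_seq design document option, which CouchDB documents as exposing a _local_seq property to map functions. The design document id and view name are invented, and the runMap helper only simulates the ordering locally; it is not how CouchDB builds an index.

```javascript
// Hypothetical design document. The "_design/toast-sketch" id and the
// "by_local_seq" view name are made up for illustration.
const designDoc = {
  _id: "_design/toast-sketch",
  options: { local_seq: true }, // assumption: makes doc._local_seq visible to map
  views: {
    by_local_seq: {
      // CouchDB stores map functions as source strings.
      map: "function (doc) { emit(doc._local_seq, null); }",
    },
  },
};

// Local simulation only: run the map source over some docs and sort the
// emitted rows by key, the way a view returns them.
function runMap(mapSource, docs) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const map = new Function("emit", "return (" + mapSource + ");")(emit);
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  return rows.sort((a, b) => a.key - b.key);
}

// Two made-up documents, deliberately listed out of write order.
const rows = runMap(designDoc.views.by_local_seq.map, [
  { _id: "second-message", _local_seq: 9509 },
  { _id: "first-message", _local_seq: 9508 },
]);
console.log(rows.map((row) => row.id));
```

Because the key is local to one server, the same documents may come back in a different order on a node that received them through replication.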

Toast took about 6 hours of coding, interrupted by a few hours of adding features to CouchDB to support it.

Normal Toast

The simplest thing about Toast is that it can run on any copy of CouchDB (0.10.0-dev or newer). That's what makes it a CouchApp. "What does it matter?" is a common question when people are confronted with their first self-contained CouchApp. The difference is that a pure CouchDB application is as portable as the data it manages.

View Source

When the application is just data, and moves through the same replication flow as other data, it gives users control over the source code and not just their data. Some users won't notice, but those who do will start hacking on the apps they use. Just as an Excel user quickly turns into an Excel hacker, we'll see CouchApp users becoming savvy about editing JavaScript views, Ajax callbacks, etc.

To see the Toast source code in deployment (yes this is the actual code that runs Toast) click this link to the Toast design document in Futon. Clicking index.html will take you into the application.

Deploy via HTTP

Design documents are the source code of CouchDB Applications. They contain definitions of views as well as HTML pages that turn into Ajax applications. There's room in the CouchDB application model to have applications written in PHP, Ruby, Python or any number of languages, as long as they are properly sandboxed.

The cool thing about deploying applications as regular documents is that I can replicate from my local machine to a cluster to deploy. Or I could use HTTP to PUT the application to a remote machine. Anonymous users cannot create design documents, but admins can.
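As a rough sketch of the PUT route: the helper below only describes the request and sends nothing. The buildDeployRequest name, the localhost URL, and the empty views object are placeholders, not Toast's real deployment tooling.

```javascript
// Hypothetical helper: describe the HTTP request that would deploy a design
// document. The result can be handed to any HTTP client.
function buildDeployRequest(dbUrl, appName, designDoc) {
  const base = dbUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const id = "_design/" + appName;
  return {
    method: "PUT",
    url: base + "/_design/" + encodeURIComponent(appName),
    headers: { "Content-Type": "application/json" },
    // Copy the document so the caller's object is not modified.
    body: JSON.stringify(Object.assign({}, designDoc, { _id: id })),
  };
}

// Placeholder database URL and a nearly empty design document.
const request = buildDeployRequest("http://localhost:5984/toast/", "toast", {
  views: {},
});
console.log(request.method, request.url);
```

A real deployment would add admin credentials, and updating an existing design document also needs its current _rev in the body.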

Share through replication

Replication is the bomb.

Any two CouchDB databases can be merged using replication. All new updates are applied to the target database, whether they are the addition of a new document, updating an existing document, or even the deletion of an existing document. Replication is incremental, which means extra data is not transferred if the databases have replicated recently.

When I wanted to merge Jason's channel onto my localhost, I could replicate his CouchDB database to mine via HTTP. This way any messages he'd seen would be available for me to browse locally. By connecting a mesh of CouchDBs you'd be able to keep up with a few channels without much latency.
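Under the hood, triggering replication is a POST to the server's _replicate resource with a JSON body naming the source and the target. A minimal sketch that builds such a request without sending it; the helper name and both URLs are placeholders, since the real address of Jason's database isn't given here.

```javascript
// Hypothetical helper: describe a replication request for a CouchDB server.
function buildReplicationRequest(serverUrl, source, target) {
  return {
    method: "POST",
    url: serverUrl.replace(/\/+$/, "") + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target }),
  };
}

// Pull a remote channel database into a local database named "toast".
const pull = buildReplicationRequest(
  "http://localhost:5984",
  "http://example.com/toast", // placeholder for the remote database URL
  "toast"
);
console.log(pull.url, pull.body);
```

Swapping source and target turns the pull into a push, and running both directions merges the two channels.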

Futon Replication

Above is a screenshot of what replication looks like in CouchDB. You merely enter the source and target database URLs, and CouchDB handles the rest. Here's a zoom in on what it would look like for you when you're using it.

Replication Controls

Low Energy State

So duh, the web is simpler than Prodigy. What else is simpler? Basic is simpler than C. Ruby is simpler than Java. The web is simpler because more of the decisions are made for you.

How do realtime updates work on CouchDB?

$ curl 'http://jchrisa.net/toast/_changes?continuous=true&since=9457'

The most recent sequence number will always be last. Pointing the since parameter at that latest sequence number gives output like this, after a few messages are sent in the room:

$ curl 'http://jchrisa.net/toast/_changes?continuous=true&since=9507'
{"results":[
    {"seq":9508,"id":"fca5e615dec80dbdd848a0a3738d5be4","changes":[{"rev":"1-1761360865"}]},
    {"seq":9509,"id":"224b8f9ce4ef5587d0c6676b87d04aa6","changes":[{"rev":"1-3621019413"}]},

where each line is a JSON object. The cool thing is that CouchDB (and your browser) will hold open the connection for minutes if that's how much time passes between updates.
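If you do want to read the lines, here is a sketch of a parser for output shaped like the sample above. It assumes the format shown there (an opening results line, then one object per line with a trailing comma); the function names are hypothetical.

```javascript
// Hypothetical helper: turn one line of the feed into an object, or null if
// the line is not a change row (blank line, or the opening {"results":[ line).
function parseChangeLine(line) {
  const text = line.trim().replace(/,$/, ""); // drop the trailing comma
  if (text === "") return null;
  try {
    const parsed = JSON.parse(text);
    return typeof parsed.seq === "number" ? parsed : null;
  } catch (err) {
    return null; // not a complete JSON value on its own
  }
}

// Highest sequence number seen, useful as the next value of since.
function lastSeq(lines, startSeq) {
  let seq = startSeq;
  for (const line of lines) {
    const change = parseChangeLine(line);
    if (change !== null && change.seq > seq) seq = change.seq;
  }
  return seq;
}

const sample = [
  '{"results":[',
  '    {"seq":9508,"id":"fca5e615dec80dbdd848a0a3738d5be4","changes":[{"rev":"1-1761360865"}]},',
  '    {"seq":9509,"id":"224b8f9ce4ef5587d0c6676b87d04aa6","changes":[{"rev":"1-3621019413"}]},',
];
console.log(lastSeq(sample, 9507));
```

Remembering the last sequence number lets a client reconnect with since set to it and pick up where it left off.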

In the case of Toast I don't even parse the JSON text, but rather rely on the browser to let me know that something has changed, by hooking into the onreadystatechange event. Here's the relevant code, from the bottom of channel.html:

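// Get a raw XMLHttpRequest from jQuery so the response can be watched as it streams.
c_xhr = jQuery.ajaxSettings.xhr();
// Ask for every change after the sequence number the page loaded with.
c_xhr.open("GET", app.db.uri+"_changes?continuous=true&since="+db_info.update_seq, true);
c_xhr.send("");
// Fires whenever more data arrives on the open connection.
c_xhr.onreadystatechange = function() {
  refreshView();
};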

This calls the refreshView() function anytime a new line comes over the wire from CouchDB. So we don't even care what the lines say. Simple.

refreshView() just makes sure the user has the 25 most recent messages on their screen. It's not fancy, but it is simple. Also, this code could be optimized in a straightforward manner to use far fewer resources.
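One possible shape for that optimization, which is not what Toast does: keep the current list in memory, merge in only the new messages, and trim to the latest 25. The message fields (id, seq) and the function name are assumptions for illustration.

```javascript
const MAX_MESSAGES = 25;

// Hypothetical helper: merge newly arrived messages into the list already on
// screen and keep only the most recent `limit`, oldest first.
function mergeRecent(current, incoming, limit = MAX_MESSAGES) {
  if (limit <= 0) return [];
  const byId = new Map();
  // Later entries win, so an updated message replaces its older copy.
  for (const message of current.concat(incoming)) {
    byId.set(message.id, message);
  }
  return Array.from(byId.values())
    .sort((a, b) => a.seq - b.seq)
    .slice(-limit);
}

// Thirty made-up messages arriving in two batches.
const firstBatch = [];
for (let i = 1; i <= 20; i++) firstBatch.push({ id: "m" + i, seq: i });
const secondBatch = [];
for (let i = 21; i <= 30; i++) secondBatch.push({ id: "m" + i, seq: i });

const onScreen = mergeRecent(mergeRecent([], firstBatch), secondBatch);
console.log(onScreen.length, onScreen[0].id, onScreen[onScreen.length - 1].id);
```

With something like this, each update only has to fetch the documents that changed instead of re-reading the whole view.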

Toast is a showcase of how simple a real-time chat system can be, when you leverage CouchDB's _changes API.

There will be other exciting apps coming.
