Standalone Applications with CouchDB

2008/10/31 16:47:45 +0000

Update: The topics of this post are covered in much greater detail by the O'Reilly book - CouchDB: The Definitive Guide available for free online. Get inspired here, but go there to get started.

Over the last few days I've polished up my notion that CouchDB can be a perfectly viable application host, all on its own, without any 3rd tier between database and client. That is, CouchDB is capable of serving standalone applications. These standalone CouchDB applications can be deployed to any working CouchDB node and used from any browser.

Encapsulating business logic in a Ruby or Python middleware framework like Merb or Django is no longer necessary. Between CouchDB's views, its security model and validations, and the new _external server, developers will have everything they need to present fully functional, responsive, and dare I say scalable, web applications.

Full disclosure: I'm an Apache CouchDB committer, but these views are not a result of that work; rather, they are the reason I was driven to join the community in the first place.

Of course, in this dream world of logarithmically scalable JSON and Ajax web applications, all users have to bring is a browser. However, those users who run local CouchDB nodes can stay connected to the mothership via replication, empowering not only offline use, but an explosion in open source web applications. Standalone CouchDB applications travel with the data they manage, and data can be replicated between peer nodes. Users will be free to modify and share the tools they use to manage their data, as well as the data itself.

Thinking of peer-based application replication takes me back to Austin's science magnet, LBJ High School. As freshmen, my friends and I would share little video games between the TI-85 graphing calculators we were required to carry. Two calculators could connect by one of those smaller-than-headphone minijack cables, and we'd share Physics cheat sheets, Hangman, some text-based adventures, and at the height of our powers, I believe there may have been a Doom clone running.

The TI-85 programs were in Basic, so everyone was always hacking each other's hacks. Perhaps the most ridiculous program was a version of Spy Hunter that you controlled with your mind. The idea was that if you could influence the pseudo-random number generator by concentrating hard enough, you'd be able to control the car. Didn't work. Anyway, the point is that when you give the kids control of the source code, there's no telling what will happen.

CouchDB's replication facility has the potential to bring an explosion in innovation, as kids realize that they can hack and share the apps they use every day. And duh, if you wanna be with it, you gotta keep up with what the kids are doing. ;) But seriously, the freedom we'll see when people control their data, and the applications they use to access it, is what keeps me up at night working on this.

To participate in this Cambrian explosion you'll be required to write your application by new rules. At first blush, pure CouchDB apps follow more restrictive constraints than even Google App Engine. However, I'll argue that pure CouchDB apps can offer more room for innovation. This is because with a CouchDB app, you control the whole stack from root to fruit, and can deploy your applications on your own terms, from small scale in-house use to datacenter-level distributed systems.

CouchDB already offers a built-in HTTP server with attachment support and file uploads, as well as a rich set of APIs for storing and retrieving JSON records. By taking advantage of these characteristics, pure-CouchDB applications can reach near parity with traditional server-heavy approaches. Notable exceptions include the ability to render dynamic text that isn't JSON, or return results that require multiple queries against CouchDB.

With the recent addition of _external support to CouchDB, now it is possible to craft dynamic XML responses (or anything else you can do from a JavaScript function) directly from CouchDB's Spidermonkey script runner. In this article I'll show some code that renders a CouchDB view as an Atom feed. Everything's still very rough, but I'm encouraging people to hack on it and share. There's a framework (or maybe a whole new way of working) emerging, and I want you to be a part of it.

What does a Pure CouchDB app look like?

I assume you already know about CouchDB's database capabilities. To refresh: it's a RESTy document store with incremental map reduce queries. The documents are JSON and the map reduce queries are written in JavaScript, or your language of choice, via a stdio socket that speaks a simple JSON protocol.
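As a rough illustration of what a JavaScript map function might look like, here is a sketch that indexes blog posts by date, in the spirit of the `recent` view used later in this post. The `type`, `date`, `title`, and `body` fields are assumptions about the document shape. The `emit` stub and the sample documents exist only so the function can be exercised outside CouchDB, where the view server supplies `emit` itself.

```javascript
// Stand-in for the emit() that CouchDB's view server provides.
var emitted = [];
function emit(key, value) {
  emitted.push({key: key, value: value});
}

// The map function: CouchDB calls it once per document.
function map(doc) {
  if (doc.type === 'post' && doc.date) {
    emit(doc.date, doc); // key by date so rows come back in date order
  }
}

// Hypothetical sample documents.
var docs = [
  {_id: 'a', type: 'post', date: '2008-10-30', title: 'First', body: 'Hello'},
  {_id: 'b', type: 'comment', date: '2008-10-31', body: 'Nice post'},
  {_id: 'c', type: 'post', date: '2008-10-31', title: 'Second', body: 'More'}
];
docs.forEach(function (doc) { map(doc); });
```

Only the two documents marked as posts produce rows; the comment is skipped.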

Among CouchDB's various documents are those whose _id starts with "_design/". These documents contain the map reduce views, and according to Damien Katz, they should correspond 1:1 with the applications that run across a dataset. One application equals one design document.

The example application in this post (and the script I've written to bootstrap it into CouchDB from the filesystem) is designed around this principle of one design document per application. I'm especially keen on the innovation that we'll see when people start to write distinct and competing applications that run across the same dataset.
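To make the one-design-document-per-application idea concrete, here is a sketch of what such a document could look like. The `blog` attribute and the `recent` view name match the ones the feed code in this post reads. The field values and the view body are assumptions for illustration, not a copy of couchdb-example-blog.

```javascript
var designDoc = {
  _id: '_design/couchdb-example-blog',
  language: 'javascript',
  // Application-level settings; all values here are hypothetical.
  blog: {
    title: 'An Example Blog',
    url: 'http://example.com/blog',
    author: {name: 'A. Blogger'}
  },
  // Views are stored as function source strings, keyed by view name.
  views: {
    recent: {
      map: "function(doc) { if (doc.type == 'post') emit(doc.date, doc); }"
    }
  }
};

// A design document is plain JSON, so it is saved like any other document.
var json = JSON.stringify(designDoc);
```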

CouchDB documents can hold attachments, which are served up with the MIME type they were stored with. This makes it easy to store and serve HTML, CSS, JavaScript, and even image files directly from CouchDB. By attaching a few HTML files and some JavaScript behavior to the design document, you can have a working blog, as I showed in the demo portion of my Arc90 talk.

As a proof of concept, look no further than CouchDB's built-in administrative interface. Futon, as it's called, is just a collection of HTML, CSS, and JavaScript files, and it is a fully functional database browser and JSON document editor, as well as the runner for CouchDB's functional test suite. The test suite is written in JavaScript and runs from the browser. What else could more strongly indicate that Ajax and CouchDB go hand in hand?
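One way to picture an attachment is the inline form, where the file travels inside the document as base64 under `_attachments`. The sketch below assumes a runtime with Node's `Buffer` for the encoding; `withAttachment` is a hypothetical helper name, and the HTML content is made up.

```javascript
// Returns a copy of `doc` with one inline attachment added.
function withAttachment(doc, name, contentType, text) {
  var copy = JSON.parse(JSON.stringify(doc));
  copy._attachments = copy._attachments || {};
  copy._attachments[name] = {
    content_type: contentType, // served back as the Content-Type header
    data: Buffer.from(text, 'utf8').toString('base64')
  };
  return copy;
}

var doc = withAttachment(
  {_id: '_design/couchdb-example-blog'},
  'index.html',
  'text/html',
  '<h1>My Blog</h1>'
);
```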

JavaScript, static HTML, and CouchDB don't buy you Atom or RSS feeds, however. And who'd seriously create blogging software without feeds? The _external server system to the rescue! Basically, CouchDB parses and forwards HTTP requests as a JSON object to an external script. The external script can then process the JSON, make additional queries, and return whatever response it chooses, including redirects, HTTP status codes, plain HTML, JSON, or other content types.
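A minimal sketch of that request-in, response-out shape follows. The request fields used here (`path`, `query`), the response fields (`code`, `headers`, `body`), and the one-JSON-object-per-line framing are assumptions made for illustration; check the _external documentation for the exact names your CouchDB build uses. `handle` and `respond` are hypothetical names.

```javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Receives the parsed request object and returns a response object.
function handle(req) {
  var path = req.path || [];
  if (path[path.length - 1] === 'missing') {
    return {code: 404, headers: {'Content-Type': 'text/plain'}, body: 'Not found'};
  }
  var name = (req.query && req.query.name) || 'world';
  return {
    code: 200,
    headers: {'Content-Type': 'text/html'},
    body: '<h1>Hello, ' + escapeHtml(name) + '</h1>'
  };
}

// One line of JSON in, one line of JSON out.
function respond(line) {
  return JSON.stringify(handle(JSON.parse(line)));
}
```

Because the handler is an ordinary function from one JSON object to another, it can be tested without a running CouchDB.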

IMHO, building raw Ajax + JSON apps is the fastest, simplest way to get a new idea in front of other people. I've done it a few times in various contexts. Grabb.it is just some Ajax on top of a JSON API (hosted on Rails and PostgreSQL). Grabb.it had to be Ajax-heavy, because changing pages would make the music stop. TracksPress (Grabb.it's new skunkworks product) has a pay area, which is just Ajax on top of a Merb/CouchDB app. For me, pure Ajax applications have a winning track record.

An Ajax app is simple: just get a shell of an HTML page up, throw on a little CSS and some jQuery, and you're up and running. Back that with the power of CouchDB and suddenly you're within striking distance of a real web app. Add CouchDB's `_external` handler and you can do anything another service would do, from rendering RSS feeds to dropping an upload into a processing queue.
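The client side of such an app can be very small. The sketch below turns a view result into an HTML list; it assumes rows of the form `{id, value}` where `value` is a post with a `title`, which is how the feed code in this post reads them. In a browser, the view JSON would typically come from something like jQuery's `$.getJSON`, and the returned string would be dropped into the page. `renderPosts` is a hypothetical name.

```javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Builds a <ul> of post titles from a parsed view response.
function renderPosts(view) {
  var items = view.rows.map(function (row) {
    var post = row.value;
    return '<li><a href="#' + encodeURIComponent(row.id) + '">' +
      escapeHtml(post.title) + '</a></li>';
  });
  return '<ul>' + items.join('') + '</ul>';
}
```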

Still on the Edge (installing)

Installing couchdb-example-blog isn't nearly as easy as it should be. First of all, couchapp, the installer script, requires Ruby, the CouchRest gem, and all of its dependencies. I'm considering porting couchapp to a leaner environment for convenience.

Secondly, couchdb-example-blog requires a non-standard branch of CouchDB itself. This should be simplified in the future. The CouchDB team still needs to decide on a build process for plugins. The _external process manager I've been helping davisp with, which powers action servers, as well as the Full Text Search interface, may make a model plugin, although it may end up included by default.

For detailed installation directions, see the couchdb-example-blog README and it will step you through everything.

No Joke (Security Model)

CouchDB will have a comprehensive security and data validation module. The work may have already begun. It looks like Damien is targeting this as a 1.0 feature. The upshot is that the security model is not yet available, but there is a description of it. (see Security and Validation)

Basically, any changes, such as document POST, PUT and DELETE, are validated by a JavaScript function, which is supplied the currently saved version of the document, the proposed version of the document, and some information about the request (including the user or other authentication parameters). The function may return false to indicate an invalid operation, or true to let the operation proceed. There's talk of extending the specification to allow validation functions to set values on the to-be-saved document, like timestamps or user information. With such a security model, it is possible to implement author-only saves, data format checks, and probably most of what application programmers require.
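Since the security model is not available yet, the following is only a sketch of how a validation function of the kind described might read. The argument order, the `req.user` field, and the `author` and `title` document fields are all assumptions; the finished feature may look different.

```javascript
// savedDoc: the currently saved version (null for a brand new document)
// newDoc:   the proposed version
// req:      information about the request; `req.user` is hypothetical
function validate(savedDoc, newDoc, req) {
  if (!req || !req.user) return false;                        // no anonymous writes
  if (savedDoc && savedDoc.author !== req.user) return false; // author-only saves
  if (newDoc.author !== req.user) return false;               // no reassigning authorship
  if (typeof newDoc.title !== 'string' || newDoc.title.length === 0) {
    return false;                                             // data format check
  }
  return true;
}
```

Returning true lets the save proceed; any false rejects it, which covers both the author-only rule and the format check in one place.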

E4X: ECMAScript for XML

Spidermonkey includes the E4X extensions, which give it the ability to manipulate XML with a native API. You get to use JavaScript "dot notation" to mess with XML. Most articles and tutorials out there are tailored towards parsing XML. That may be important to some folks (if so check out the links in the previous sentence.)

Instead I'll show how I generated an Atom feed using Spidermonkey's JavaScript interpreter. It's deceptively simple. I'm sure there are better ways to do this with E4X, but hey, this one's not that bad.

First we load the design document from CouchDB, and use its blog attribute to fill out some template values in the XML feed header.


    var blog = db.open('_design/couchdb-example-blog').blog;
    var feed = <feed xmlns="http://www.w3.org/2005/Atom">
      <title>{blog.title}</title>
      <link href={blog.url}/>
      <updated>2003-12-13T18:30:02Z</updated>
      <author>
        <name>{blog.author.name}</name>
      </author>
      <id>{blog.url}</id>
    </feed>;

If we were to call `feed.toXMLString()` now we'd get just what it seems like we'd get, with everything all escaped and quoted correctly. We still need to add the entries to the feed, so we'll fetch a view and iterate over its posts, copying them to the XML. Check out how easy the dot notation makes this. (Note that it looks like attribute access, but really it's a big nasty XML operation, so don't do more than you need.)


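    // Fetch the ten most recent posts from the view.
    var view = db.view('couchdb-example-blog/recent', {count: 10, descending: true});
    for (var r in view.rows) {
      var row = view.rows[r];
      var post = row.value;
      var entry = <entry/>;
      // Assigning to a missing child creates that element.
      entry.id = blog.url + '/' + row.id;
      entry.title = post.title;
      entry.updated = post.date;
      entry.content = post.body;
      // Append the finished entry to the feed's <entry> elements.
      feed.entry += entry;
    }
    return {body: feed.toXMLString()};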
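Where E4X is not available (it is an extension, not part of standard JavaScript), the same feed can be assembled from strings, as long as every interpolated value is escaped by hand. This is a swapped-in technique shown for comparison, not the code the example blog uses. `escapeXml`, `tag`, and `renderFeed` are hypothetical helper names, and `blog` and `view` are assumed to have the shapes used above.

```javascript
function escapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Wraps escaped text in a simple element.
function tag(name, text) {
  return '<' + name + '>' + escapeXml(text) + '</' + name + '>';
}

function renderFeed(blog, view) {
  var xml = '<feed xmlns="http://www.w3.org/2005/Atom">' +
    tag('title', blog.title) +
    '<link href="' + escapeXml(blog.url) + '"/>' +
    '<author>' + tag('name', blog.author.name) + '</author>' +
    tag('id', blog.url);
  view.rows.forEach(function (row) {
    var post = row.value;
    xml += '<entry>' +
      tag('id', blog.url + '/' + row.id) +
      tag('title', post.title) +
      tag('updated', post.date) +
      tag('content', post.body) +
      '</entry>';
  });
  return xml + '</feed>';
}
```

The manual escaping is exactly the bookkeeping that E4X does for you, which is a fair argument for using E4X when the interpreter has it.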

Now our example blog has a feed, so it's a real blog. Of course it's barely a real blog, but at this point features like Gravatar integration and comment moderation are just window dressing.

Data Portability

Why does it matter that CouchDB can serve standalone apps? Why should developers go through the trouble to learn a new paradigm?

I think any platform that gets out of the way enough to let you build Ajax apps with only the bare minimum server logic is a win in terms of developer productivity. But if you are hesitant on that front, consider the potential for portability.

When all developers need to do, to run their own instance of an application, is install CouchDB, suddenly we're talking about a platform even simpler than PHP plus MySQL, which has become the gold standard for ease of installation. Imagine if the process for installing a new CMS, or a new ticket tracker, or any other browser-based application was as simple as replicating a _design document from the application's root repository.

What's more, pure-CouchDB apps don't care if they are hosted on a local laptop, on a school or company intranet, or open on the web for all to use. The same code base can work in all those contexts, and the same set of data can be replicated across them. One of CouchDB's sweet-spots is offline replication. This means you could host an application at a school, allow students to replicate it to their laptops, work on it at home, and turn their work in by replicating back when they return to school. Data can be distributed without complex access controls, as would be required if the students accessed the school's network from the public internet.
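The school scenario boils down to two replication requests, one in each direction. The sketch below only builds the HTTP requests, assuming CouchDB's `_replicate` endpoint with a `source` and `target` in the JSON body; the host names and the `homework` database are made up, and `replicationRequest` is a hypothetical helper. The resulting object can be handed to whatever HTTP client is available.

```javascript
// Builds the request that asks the CouchDB node at `node` to replicate.
function replicationRequest(node, source, target) {
  return {
    method: 'POST',
    url: node.replace(/\/+$/, '') + '/_replicate',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({source: source, target: target})
  };
}

var laptop = 'http://localhost:5984';
var school = 'http://couch.school.example/homework'; // hypothetical URL

// At school: pull the application and its data down to the laptop.
var pull = replicationRequest(laptop, school, 'homework');
// Back at school: push the finished work up again.
var push = replicationRequest(laptop, 'homework', school);
```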

I look forward to seeing some of the prototypical Web 2.0 use cases migrate to this user-controlled future. What will photo-sharing look like when you can send your photo album, along with its user-modifiable browsing application, directly to friends? The web has shown that the power to view source is a catalyst for learning and innovation. With CouchDB apps, view-source exposes the whole application. I'd really like to see some public-data projects like Public.Resource.Org and The Sunlight Foundation experiment with user-owned applications.

Low-Hanging Fruit

Here are just a few ideas for standalone CouchDB applications that I've come up with. I'd love to see any of them shared around, preferably under a free license.

Feed Reader: CouchDB should be able to provide a user experience similar to Google's Reader. Using their API you could probably even interact with your friends' shared items like you're used to. A nice point about action server modules is that they can be easily generalized, e.g. a function that's used by one app to fetch feeds can be reused in another to power an image downloader or even a web spider.

Ticket Tracker: Imagine Trac as a distributed Ajax application. Now remove all the features you never use. Working offline has proven productive in the mass exodus to git and other DVCS. Managing project information with offline clients may prove to have some of the same efficiencies.

Wikis: The appeal of a CouchDB wiki is that it'd be easy to keep relatively private. A prototypical use case is in the classroom: a group wiki which students are able to edit at home even without network access.

The Future

This essay has been largely speculative. However, it is backed up by running code. I consider it a call to arms. If you're inspired by CouchDB, or by the potential freedom that we'll win by open-sourcing all the web apps, I encourage you to start working now.

We could really use someone to package CouchDB as a browser plugin. Our license is compatible with the Gears license -- all we need is someone to tackle the build.

Also, write applications, share applications, and fork them. I hope to be running this blog from CouchDB as soon as the security features have been implemented. What else is ready for prime time?
