What is a blog?
Quick, what does it take to run a blog? This is not meant to be a rhetorical question: I mean strictly from a tools perspective; “having something to say” does not count.
Definitions of what constitutes a “tool” vary, but you are permitted to be as abstract as you like.
What did you come up with?
You may have come up with something like a platform: WordPress or Blogger or something similar (does anyone still use LiveJournal?).
Or perhaps you came up with a set of features: commenting, for example. Or maybe a content delivery system, or some kind of social feature. A syndication feed, perhaps?
Maybe your whole idea of such a thing revolves around using a social network such as Google+, Twitter or Facebook as a publishing platform. Perhaps you want to participate in a sharing community such as Tumblr.
Or maybe it is all about presentation. The software must function in such a way that the result looks like a blog. That is, content presented as chronologically (either forward or reverse) ordered posts that are interlinked and presented with some consistent layout: Each post has a header, a byline and date, content organized into sections, with comments following at the bottom. There are sidebars linking to other posts, external documents, etc.
These are all reasonable answers. But are any of these things strictly, formally, necessary?
In a word: no.
So what is? My requirements list is pretty short:
- A text editor.
- An HTTP server and a place to run it.
And that's it.
A blog is a “blog” by convention only. From a software perspective, it is just a “normal” web site. Oh sure, lots of software understands the convention and behaves accordingly. And of course one can build in increasing levels of software sophistication for the convenience of the author (e.g. a browser interface for posting new material, or some kind of convenient, simplified markup that is dynamically translated to HTML when requested), or to add useful functionality for the reader (such as commenting, search, etc.), but none of that is truly necessary. All you really need is a text editor to create the content and some way to serve it to the world.
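To make the second requirement concrete, here is a minimal sketch of "some way to serve it to the world" using nothing but Python's standard library. It is an illustration, not the server this blog runs on; the names make_server and fetch are made up for the example, and the directory you point it at is assumed to hold the HTML files you typed into your editor.

```python
import functools
import http.server
import urllib.request


def make_server(root, port=0):
    """Return an HTTP server that serves the files under `root`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    # Port 0 asks the operating system for any free port.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def fetch(server, path):
    """Fetch `path` from a running server and return the body as text."""
    host, port = server.server_address
    with urllib.request.urlopen(f"http://{host}:{port}{path}") as response:
        return response.read().decode("utf-8")
```

Calling make_server(".", 8000).serve_forever() would serve the current directory on port 8000 of the local machine until interrupted. Everything a blogging platform adds beyond this is convenience.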
The Bare Minimum
My blog is new. And as you can probably tell (I am writing this on ), it is primitive.
That's okay for right now as it is serving the purpose I created it for in the first place: I wanted to get a better feeling for the core technologies that power the web. So everything I am doing at the moment is at an extraordinarily low level: I edit the posts by hand, using a text editor, and mark them up directly in XHTML5. The styling is done using CSS that I type into a text editor. The only two pieces of content that I did not type out by hand so far are my picture on the about page (I obviously took that with a camera. Okay, okay, fine: amend my requirements list to include a camera if you are into taking and posting pictures and such things), and the “favicon” that looks like the partial symbol from mathematics (I edited that on favicon.cc).
The HTTP server I use is custom; I started out with a hacked version of Tom Duff's Duffgram HTTP server, but replaced it with one I wrote myself. The best way to get to know a protocol may be to implement it, and the second best to read and absorb an implementation.
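For a feel of what implementing the protocol involves, here is a rough sketch of the core of an HTTP/1.1 exchange: parse the request line, pick a page, and write back a status line, headers and body. It is written in Python for illustration and is not the server described above; handle_request, build_response, serve_one and the pages dictionary are all hypothetical names, and it ignores request headers, persistent connections and much else that a real server must handle.

```python
def handle_request(raw, pages):
    """Turn raw request bytes into raw response bytes.

    `pages` maps a request path such as "/" to an HTML string.
    """
    try:
        # The request line looks like: GET /index.html HTTP/1.1
        request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
        method, target, version = request_line.split(" ")
    except (UnicodeDecodeError, ValueError):
        return build_response(400, "Bad Request", "<p>Bad request.</p>")
    if method != "GET":
        return build_response(501, "Not Implemented", "<p>Only GET is supported.</p>")
    path = target.split("?", 1)[0]  # drop any query string
    if path not in pages:
        return build_response(404, "Not Found", "<p>No such page.</p>")
    return build_response(200, "OK", pages[path])


def build_response(status, reason, html):
    """Assemble a status line, a few headers, a blank line and the body."""
    body = html.encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def serve_one(listener, pages):
    """Accept one connection on a listening socket and answer it."""
    conn, _addr = listener.accept()
    with conn:
        # A single recv is a simplification: real requests can arrive in pieces.
        raw = conn.recv(65536)
        conn.sendall(handle_request(raw, pages))
```

Keeping handle_request free of any socket code means the protocol logic can be exercised with plain byte strings, which is handy when the goal is to learn the wire format rather than to ship a server.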
Now, this is certainly painful: it is a lot of work to do everything “by hand.” For example, maintaining the various links in the navigation area across multiple entries is both tedious and error prone. Similarly, ensuring that links to stylesheets and so on in the <head> element on each page are consistent, making sure that content remains properly linked if a resource moves, and other such janitorial chores require lots of manual effort. And certainly writing one's own HTTP server is completely unnecessary. It is easy to see why a user would prefer a “canned” solution such as Blogger: most of the annoying and boring drudge work is already done.
And hosting things oneself is inherently problematic: what if your server crashes? What if most of the people who are interested in your content are geographically far away from the server? Service providers replicate your data in multiple geographic locations to ensure both uptime and low latency for physically dispersed user populations. It is a lot of work (and it is not cheap) to do this yourself.
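As an aside on the janitorial chores mentioned above, the usual first step away from hand editing is to generate the shared parts of each page from a single definition. The following is only a sketch of that idea, not something this blog does; the stylesheet path, the navigation entries and the render_page function are all invented for the example.

```python
from html import escape

# Hypothetical site-wide settings: change them once, and every page follows.
SITE_STYLESHEET = "/style.css"
NAV_LINKS = [("/", "Home"), ("/about.html", "About")]


def render_page(title, body_html):
    """Wrap a post body in the shared head element and navigation area."""
    nav = "\n".join(
        f'        <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for href, label in NAV_LINKS
    )
    return f"""<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8"/>
    <title>{escape(title)}</title>
    <link rel="stylesheet" href="{escape(SITE_STYLESHEET)}"/>
  </head>
  <body>
    <nav>
      <ul>
{nav}
      </ul>
    </nav>
{body_html}
  </body>
</html>
"""
```

With something like this, moving the stylesheet or adding a navigation entry becomes a one-line change instead of an edit to every post, which is exactly the drudge work the canned platforms take off your hands.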
So why bother doing everything from scratch?
Because it is a learning experience.
So often these days, we do things without really understanding what we are doing, or why we do those things the way we do (see my earlier post on The Tyranny of the Hollerith Punched Card for exploration of a similar theme). But if we want to understand something, then we really need to get to its core: the fundamentals that underlie that which we seek to understand.
I have been using the web for a very long time now; almost as long as it has been around. My first exposure to a web browser was circa spring , seeing NCSA Mosaic demonstrated on an IBM RS/6000 model 320 under AIX 3.2.5. We quickly got it running on Sun SPARCstations under SunOS 4. The graphics were cool and while the web seemed nifty at first, the fascination quickly wore off: HTTP was overly simplistic, lacked any notion of a “session” and was fundamentally batch oriented: this made it hard to build interactive applications. Further, the browser could not do much other than, well, browse: even finding text within a page was limited (regular expressions, anyone?); forget about manipulating it in any meaningful way. The overall impression was that users interacted with the web through a glorified 3270 terminal. And if you downloaded the page source to work with the content using “normal” tools, you were quickly thwarted by a sea of angle brackets: HTML as a markup language was a joke.
Of course, things have changed in the last twenty years. Despite its obvious technical deficiencies, the web was in the right place at the right time; it caught on and grew to become the dominant application of the global Internet. HTTP has matured, as has HTML.
However, first impressions are the hardest to change, and my initial impressions of the web as an underpowered toy defined by tag soup and bad protocols have been slow to die. I still think that the web is overly complicated for what it does: HTTP is not what I would consider an example of a “good” application protocol, and HTML continues to have conformance and portability problems across browsers. Despite recent advances, the entire browser-centric model of interaction with the machine feels constricted and limiting: it traps you in its own world. The web got rushed to “production” too soon and its warts were standardized: we have been living with the consequences since.
But, well, it has been 20 years. And let's face it: I make my living because of the World Wide Web. So I decided that it was time to get back to basics and try to understand how this system really works. Not twenty years ago, on 40MHz workstations with a few megabytes of RAM, but here and now. It was time to get back to the bare necessities: content and a way to distribute it.
Why the server side? Well, building a browser is too much work (and not something I find particularly interesting, anyway). Working on the server side is much more tractable. What would it look like if I started entirely from scratch? Just a text editor and figuring out how to serve the content?
What would it mean if I decided that I did not need to use an existing platform? What if I decided that I did not need to support commenting right away? What if I did not need to use a pre-existing web server?
Hence the blog. It is an experiment of sorts: building a web stack from the ground up. Not because it is a good idea (it is not; unless doing this specifically for learning, or one has some very specific requirement, one's time is spent better elsewhere. These are all solved problems), but because one learns a lot by skipping the layers of abstraction others have built for you and going to the lower layers.
I will continue to post about this as I evolve the software and content and learn more. So far, it has been informative, and clearly we have come a long way since : semantically speaking, HTML5 is far from perfect but does not seem that bad, the DOM model for interacting with hierarchically structured content is interesting, and CSS for separating content from descriptions of how to render that content seems like a generally good idea. The AJAX paradigm seems flawed, but is undeniably powerful, as evidenced by the fact that it allows me to trivially embed source code pretty printers and rendering engines for mathematical typesetting in my pages. As for interactivity, it is also far from perfect, but we finally have the WebSocket protocol (though honestly: it took us 20 years to come up with that? Really?). The session and security stories are still being written, and it seems laughable that we have to deal with things like XSS and CSRF, but at least we have OAuth. And implementing a basic HTTP/1.1 server is not that difficult (my first draft is about 500 lines of Common Lisp, though it has some pretty glaring deficiencies). Anyway, stay tuned. That is the other reason I did this: I have things to say.