Plan 9: Part 1

Whither Plan 9? History and Motivation

Plan 9 is a research operating system from Bell Labs. For several years it was my primary environment and I still use it regularly. Despite its conceptual and implementation simplicity, I've found that folks often don't immediately understand the system's fundamentals: hence this series of articles.

When I was a young programmer back in high school my primary environment was a workstation of some type running either Unix or VMS and X11. After a while I migrated to FreeBSD on commodity hardware using the same X11 setup I'd built on workstations. But eventually the complexity of Unix in general started to get to me: it happened when they added periodic(8) to FreeBSD in one of the 4.x releases. "Really?" I thought to myself. "What's wrong with cron(8) and making a crontab?" Unix-like systems were evolving in a way that I didn't like and I realized it was time for me to find another home.

And I wasn't the only one who had ever felt that way. It turns out that circa 1985 the 1127 research group at Bell Labs, the same group that developed Unix and C after Bell Labs pulled out of the Multics project, came to the conclusion that they'd taken Unix about as far as they could as a research vehicle.

They were looking at the technological landscape of the 1980s and realized that the computing world was fundamentally changing.

First, high-bandwidth low-latency local area networks were becoming ubiquitous.

Second, large time-shared systems were being replaced by networks of heterogeneous workstations built from commodity hardware. Relatedly, people were now using machines that had high-resolution bitmapped graphics displays accompanied by mice instead of text-only, keyboard-only character terminals.

Third, RISC processors were on the rise and multiprocessor RISC machines were dramatically outperforming their earlier uniprocessor CISC ancestors.

Finally, they saw major changes in storage systems: RAID was gaining traction, tape drives were waning, and optical storage was looking like it would be a big part of the future. (Of note, this is one area where they were arguably very, very wrong. But no one is truly prescient.)

At first, they tried to adapt Unix to this new world, but they quickly decided this was unworkable. What they wanted was a Unix built on top of the network; what they found was a network of small Unix systems, each unique and incompatible with the rest. Instead of a modern nation state, they had a loose federation of feudal city states.

It turned out that fundamental design assumptions in their earlier system made it difficult to gracefully accommodate their desired changes. For example, the concept of a single privileged 'root' user made it difficult to extend the system to a network of machines: does having 'root' access on one machine confer it on all machines? Why or why not? Here, an artifact of a different time was at odds with the new reality. Similarly, graphics had never been integrated into Unix well: the system was fundamentally built around the idea of the TTY as the unit of user interaction and the TTY abstraction permeated the kernel. The assumption of a uniprocessor machine was also baked in: fine-grained locking for scalability on multiprocessor systems was simply non-existent. Finally, the filesystem organization made it challenging to support heterogeneous systems in a coherent manner: something as simple as having binaries for multiple processor types resident on a single installation simultaneously, as one would on a file server serving heterogeneous machines, could not be done elegantly.

In the end, the amount of work required to bring Unix up to date was considered not worth the effort. So they decided to start from scratch and build a new system from the ground up: this system would become Plan 9.

Plan 9 Fundamentals

To a first-order approximation, the idea behind Plan 9 is to build a Unix-like timesharing system from the network, rather than a network of loosely connected time-sharing Unixes.

To start at the most basic level, a Plan 9 system is a network of computers that are divided into three classes:

File Servers
This is where your data lives. They provide stable storage to the network.

These are machines with lots of fast secondary storage: hard disks, RAID arrays, SSDs or whatever. Historically this meant RAID arrays built from hard disks, since Plan 9 predates SSDs and other commodity-class solid state storage technologies.

File server machines have decent if not spectacular processors, moderate amounts of RAM for caching data from secondary storage, and a very fast network connection.

They have no user interaction capabilities to speak of: often one would use a serial console for day-to-day system administration tasks. Historically, the file server machine ran a special version of the kernel and didn't even have a shell! Rather, there was something akin to a monitor built-in where the system administrator executed commands to configure the system, add and remove users and other similar tasks.

More recently, the file server was rewritten so that it runs as a user-level program executing under the control of a normal kernel. It is often still run on a dedicated machine, however.

An unusual innovation at the time was the backup mechanism: this was built into the file server. Periodically, all modified blocks on the file server would be written off to a tertiary storage device (historically, a magneto-optical jukebox, but now a separate archival service that stores data on a dedicated RAID array). Of note, historically file service was suspended while the set of modified blocks was enumerated, a process that could take on the order of minutes. Now, the file system is essentially marked copy-on-write while backups are happening with no interruption in service.

CPU Servers
Shared compute resources.

These are large multiprocessor machines with lots of fast CPUs and lots of RAM. They have a very fast network connection to the file server but rarely have stable storage of their own (read: they are often diskless, except for occasionally having locally attached storage for scratch space to cut down on network traffic).

Like file servers, there is no real user-interaction hardware attached to the computer itself: the idea is that you will interact with a CPU server through a Plan 9 terminal (discussed below). Often console access for system administration was provided through a serial line.

These run a standard Plan 9 kernel, but compiled using a "cpu" configuration. This mostly affects how resources are partitioned between user processes and the kernel (e.g., buffers reserved by the kernel and the like). The modern file server typically runs on a CPU server kernel.

Terminals
The machines a user sits in front of and interacts with.

Terminals have mediocre amounts of RAM and CPU power and middling network interfaces but excellent user-interface features including a nice keyboard, nice 3-button mouse, and a nice high resolution bitmapped display with a large monitor. They are usually diskless.

This is where the user actually interacts with the system: the terminal is a real computer, capable of running arbitrary programs locally, subject to RAM and other resource limitations. In particular, the user runs the window system program on the terminal as well as programs like text editors, mail clients, and the usual complement of filesystem traversal and manipulation commands. Users would often run compilers and other applications locally as well.

The terminal, however, is not meant to be a particularly powerful computer. When the user needs more computational power, she is expected to use a CPU server.

A user initiates a session with a Plan 9 network by booting a terminal machine. Once the kernel comes up, it prompts the user for her credentials: a login name and password. These are verified against an authentication server, a program running somewhere on the network that has access to a database of secrets shared with the users. After successful authentication, the user becomes the "hostowner", and the terminal connects to the file server, constructs an initial namespace and starts an interactive shell. That shell typically sources a profile file that further customizes the namespace and starts the window system. At this point, the user can interact with the entire network.

Modernization

A question that immediately arises from this description: why write a new kernel for this? Why not just implement these things as separate user-processes on a mature Unix kernel?

Over the course of its research lifetime, Unix had acquired a number of barnacles that were difficult to remove. Assumptions about the machine environment it was developed on were fundamental: TTYs were a foundational abstraction. Neither networking nor graphics had ever really been integrated gracefully. And finally it was fundamentally oriented towards uniprocessor CISC machines.

With Plan 9, the opportunity was taken to fix the various deficiencies listed in the motivation section. In particular, fine-grained locking was added to protect invariants on kernel data structures. The TTY abstraction, which was already an anachronism in the 1980s, was discarded completely: effective use of the system now required a bitmapped graphical display and a mouse. The kernel was generally slimmed down and the vestiges of various experiments that didn't pan out, or design decisions that were otherwise obsolete or generally bad, were removed or replaced.

Device interfaces were rethought and replaced. Networking and graphics were designed in from the start. The security model was rethought for this new world.

The result was a significantly more modern and portable kernel that could target far more hardware than Research Unix could. Unburdened by the legacy of the past, the system could evolve more cleanly in the new computing environment. Ultimately, the same kernel would target MIPS, SPARC, Alpha, x86 and x86_64, ARM, MC68k, PowerPC and i960: all without a single #ifdef.

The userspace programs that one had come to expect were also cleaned up. Programs that seemingly made no sense in the new world were not carried forward: things dealing with the TTY, for example, were left behind. The window system was rewritten from scratch to take advantage of the network, various warts on programs were removed and things were generally polished. New editors were written or polished for the new system, and the new UNICODE standard for internationalization was embraced through the freshly-designed UTF-8 encoding, which was introduced to the world through Plan 9.

On the development front, a new compiler/assembler/linker suite was written which made cross-compilation trivial and made development of a single system across heterogeneous hardware vastly easier (dramatically increasing system portability), and some experimental features were added to the C programming language to support Plan 9 development. The standard libraries were rethought and rewritten with a new formatted-printing library, standard functions, system calls, etc. Threads were facilitated through the introduction of an rfork primitive that could create new processes that shared address spaces (but not stacks).

But what about root?

Plan 9 circumvents the "what about root?" question by simply doing away with the concept: there is no super-user. Instead, an ordinary user is designated as the "hostowner" of any particular computer. This user "owns" the hardware resources of the machine but is otherwise subject to the normal permissions-based authorization scheme users are familiar with from Unix: user, group and other permissions for read, write and execute.

All machines have hostowners: for terminals this is whoever logged into the machine when the terminal booted. For CPU and file servers, these are configured by the system administrator and stored in some sort of non-volatile memory on the computer itself (e.g., NVRAM).

On CPU servers, the hostowner can create processes and change their owner to some other user. This allows a CPU server to support multiple users simultaneously. But the hostowner cannot bypass filesystem permissions to inspect a user's read-protected files.

This raises the question: if there is no super-user, how are resources put into places where the user expects them, and how does the user communicate with the system? The answer is per-process, mutable namespaces.

Namespaces and resource sharing

One of the, if not the, greatest advances of Plan 9 was an aggressive adaptation and generalization of the Unix "everything is a file" philosophy. On Unix "everything" is a file (a named stream of bytes) except when it's not: for instance sockets kinda-sorta look like files but they live in a separate namespace from other file-like objects (which have familiar names, like /dev/console or /etc/motd). One does not manipulate them using the "standard" system calls like open(2), creat(2), etc. One cannot use standard filesystem tools like ls(1), cat(1), or grep(1) on sockets since they aren't visible in the file namespace (okay, you kinda-sorta can with Unix domain sockets, but even then there are pretty serious limitations). Or consider the venerable ioctl(2) system call: this is basically a hook for manipulating devices in some way; the device itself may be represented by a device node in /dev, but controlling that device uses this weird out-of-band mechanism; it's a hack.

But on Plan 9, everything looks like a file. Or more precisely everything is a filesystem and there is a single protocol (called 9P) for interacting with those filesystems. Most devices are implemented as a small tree of files including data files for getting access to the data associated with a device as well as a ctl (nee "control") file for controlling the device, setting its characteristics and so forth. ioctl(2) is gone.

Consider interacting with a UART controlling a serial port. The UART driver provides a tree that contains a data file for sending and receiving data over the serial port, as in Unix, but also a control file. To set the line rate on a serial port, one echoes a string into the control file. Similarly, one can put an ethernet interface into full-duplex mode via the same mechanism. Generalizing the mechanism so that reading and writing a text file applies to device control obsoletes ioctl(2) and other similar mechanisms: the TCP/IP stack is a filesystem, so setting options on a TCP connection can also be done by echoing a command into a ctl file.
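To make the control-file idea concrete, here is a minimal sketch in Python; it is a toy model, not Plan 9 code. The class name ToyUart, its methods, and the "b" followed by a rate command syntax are assumptions made for illustration; consult the real driver's manual page for the messages it actually accepts.

```python
class ToyUart:
    """Toy model of a device exposed as two files: 'data' and 'ctl'."""

    def __init__(self):
        self.baud = 9600   # current line rate
        self.sent = []     # text "transmitted" through the data file

    def write(self, name, text):
        """Handle a write to one of the device's files."""
        if name == "data":
            self.sent.append(text)
        elif name == "ctl":
            self._control(text.strip())
        else:
            raise FileNotFoundError(name)

    def read(self, name):
        """Reading ctl reports the current settings as plain text."""
        if name == "ctl":
            return f"b{self.baud}\n"
        raise FileNotFoundError(name)

    def _control(self, command):
        # Hypothetical syntax: the letter 'b' followed by the line rate.
        if command.startswith("b") and command[1:].isdigit():
            self.baud = int(command[1:])
        else:
            raise ValueError(f"bad control message: {command!r}")


uart = ToyUart()
uart.write("ctl", "b115200\n")   # the moral equivalent of echoing into ctl
uart.write("data", "hello")
print(uart.read("ctl"), end="")
```

The point of the sketch is that control needs nothing beyond read and write on a text file, so any tool that can write a string can configure the device.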

Further, the system allows process groups to have independent namespaces: some process may have a particular set of resources, represented as filesystems, mounted into its namespace while another process may have another set of resources mounted into a different namespace. These can be inherited and changed, and things can be 'bound' into different parts of the namespace using a "bind" primitive, which is kind of like mounting an existing subtree onto a new mount point, except that one can create 'union' mounts that share with whatever was already under that mount point. Further, bindings can be ordered so that one comes before or after another, a facility used by the shell: basically, the only thing in $path on Plan 9 is /bin, which is usually a union of all the various bin directories the user cares about (e.g., the system's architecture-specific bin, the user's personal bin, one just for shell scripts, etc). Note that bind nearly replaces the need for symbolic links; if I want to create a new name for something, I simply bind it.
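Ordered union binds and private namespaces can be modelled in a few lines of Python. This is a toy under stated assumptions: the Namespace class, its method names, and the directory names are mine, and real bind semantics have details (flags, creation rules) that are not modelled here.

```python
class Namespace:
    """Toy namespace: mount point -> ordered list of bound directories."""

    def __init__(self, table=None):
        self.table = {mp: list(dirs) for mp, dirs in (table or {}).items()}

    def bind(self, new, old, mode="replace"):
        """Make directory `new` visible at mount point `old`."""
        # With nothing bound yet, the mount point shows only itself.
        current = self.table.get(old, [old])
        if mode == "replace":
            self.table[old] = [new]
        elif mode == "before":
            self.table[old] = [new] + current
        elif mode == "after":
            self.table[old] = current + [new]
        else:
            raise ValueError(mode)

    def copy(self):
        """Model a process that asks for its own private copy."""
        return Namespace(self.table)

    def lookup(self, mount_point, name, contents):
        """Return the first bound directory that contains `name`."""
        for directory in self.table.get(mount_point, [mount_point]):
            if name in contents.get(directory, ()):
                return f"{directory}/{name}"
        return None


# Hypothetical directory contents, for illustration only.
contents = {
    "/386/bin": {"ls", "cat"},
    "/rc/bin": {"man"},
    "/usr/glenda/bin/rc": {"ls"},
}

ns = Namespace()
ns.bind("/386/bin", "/bin")
ns.bind("/rc/bin", "/bin", "after")

child = ns.copy()
child.bind("/usr/glenda/bin/rc", "/bin", "before")

print(ns.lookup("/bin", "ls", contents))      # /386/bin/ls
print(child.lookup("/bin", "ls", contents))   # /usr/glenda/bin/rc/ls
```

The child's extra bind changes what "ls" resolves to for the child only; the parent's view of /bin is untouched.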

All mounts and binds are handled by something in the kernel called the "mount driver," and as long as a program can speak 9P on a file descriptor, the resources it exposes can be mounted into a namespace, bound into nearly arbitrary configurations, and manipulated using the standard complement of commands.

Since 9P is a protocol it can be carried over the network, allowing one access to remote resources. One mounts the resource into one's namespace and binds it where one wishes. This is how networked graphics are implemented: there's no need for a separate protocol like X11, as one simply connects to a remote machine, imports the "draw" device (the filesystem for dealing with the graphics hardware) from one's terminal, binds that over /dev/draw (and similarly with the keyboard and mouse, which are of course represented similarly), and runs a graphical program, which opens /dev/draw and writes to it to manipulate the display. Further, all of the authentication and encryption of the network connection is handled by whatever provides the network connection; authorization for opening files is handled by controlling access to the namespace, and the usual Unix-style permissions for owner, group and world. There's no need for MIT-MAGIC-COOKIE-1's or tunneling over SSH or other such application-level support: you get all of it for free.

Also, since 9P is just a protocol, it is not tied to devices: any program that can read and write 9P can provide some service. Again, the window system is implemented as a fileserver: individual windows provide their own /dev/keyboard, /dev/mouse and /dev/draw. Note that this implies that the window system can run itself recursively, which is great if you're testing a new version of the window system. As mentioned before, even the TCP/IP stack is a filesystem.

Finally, since mounts and binds are per-process, both operations are unprivileged: users can arbitrarily mount and bind things as they like, subject to the permissions of the resources themselves. Of course, Plan 9 does rely on some established conventions and programs might make corresponding assumptions about the shape of the namespace so it's not exactly arbitrary in practice but the mechanism is inherently flexible.

We can see how this simplifies the system by comparing Plan 9's console mechanism to /dev/tty under Unix. Under Plan 9, each process can have its own /dev/cons (taken from the namespace the process was started in) for interacting with the "console": it's not a special case requiring explicit handling in the kernel as /dev/tty is under Unix, it's simply private to the namespace. Indeed, under the rio window system, each window has its own /dev/cons: these are synthesized by the window system itself and used to multiplex the /dev/cons that was in the namespace rio was started in.
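The multiplexing arrangement, and the reason a window system built this way can run inside itself, can be sketched in Python. This is a toy: the class names and the bracketed labels are assumptions for illustration and say nothing about how rio is actually implemented.

```python
class Cons:
    """Bottom-level console: records whatever is written to it."""

    def __init__(self):
        self.output = []

    def write(self, text):
        self.output.append(text)


class WindowCons:
    """A per-window console synthesized by the toy window system."""

    def __init__(self, parent, label):
        self.parent = parent   # the console this window multiplexes onto
        self.label = label

    def write(self, text):
        # Tag the text so we can see which window it came through.
        self.parent.write(f"[{self.label}] {text}")


class ToyWindowSystem:
    """Hands out consoles that all funnel into the one it was started with."""

    def __init__(self, cons):
        self.cons = cons   # the console found in our starting namespace
        self.count = 0

    def new_window(self):
        self.count += 1
        return WindowCons(self.cons, f"win{self.count}")


screen = Cons()
outer = ToyWindowSystem(screen)
first = outer.new_window()
first.write("hello")

# A window's console looks like any other console, so a second window
# system can be started inside a window of the first.
inner = ToyWindowSystem(outer.new_window())
inner.new_window().write("nested")

print(screen.output)   # ['[win1] hello', '[win2] [win1] nested']
```

Because a program only ever sees "the console in my namespace", it cannot tell, and does not need to care, how many layers sit underneath.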

Note how this changes the view of the network from the user's perspective in contrast to e.g. Unix or VMS: I construct the set of resources I wish to manipulate and import them into my namespace: in this sense, they become an extension of my machine. This is in stark contrast to other systems in which resources are remotely accessed: I have to carry my use to them. For example, suppose I want to access the serial port of some remote computer: perhaps it is connected to some embedded device I want to manipulate. I do this by importing the serial port driver, via 9P, from the machine the device is connected to. I then run some kind of communications program locally, on my terminal, connecting to the resource as if it were local to my computer. 9P and the namespace abstraction make this transparent to me. Under Unix, by contrast, I'd have to login to the remote machine and run the communications program there. This is the resource sharing model, as opposed to the remote access to resources model.

However, I still can have access to remote resources. Consider CPU servers: to make use of a CPU server's resources, I run a command on my terminal called cpu which connects me to a remote machine. This is superficially similar to a remote login program such as ssh with the critical difference that cpu imports my existing namespace from my terminal, via 9P, and makes it accessible to me on the remote machine. Everything on the remote machine is done within the context of the namespace I set up for myself locally before accessing the CPU server. So when I run a graphical program, and it opens /dev/draw this is really the /dev/draw from my terminal. It is imperfect in that it relies on well-established convention, but in practice it works brilliantly.

The file server revisited

The file server is worth another look, both as an interesting artifact in its own right as well as an example of an early component of the system that did not pan out as envisioned at the outset of the project.

In the first through third editions of Plan 9 the file server machine ran a special kernel that had the filesystem built in. This was a traditional block-based filesystem and the blocks were durably kept on a magneto-optical WORM jukebox. In fact, the WORM actually held the filesystem structure; magnetic disk was a cache for data resident on the WORM and could be discarded and reinitialized. The WORM was treated as being infinite (not true of course, but it was regarded so conceptually). Since changing platters was necessarily slow and magneto-optical drives weren't exactly "fast", there was a disk acting as a cache of frequently-used blocks as well as a write buffer. RAM on the file server machine also acted as a read cache for blocks on the hard disk, giving two layers of caching: generally, the working set of commonly used user programs and so forth all fit into the RAM cache. The overview paper describing the system stated that something less than one percent of accesses missed the cache and had to go to the WORM.

To avoid wasting write-once space and for performance, writes were buffered on disk and automatically synced to the WORM once a day: at 5am file service was paused and all blocks modified since the last dump were enumerated and queued for copy. Once queued, file service resumed. Those blocks were then written to newly allocated blocks on some platter(s) by a background process. The resulting daily "dump" was recorded with a known name and made accessible as a mountable filesystem (via 9P). Thus, one could 'cd' to a particular dump and see a snapshot of the filesystem as it existed at that moment in time. This was interesting since, unlike using tape backups on Unix, if you lost a file you didn't need anyone to go read it back for you; you simply cded to where it was and used cp to copy it back to the active filesystem. Similarly if you wanted to try building a program with an older version of a library, you could simply bind the older version from the dump onto the library's name and build your program; the linker would automatically use the older library version because that's what was bound to the name it expected in its namespace. There were some helper commands for looking for a file in the dump and so forth to make navigating the dump easier.
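The essence of the daily dump (copy only what changed, keep every earlier snapshot readable by name) can be sketched in Python. This is a toy model working on whole files rather than blocks; the class, its method names, and the dump names are assumptions for illustration, and file deletion is not modelled.

```python
class ToyDumpFS:
    """Toy file server with named, read-only dumps on append-only storage."""

    def __init__(self):
        self.active = {}    # path -> contents in the live filesystem
        self.dirty = set()  # paths modified since the last dump
        self.worm = []      # append-only store; entries are never rewritten
        self.dumps = {}     # dump name -> {path: index into worm}
        self._last = {}     # mapping recorded by the most recent dump

    def write(self, path, data):
        self.active[path] = data
        self.dirty.add(path)

    def dump(self, name):
        """Record a snapshot, copying only what changed since the last one."""
        snapshot = dict(self._last)   # unchanged files share old entries
        for path in sorted(self.dirty):
            self.worm.append(self.active[path])
            snapshot[path] = len(self.worm) - 1
        self.dirty.clear()
        self.dumps[name] = snapshot
        self._last = snapshot

    def read_dump(self, name, path):
        return self.worm[self.dumps[name][path]]


fs = ToyDumpFS()
fs.write("/lib/a", "version 1")
fs.write("/lib/b", "unchanged")
fs.dump("day1")

fs.write("/lib/a", "version 2")
fs.dump("day2")

# Recover yesterday's file by copying it back from the dump.
fs.write("/lib/a", fs.read_dump("day1", "/lib/a"))
print(fs.active["/lib/a"])   # version 1
print(len(fs.worm))          # 3
```

Two dumps of two files cost three stored entries, not four, because the unchanged file is shared between the snapshots.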

A few groups outside of Bell Labs actually had the magneto-optical jukeboxes, but they were rare. However the file server could be configured to use a hard disk as a "pseudo-worm": that is, the file server could treat a disk or a disk mirror like a WORM even though it wasn't truly write-once at the hardware level. Most sites outside of the labs were configured to use the pseudo-worm.

In the 4th edition a new associative block storage server called Venti appeared. Venti isn't a WORM jukebox; it's an associatively-indexed archival storage server. Data is stored in fixed-sized blocks that are allocated from storage "arenas": when a user writes data to a venti server the data is split into blocks, the SHA-1 hash of the block is calculated, a block of backing store is allocated from an arena, the data is written there, and the mapping between hash and <arena, block address> pair is written into an index. If one wants the block back one looks up its hash in the index to get the <arena, block address> pair back and then reads that block from the arena. Naturally, this means that duplicate data is stored only once in the venti. However, venti arenas can be replicated for speed and/or reliability.
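The write and read paths just described are easy to sketch in Python using the standard hashlib module. This is a toy with a single in-memory arena; the class name, method names, and block size are assumptions for illustration, not Venti's actual protocol or on-disk format.

```python
import hashlib


class ToyVenti:
    """Toy content-addressed block store with one in-memory arena."""

    def __init__(self, block_size=8192):
        self.block_size = block_size
        self.arena = []   # append-only list of stored blocks
        self.index = {}   # SHA-1 hex digest -> (arena number, block address)

    def write_block(self, block):
        digest = hashlib.sha1(block).hexdigest()
        if digest not in self.index:        # duplicates are stored only once
            self.index[digest] = (0, len(self.arena))
            self.arena.append(block)
        return digest

    def read_block(self, digest):
        _arena_number, address = self.index[digest]
        return self.arena[address]

    def write(self, data):
        """Split data into blocks and return the list of their digests."""
        size = self.block_size
        return [self.write_block(data[i:i + size])
                for i in range(0, len(data), size)]

    def read(self, digests):
        return b"".join(self.read_block(d) for d in digests)


store = ToyVenti(block_size=4)
digests = store.write(b"aaaabbbbaaaa")
print(len(digests))                           # 3
print(len(store.arena))                       # 2
print(store.read(digests) == b"aaaabbbbaaaa")  # True
```

Three blocks were written but only two are stored, since the first and third have the same hash. Note that the caller must hold on to the digests to get the data back, which is why a filesystem layered on such a store has to keep explicit state about it.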

Arenas are sized such that they can be written onto some kind of archival media (my vague recollection is that DVDs may have been popular at the time), but they are stored on hard disks or some other kind of random-access media (SSDs are popular now). Venti, however, is not a file server and does not present itself as one. Rather, it speaks its own protocol and likely originated out of the observation that magneto-optical jukeboxes had never quite taken off the way they had initially expected, were expensive, slow, big, noisy and power-hungry. Hard disks were getting so cheap that they were about to pass tape in storage density versus cost and with RAID they were pretty reliable.

A filesystem called "fossil" was written that could be optionally backed by a venti, but it was rather a different beast than the old file server. In particular, fossil is just a normal user program that one can run under a normal Plan 9 kernel (unlike the older file server, which really was a self-contained program). And unlike the older filesystem which lived implicitly on the WORM, fossil has to explicitly maintain state about the associative store in order to be able to reconstruct the filesystem structure from the venti. Regardless, it shares many of the traits of the earlier system and was clearly influenced by it: there is a dump that is accessed in the exact same way as the older server's dump (including the naming scheme) and backups are automatically integrated in the same way, but using a copy-on-write scheme instead of suspending service when snapshotting. The implementation is radically different, however.

Sounds great; so where is it now?

Sadly, Plan 9 has fallen into disuse over the past decade and the system as a whole has atrophied. For example, it has been argued that fossil never attained the level of maturity, reliability, or polish of the older filesystem and that is largely a fair assessment. I will discuss this more in part 3 of this series.

Plan 9 is still available today, though it is not actively developed by Bell Labs anymore. The Labs produced four official Plan 9 editions; the last in 2002, after which they moved to a rolling release without fixed editions. However, the Plan 9 group at Bell Labs disbanded several years ago. Several forks have arisen to take up some of the slack.

Further, many of the good ideas in Plan 9 have been brought into other systems. The Akaros operating system has imported not just many of the ideas, but much of the code as well. Even systems like Linux and FreeBSD have taken many of the good ideas from Plan 9: the /proc filesystem on both systems is inspired by Plan 9, and Linux has implemented a form of per-process namespaces. FUSE is reminiscent of Plan 9's userspace filesystem support.