Friday, March 18, 2005

Failure oblivious computing

As I continue to labor under the delusion that people find what I have to say interesting, I've decided to restart my (shudder) blog, originally titled "Confessions of an Operating Systems Junkie." (Perhaps in a few decades, when blogs are considered as normal as cell phones or small yappy dogs, I'll leave out the "shudder" part.) First, a quick recap of past episodes... In order to avoid the usual fate of blogs - e.g., random complaining about roommates leaving the dishes in the sink - the main theme of my blog is "Cool systems papers I've read lately," leavened with "Cool other things I've read lately" and very occasionally, "Cool things I've written lately."

While I spent the last few months primarily sleeping on planes and controlling the urge to throttle the sales creature on the other end of the phone, I also read a few systems papers. (Do I sound bitter? Heavens.) Today's cool systems paper is the utterly delightful Failure Oblivious Computing from Martin Rinard at MIT. A safe C compiler creates code that dynamically checks for out-of-bounds memory accesses and terminates the program; this converts, e.g., buffer overflow attacks into mere (?) denial-of-service attacks. Martin wondered what would happen if the compiler instead generated code to transparently mask bad memory accesses: for a bad read, return some made-up data; for a bad write, silently throw it away. In a lot of cases, the answer is that the program behaves almost as if it had no bug at all, and better than either the safe C compiler case (program termination) or the normal C compiler case (successful security exploit).
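To make the read/write masking concrete, here is a toy sketch in Python rather than compiler-generated C. It is not the paper's implementation: the class name `FailureObliviousBuffer`, the fixed made-up value of 0, and the `discarded_writes` counter are all my own assumptions for illustration.

```python
class FailureObliviousBuffer:
    """Toy fixed-size byte buffer that masks out-of-bounds accesses.

    Hypothetical illustration only: the real system instruments C memory
    accesses in the compiler; this just mimics the observable behavior.
    """

    def __init__(self, size, made_up=0):
        self._data = bytearray(size)
        self._made_up = made_up      # assumed placeholder for bad reads
        self.discarded_writes = 0    # bookkeeping added for the demo

    def _in_bounds(self, index):
        return 0 <= index < len(self._data)

    def read(self, index):
        # Bad read: return made-up data instead of terminating.
        if self._in_bounds(index):
            return self._data[index]
        return self._made_up

    def write(self, index, value):
        # Bad write: silently throw it away.
        if self._in_bounds(index):
            self._data[index] = value
        else:
            self.discarded_writes += 1


# An 8-byte "overflow" into a 4-byte buffer: the last 4 writes vanish.
buf = FailureObliviousBuffer(4)
for i, byte in enumerate(b"overflow"):
    buf.write(i, byte)
```

After the loop the buffer holds only the first four bytes, nothing adjacent has been clobbered, and the program carries on, which is the whole point.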

Sounds crazy? Read the paper, you'll enjoy it even if you don't agree. Here's a little taste to whet your appetite. In failure-oblivious computing, writes are just thrown away, but how do you decide what value an invalid read should return? In his talk at OSDI 2004, Martin gave the sequence of return values as this: 0, 1, 2, 0, 1, 3, 0, 1, 4... This is because (a) eventually it will cycle through all possible values, allowing things like searches for a particular ASCII character to eventually succeed, and (b) 0 and 1 are the most common data values loaded by programs. This got a big laugh from the audience. In fact, Martin won the unofficial Best Talk award as judged by the Val Henson Laugh-O-Meter. The Laugh-O-Meter was inspired by a talk I gave at the Silicon Valley Linux Users Group a few weeks before. Somehow I managed to make the audience laugh about every 3 minutes while talking about... the history of UNIX file systems. Wild. With any luck, I can repeat the performance at the LUGOD this upcoming Monday night. Imagine what I could do if I were talking about an actually interesting topic!
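Going back to that sequence of made-up read values for a moment, here is a minimal Python sketch of a generator that produces it. The function names, the byte-sized `limit` of 256, and the choice to wrap back to 2 after the largest value are my assumptions; the talk only gave the first few terms.

```python
from itertools import islice


def manufactured_values(limit=256):
    """Yield 0, 1, 2, 0, 1, 3, 0, 1, 4, ... for values below `limit`.

    Assumption: once the third slot reaches `limit`, it wraps back to 2.
    """
    n = 2
    while True:
        yield 0
        yield 1
        yield n
        n += 1
        if n >= limit:
            n = 2


def reads_until_found(target, values):
    """Count how many made-up reads it takes before `target` shows up."""
    for steps, value in enumerate(values, start=1):
        if value == target:
            return steps


first_nine = list(islice(manufactured_values(), 9))
slash_steps = reads_until_found(ord("/"), manufactured_values())
```

A loop scanning past the end of a buffer for a "/" would spin forever if every bad read returned 0; with this sequence it is guaranteed to stop eventually, and the common cases of 0 and 1 still come up every third read.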

Martin told me that the only reason he could think up failure oblivious computing was because he hadn't written any code for 10 years. Depressingly, I think he's right. On the other hand, I've only written a couple of test programs and a few scripts over the last 6 months, so perhaps I'm on the road to greatness as well.

If you're reading backwards, you've just hit the end of this blog. I have an earlier blog I wrote while I was at Sun:

Confessions of an Operating Systems Junkie

Hopefully they won't notice I've quit and delete it any time soon.