Maildir... the one true mail format (more or less)

So I just want to highlight one very important point that I’ve noticed while moving my mail around.

When I was using an mbox-based mail system, every so often I would hit a mail corruption bug where “something” would go wrong. My Outlook mailbox has gone funky in the past too, and not all of its data is stored in an especially well-documented place either.

However, ever since I switched to Maildir, I’ve had no problems with it, ever. I think part of it is testability. See, if you use the filesystem, whoever wrote it had to sit down, understand all of the usual failure cases, handle them as well as possible, and generally protect the filesystem from software bugs. There’s a single API to test for Maildir, and it’s the same API that the rest of the filesystem uses, whereas with mbox, everybody’s got their own API. The damage from a bug is also contained: with Maildir, if you screw things up and write garbage to an open file handle, the most you are going to do is corrupt a single file, which is a single message. Try that on an mbox and it’s going to ruin a bunch of messages, pretty much disrupting the whole stream of the file and potentially requiring some hand mangling.
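To make the "one file, one message" point concrete, here is a minimal sketch in Python. It assumes the usual Maildir layout of tmp/, new/ and cur/ subdirectories and the write-to-tmp-then-rename delivery pattern; the `deliver` and `demo` helpers and the `msg0`-style file names are made up for illustration (real Maildir file names are longer and unique per delivery).

```python
import os
import tempfile


def deliver(maildir, name, text):
    """Write a message into tmp/, then rename it into new/."""
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "w") as f:
        f.write(text)
    # The message only becomes visible in new/ once it is fully written.
    os.rename(tmp_path, os.path.join(maildir, "new", name))


def demo():
    """Deliver three messages, trash one file, and list the survivors."""
    maildir = tempfile.mkdtemp()
    for sub in ("tmp", "new", "cur"):
        os.mkdir(os.path.join(maildir, sub))
    for i in range(3):
        # Hypothetical file names, chosen only to keep the example readable.
        deliver(maildir, f"msg{i}", f"Subject: message {i}\n\nbody {i}\n")

    # Simulate a buggy program scribbling garbage over one open file handle.
    with open(os.path.join(maildir, "new", "msg1"), "wb") as f:
        f.write(b"\x00garbage\x00")

    intact = []
    new_dir = os.path.join(maildir, "new")
    for name in sorted(os.listdir(new_dir)):
        with open(os.path.join(new_dir, name), "rb") as f:
            if f.read().startswith(b"Subject: "):
                intact.append(name)
    return intact
```

Calling `demo()` should report that only the scribbled-on message is lost; its neighbours never shared a file with it, so there is nothing to resynchronise.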

I suppose mbox could have been a little more carefully defined, but you do need to remember that the Unix market spent a long time fragmented, with subtly different versions and multiple competing "correct" ways to do things. So there are actually four different ways to properly lock an mbox mailbox, two ways to split up the file into messages, and a bunch of subvariants of the format. And if you actually want to make the format fast, you need to add summary files to index things, and those aren’t a shared standard between applications either.
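Here is a rough sketch of why message splitting is fragile. It assumes the splitting approach where a line beginning with "From " marks the start of a new message, which is only one of the variants in the wild; the `split_mbox` function and the sample addresses are hypothetical, not taken from any particular mail program.

```python
def split_mbox(text):
    """Naively split mbox text into messages at lines starting with 'From '."""
    messages, current = [], None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # A separator line: close the previous message, start a new one.
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


# Two messages; the body line that begins with "From" has been quoted with ">".
GOOD = (
    "From alice@example.org Mon Jan  1 00:00:00 2024\n"
    "Subject: one\n\n"
    ">From here on, the body line is quoted.\n\n"
    "From bob@example.org Mon Jan  1 00:01:00 2024\n"
    "Subject: two\n\nhello\n"
)

# The same mailbox written by a program that forgot to quote the body line.
BAD = GOOD.replace(">From here", "From here")
```

With these inputs, `split_mbox(GOOD)` finds two messages while `split_mbox(BAD)` finds three, because the unquoted body line is mistaken for a separator. Every reader and writer of the file has to agree on the quoting rule, which is exactly the kind of per-application API that Maildir sidesteps.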

Either way, this goes back to the whole database problem. Clearly a relational database is a very good thing for solving relational problems. But if you start storing data in a relational database that is never in a million years going to take advantage of relational features, just because the filesystem’s too slow or not quite ACID, you need to start asking why the filesystem sucks so much instead of reaching for bad minimalist databases.

