Littlechef and the great refactoring of the personal infrastructure

One of my favorite DevOps thought leaders thinks I'm nuts. He's right. People should not run their own mail infrastructure, monitoring infrastructure, or DNS infrastructure. On the other hand, I kinda started doing it a long time ago and it's worked out pretty well for me, so why should I change?

Well, a year or two ago, I started to get annoyed with the state of my infrastructure. See, about a decade ago, I set up a Linode. Because this was really before good DevOps practices or anything, I just SSHed in and started installing stuff. And so I've just kept upgrading Debian versions and so on. And there was this one bad day where I had to spend some unplanned hours fixing a botched upgrade.

There's a bunch of stuff that's just been plain broken. I was using a weird little monitoring tool that's easier to set up than Nagios... but there's a reason why people use Nagios. And the Apache config was something that I set up for Apache 1.x and then forcibly made work with Apache 2.x. HTTPS was via a self-signed cert that wasn't quite set up right. And a bunch of stuff never quite got to the point where it was clean; it was all little hacky scripts.

Chef has a big piece of the modern operations mindshare and I've been using it at work. A while ago, I set up a friend's project using chef-solo, which was annoying, but I got something she could work with. And then littlechef came out, and that actually works a bit better. So last year I started this monomaniacal quest to use littlechef instead. And I got to the point where all of the easy stuff worked (PostgreSQL, Nagios, BIND), but then I kinda got busy and didn't really have time to focus on it.

DevOps is in a bit of a transitional state. See, the Chef/Puppet style tools are built around the way we all used to work. Servers had roles, software was configured, and you didn't keep spare servers around. This still makes sense for a lot of people. But there's a growth point that some, but not all, projects reach where the individualized attention to each node becomes really annoyingly hard. When you are building an infrastructure where your software self-configures, where nodes are easy to replicate, and where an install means that you tear down the old infrastructure and drop in a new one, you need a new set of tools. You don't need all of the features that Chef has; you can pretty much just play back a short list of commands against a container.

I was thinking that I was going in the wrong direction... but I think not. See, Chef was sold on the basis that it would let you scale up to Google-scale, but the Google-scale tools are more like Docker and Ansible. What Chef does give you is an easier time installing things that were never designed for that sort of gooey model. All that complexity doesn't quite make as much sense for the small-to-mid-sized user. So I figured it was actually a worthy endeavor to get littlechef going.

Chef is mostly designed around the idea of a Chef server: either a self-hosted one, which requires a considerable amount of infrastructural setup, or hosted Chef. This works really well when you are routinely bootstrapping servers, because you can use cloud-init or a similar system so that a server boots from an uncustomized cloud image and then 'phones home' to provision itself. This also works with a larger DevOps staff on a large install.

Littlechef changes this because you can store everything as a 'kitchen' that you then check into git. This is kinda nice in disaster recovery situations, because if all of your Chef servers go down, you can still make local changes, check them into git, and then reconcile everything later. This is also nice for running a lightweight infrastructure for your personal stuff.
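
To make the 'kitchen in git' idea a bit more concrete, here is a minimal Python sketch of a node definition in such a kitchen. LittleChef keeps one JSON file per node, with a `run_list`, in a `nodes/` directory. The hostname, the role and recipe names, and the helper `write_node` are all hypothetical and only there for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_node(kitchen, hostname, run_list):
    """Write nodes/<hostname>.json inside the kitchen and return its path."""
    nodes_dir = Path(kitchen) / "nodes"
    nodes_dir.mkdir(parents=True, exist_ok=True)
    node_file = nodes_dir / f"{hostname}.json"
    # A node file is plain JSON; the run_list names the roles and recipes
    # that should be applied to this host.
    body = json.dumps({"run_list": list(run_list)}, indent=2, sort_keys=True)
    node_file.write_text(body + "\n")
    return node_file


if __name__ == "__main__":
    # Hypothetical host and run_list, written into a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_node(tmp, "box.example.org",
                          ["role[base]", "recipe[postgresql]"])
        print(path.read_text())
```

Because the result is just a text file, it diffs and merges in git like anything else, and LittleChef's `fix node:<hostname>` command is what would apply it to the box.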

The real test was when I deleted the server I'd built everything on and set up a fresh server from scratch, so I could be certain that I knew where everything was.

There were a few timewasters:

First, littlechef uses chef-solo-search to function. The Nagios cookbook (one of the reasons why I'm using Chef in the first place) doesn't quite work with chef-solo-search. It's hacked so that it kinda works, but I had to do some extra hacking. As it turns out, because I didn't use the right procedure (fork the Nagios cookbook and develop there), the Chef 10 to 11 migration sucked. I think I can get my chef-solo-search fixes into a shape where they can exist as a stable branch atop the Nagios cookbook. I still haven't actually pushed them anywhere, but they've at least shrunk somewhat.

Second, Apache 2.2 to 2.4. The biggest problem was that the community-driven apache2 cookbook was broken until about a week before I started installing things. That meant I spent a bunch of time trying to figure out why it wouldn't install and then used someone's minimum-change fork for a while. It also caused some weirdness in my config files.

Third, since I'm using Ubuntu 14.04 LTS instead of Debian oldstable... I ended up with a big Ruby 1.8 to 1.9 porting exercise. That ate up a few hacking sessions.

Fourth, mail setup. I had to take a hand-maintained Postfix + Courier IMAP + SpamAssassin setup and change it to a Postfix + Dovecot + SpamAssassin setup. I ended up resorting to the postfix-full cookbook and spent a lot of time figuring out what each config item in my manual config did, and then figuring out how to make that work with the cookbook.

Fifth, because I could do new things, I kinda spent more time than the base "Let's get this to be the same as the old box" tasks needed.
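
The search problem from the first item can be illustrated with a small Python sketch. With a Chef server, a cookbook like Nagios asks the server which nodes exist; without one, that question has to be answered from local files instead. This is not how chef-solo-search is implemented (that is a Ruby library, and real Chef search is richer than this). It is only a simplified illustration, assuming a `nodes/` directory of JSON files that each carry a `run_list`. The function name `nodes_with_role` and the host and role names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def nodes_with_role(nodes_dir, role):
    """Return hostnames whose run_list names the given role directly.

    Simplification: roles nested inside other roles are not expanded.
    """
    wanted = f"role[{role}]"
    matches = []
    for node_file in sorted(Path(nodes_dir).glob("*.json")):
        node = json.loads(node_file.read_text())
        if wanted in node.get("run_list", []):
            # The file name minus ".json" is taken to be the hostname.
            matches.append(node_file.stem)
    return matches


if __name__ == "__main__":
    # Hypothetical two-node kitchen in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        nodes = Path(tmp)
        (nodes / "mail.example.org.json").write_text(
            json.dumps({"run_list": ["role[monitored]", "recipe[postfix]"]}))
        (nodes / "dns.example.org.json").write_text(
            json.dumps({"run_list": ["recipe[bind]"]}))
        print(nodes_with_role(nodes, "monitored"))
```

A monitoring cookbook needs exactly this kind of answer to know which hosts to generate checks for, which is why a cookbook that assumes a server-side search needs patching to run under chef-solo.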

I've got some stuff to do going forward from here. I already pushed my changes to the backup cookbook. I really need to push a few more things publicly, but I have to clean them up first.