Live Destructive Database Migrations: Doing the Two Step

By "destructive", I mean "removes or renames a column in use by the current application". By "live", I mean "while your application is in use".

Original Migration

If Rails' ActiveRecord notices that a table has created_at and updated_at columns, it will automatically populate or update their values when a model is created or modified. Our application doesn't take advantage of this: it uses a hand-rolled creation_date field. I'm going to rename this column to created_at and add an updated_at column.
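For reference, the hand-rolled behaviour we're replacing looks something like this (a sketch; the real callback in our app differs in the details):

class Widget < ActiveRecord::Base
  # Hypothetical: stamp the record manually on creation, which is
  # exactly what ActiveRecord's created_at handling gives us for free.
  before_create :set_creation_date

  private

  def set_creation_date
    self.creation_date = Time.now
  end
end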

If I were doing this via downtime and maintenance page the migration would look like this:

class AddCreatedUpdatedColumns < ActiveRecord::Migration
  def up
    %w{all the tables}.each do |table|
      rename_column table, :creation_date, :created_at
      add_column table, :updated_at, :datetime, :null => false

      connection.execute("UPDATE #{table} SET updated_at = created_at")
    end
  end

  def down
    %w{all the tables}.each do |table|
      rename_column table, :created_at, :creation_date
      remove_column table, :updated_at
    end
  end
end

This migration couldn't be run on a live system. Between the time the schema change begins and the application restarts, these errors would occur:

  • ActiveRecord would try to set values on a creation_date column that no longer exists
  • Scopes that have order(:creation_date) stanzas would fail to load entirely
  • ActiveRecord would perform SELECT statements containing the creation_date column

Do the Two-Step

To work around this I'm going to split the above migration and prepare for multiple deployments:

  • The first deployment will create new created_at and updated_at columns, deploy code to use these new columns, and tell ActiveRecord to ignore the old creation_date column.

  • The second deployment will remove the now unused creation_date column.

Simple, yeah? I let our running application use the new column name before I remove the old one.

First Migration

I create a new migration to add the new columns:

class AddCreatedUpdatedColumns < ActiveRecord::Migration
  def up
    add_column :widgets, :created_at, "datetime DEFAULT now() NOT NULL"
    add_column :widgets, :updated_at, "datetime DEFAULT now() NOT NULL"
    change_column :widgets, :creation_date, :datetime, :null => true

    connection.execute("UPDATE widgets SET created_at = creation_date, updated_at = creation_date")
  end

  def down
    change_column :widgets, :creation_date, :datetime, :null => false
    remove_column :widgets, :created_at
    remove_column :widgets, :updated_at
  end
end

I add code to the Widget model to ignore the now unused creation_date column:

class Widget < ActiveRecord::Base
  def self.columns
    super.reject { |c| %w{creation_date}.include?(c.name.to_s) }
  end
end
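A quick sanity check from the console (illustrative output):

>> Widget.column_names.include?("creation_date")
=> false
>> Widget.column_names.include?("created_at")
=> true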

Once I've run rake db:migrate on my development box, I'll do a global find-and-replace of all instances of creation_date with created_at. I'll then run our test suite and commit.

I'll push that out to our test system, play around with some records to make sure that created_at and updated_at get set properly, and then push it to production.
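That manual check is just a console session along these lines (hypothetical attributes):

>> w = Widget.create!(:name => "sanity check")
>> [w.created_at, w.updated_at].any?(&:nil?)
=> false
>> w.touch && w.updated_at > w.created_at
=> true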

Second Migration

I create a second migration to set the correct flags on the columns (which couldn't be set earlier due to missing data) and to remove the now unused column:

class RemoveCreationDate < ActiveRecord::Migration
  def up
    %w{some tables}.each do |table|
      change_column table, :created_at, :datetime, :null => false
      change_column table, :updated_at, :datetime, :null => false
      remove_column table, :creation_date
    end
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

I also remove the code in the Widget model that tells it to ignore the creation_date column.

Note the use of IrreversibleMigration above. I prefer to flag that a migration can't be reversed rather than leave a down method that gives a false sense of reversibility. I'll also remove the DEFAULT now(): ActiveRecord will always set these values, so any other behaviour should be considered an error.
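Dropping those defaults is a small follow-up; I'd do it with raw SQL, since older Rails adapters handle default changes inconsistently (a sketch, MySQL syntax):

connection.execute("ALTER TABLE widgets ALTER COLUMN created_at DROP DEFAULT")
connection.execute("ALTER TABLE widgets ALTER COLUMN updated_at DROP DEFAULT")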

This migration gets pushed to our test system only when the first migration and deploy have successfully run on both our test and production systems and all automated tests have passed.

Gotchas

The code above has been adapted based on a couple of things that went wrong during pre-production testing:

  • The created columns were initially set to :null => true rather than being given sensible defaults. This gave us some grief after the first migration was run: some code depended on creation_date and created_at never being nil.
  • I'd forgotten to change creation_date to allow nulls in the first migration. This meant that once the new code was deployed, records were being saved without a creation_date, causing MySQL to return a column 'creation_date' cannot be NULL failure.
  • The original version of this post didn't anticipate that ActiveRecord would try to select the creation_date column explicitly. This meant that errors occurred between the time the database migration was run and the services were restarted. Thanks to Ben Hoskings for advice on how to correct this.

I also had to modify an existing index and manually re-create a couple of views. Check your db/schema.rb diff after you run rake db:migrate on your own machine, before pushing.
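For example, an index keyed on the old column has to follow the rename by hand, something like this (hypothetical index; in practice it goes in the migration before the column is removed):

remove_index :widgets, :creation_date
add_index :widgets, :created_at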

Babushka, Day Two

Add logging to timezone set

I'd like to know that we've actually set the timezone correctly, and Babushka's console output is lovely. Let's add to that:

dep "timezone set", :timezone do
  met? {
    current_tz = shell("date +%Z")
    log "system timezone is #{current_tz}"
    current_tz == timezone
  }
  meet {
    '/etc/timezone'.p.write(timezone)
    shell "dpkg-reconfigure --frontend noninteractive tzdata"
  }
end
root@precise64:~# babushka "personal:timezone set" timezone=UTC
timezone set {
  System timezone is EST
  meet {
  }
  System timezone is UTC
} ✓ timezone set

Remove AppArmor

I dislike AppArmor. I find it gets in the way more than it assists. Let's nuke it.

Is it installed by default in Ubuntu 12.04?

root@precise64:~# dpkg -l | grep apparmor
ii  apparmor                        2.7.102-0ubuntu3           User-space parser utility for AppArmor

Yep. Okay, how do we work out whether a package is installed? Diving into the babushka source shows us PkgHelper:

dep "remove apparmor" do
  met? { !Babushka.host.pkg_helper.has?('apparmor') }
end
root@precise64:~# babushka "personal:remove apparmor"
remove apparmor {
  ✓ system has apparmor deb
  meet {
  }
  ✓ system has apparmor deb
} ✗ remove apparmor

PkgHelper unfortunately has no "remove package" function. We'll shell out instead:

dep "remove apparmor" do
  met? { !Babushka.host.pkg_helper.has?('apparmor') }
  meet {
    shell "apt-get remove -y apparmor"
  }
end
root@precise64:~# babushka "personal:remove apparmor"
remove apparmor {
  ✓ system has apparmor deb
  meet {
  }
  system doesn't have apparmor deb
} ✓ remove apparmor

Make sure we have our standard utility suite

Our chef base cookbook installs a bunch of packages:

# Packages
%w{
  build-essential binutils-doc autoconf flex bison
  vim curl git-core sysstat libxml2-dev libxslt1-dev
}.each do |p|
  package p do
    action :install
  end
end

Unfortunately, I can't remember exactly why some of these got installed. A couple of them are for Ruby, but we'll install those later. I'm going to filter the above list and use babushka's .bin template:

dep 'vim.bin'
dep 'git.bin'
dep 'curl.bin'
dep 'iostat.bin' do
  installs 'sysstat'
end

dep "standard binaries" do
  requires %w{vim.bin git.bin curl.bin iostat.bin}
end

Note the installs 'sysstat': the .bin template checks for the existence of the named binary and, by default, installs a package with the same name if it's missing. That doesn't work for iostat, which is provided by the sysstat package, so we override the package name.

And run:

root@precise64:~# babushka  "personal:standard binaries"
standard binaries {
  vim.bin {
    'vim' runs from /usr/bin.
  } ✓ vim.bin
  git.bin {
    'git' runs from /usr/bin.
  } ✓ git.bin
  curl.bin {
    'curl' runs from /usr/bin.
  } ✓ curl.bin
  iostat.bin {
    'iostat' is missing.
    apt {
      package manager {
        'apt-get' runs from /usr/bin.
      } ✓ package manager
      apt source {
      } ✓ apt source
      apt source {
      } ✓ apt source
    } ✓ apt
    meet {
      Installing sysstat via apt... done.
    }
    'iostat' runs from /usr/bin.
  } ✓ iostat.bin
} ✓ standard binaries

Gold.

NTP

We ensure that NTP is installed and talking to some Australian time servers to keep our clocks in sync. Our chef cookbook looks like:

package 'ntp' # Install ntp package

service 'ntp' do # Ensure that ntp starts on boot
  action :enable
end

cookbook_file "/etc/ntp.conf" do # Tell NTP to use .au time servers
  mode 0644
  notifies :restart, "service[ntp]"
end

Skipping right to the end, here's the babushka version I ended up settling on:

dep 'ntpd.bin' do
  installs 'ntp'
end

dep "ntp time synchronisation" do
  requires 'ntpd.bin'
  met? { Babushka::Renderable.new("/etc/ntp.conf").from?(dependency.load_path.parent / "ntp/ntp.conf") }
  meet {
    render_erb 'ntp/ntp.conf', :to => '/etc/ntp.conf'
    shell "service ntp restart"
    sleep 1 # wait for ntp to come back up
  }
end

This originally had an ntpq -p test to make sure NTP was using the time servers we asked it to. Unfortunately, we use the #.oceania.pool.ntp.org aliases, and ntpq -p only outputs the servers they actually CNAME to. An alternative would be to test for latency, but that's too much effort for a small requirement like this.
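For the record, the abandoned check looked roughly like this (a sketch with a hypothetical dep name; the match always failed for the reason above):

dep "ntp talking to .au servers" do
  # ntpq -p lists the servers the pool aliases CNAME to, so the
  # alias itself never appears in the output.
  met? { shell("ntpq -p").include?("oceania.pool.ntp.org") }
end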

The Babushka::Renderable stuff is stolen from The Conversation's babushka-deps. The met? condition fails if the rendered template wouldn't match the file currently on disk.

Personally, I'd be tempted to make this a first-class conditional, but I understand that's my chef and puppet background at play. I'm assuming it's intentionally cumbersome, to encourage people to check the actual outcomes of actions rather than whether a particular file matches. Of course, there are cases where file matching is desirable: I don't want to have to manually trigger a meet condition every time I update a webserver template, for example.

Thoughts

I'm ashamed to admit it, but I'm nearly at the point where I'll need to shelve this project and continue the server rework using chef. Admittedly, my expectation that I'd be able to duplicate a multi-cookbook configuration in babushka within a weekend was fairly optimistic, but I've run into a couple of things that have made the process more difficult than I'd anticipated.

I'd assumed at the outset that babushka deps were written in much the same way as puppet manifests or chef cookbooks: "I expect this package to be installed, I expect this file to look like this". Features and templates seem to have been deliberately omitted from babushka to encourage a behavioural (rather than merely stateful) approach to dependency creation. While interesting, this has implications for idempotency that I'd taken for granted in chef and puppet.

There's little documentation "in the wild" about the best way to structure certain things in babushka. Most of my understanding has been from reading The Conversation's deps. Based on conversations with Ben, I'm beginning to assume that some of the patterns in use there are soon to be deprecated or are non-ideal. Lacking a shared understanding of the right way to do things (which I feel is necessary when learning a new technology: just see how most people ended up using Cucumber!), I'm left wondering whether I'm actually spending all my time heading down the wrong path.

However, RailsCamp 12 is coming up next month, and I've been looking for a project to hack on. I'm considering spending some time creating a 'stdlib' of deps (borrowing from the chef resources) to make migration easier, and maybe even working on the documentation.

Babushka, Day One

BikeExchange are moving from an Ubuntu 10.04 virtual machine to an Ubuntu 12.04 bare-metal behemoth. We have existing chef cookbooks that would make this easier, but:

  • I'm the only person in the organisation who knows how they work
  • We use maybe 25% of chef and the other 75% gets in the way
  • I need to rewrite big chunks from scratch to properly support TiniTrader

A discussion about ansible on #roro today led me to think about giving that a try instead. Simple YAML files! No daemons! No fucking keysigning! Party time!

Then, Andy Snow said (I'm paraphrasing) "You're a pussy, you should give babushka a go instead."

So that this knowledge isn't lost, and as the babushka documentation is pretty mixed (think Rails 3 before all the guides got re-written), I'm documenting my experiences converting our chef cookbooks from scratch.

Bootstrapping Vagrant

Vagrant doesn't support babushka out of the box the way it does puppet or chef. I actually prefer that: while I'm creating these things I like to work with them live, rather than have vagrant do the heavy lifting for me.

My first step was to update VirtualBox and vagrant. Easy.

I grabbed an official Ubuntu 12.04 Precise Pangolin 64-bit image.

Installed babushka, following Ben's instructions:

vagrant ssh
sudo apt-get install curl
sudo bash -c "`curl https://babushka.me/up`"
exit

I'll be working on my babushka configuration right in my repository, so I set up a symlink between that and where babushka expects to find its configuration:

vagrant ssh
sudo bash
cd /root/.babushka
ln -si /vagrant/babushka-deps/ deps

So that I didn't need to repeat these steps if I wanted to start from a 'fresh' machine, I packaged what I had as a new vagrant box:

vagrant package
vagrant box add precise64-babushka package.box
# edit Vagrantfile, replace box with precise64-babushka
rm package.box # (don't need it no more)
vagrant destroy # (as above)
vagrant up

Proof of Concept Babushka Dep

Babushka works on a concept of 'deps': you specify a condition and how to meet that condition. It has more in common with puppet than chef, and more in common with TDD than either.

The very first step in our chef configuration is to lock the timezone to UTC. Let's try writing a dep that does just that:

dep "timezone set" do
  met? { shell("date +%Z") == "UTC" }
end

And run it:

root@precise64:/root/.babushka/deps# babushka 'personal:timezone set to UTC'
timezone set to UTC {
} ✓ timezone set to UTC

Looks like this box is already UTC! Okay, let's parameterize it and use EST:

dep "timezone set", :timezone do
  met? { shell("date +%Z") == timezone }
end

Much better:

root@precise64:/root/.babushka/deps# babushka 'personal:timezone set' timezone=EST
timezone set {
  meet {
  }
} ✗ timezone set
You can view a more detailed log at '/root/.babushka/logs/timezone set'.

root@precise64:/root/.babushka/deps# babushka 'personal:timezone set' timezone=UTC
timezone set {
} ✓ timezone set

Finally, we'll grab the commands from our chef config and set the timezone:

dep "timezone set", :timezone do
  met? { shell("date +%Z") == timezone }
  meet {
    '/etc/timezone'.p.write(timezone)
    shell "dpkg-reconfigure --frontend noninteractive tzdata"
  }
end
root@precise64:/root/.babushka/deps# babushka 'personal:timezone set' timezone=EST
timezone set {
  meet {
  }
} ✓ timezone set

root@precise64:/root/.babushka/deps# date
Thu Sep 27 06:59:47 EST 2012

Additional Resources

I found the following helpful:

Reducing Asset Precompilation Time

We deploy to our test environment between five and twenty times a day. As we like our test environment to provide a solid indicator of production-readiness, we match it as closely to production as we can.

This, of course, means using the same Rails asset precompilation on a test deploy as on a production deploy. Unfortunately, this can take a while:

$ time rake assets:precompile
...
real  4m14.866s
user  5m52.794s
sys   0m31.324s

Last week I found and fixed a bug in the time it took for a single deployment to complete. This annoyed me, so I set out to see if I could reduce our asset precompilation time, and by extension our deployment time.

First I look in rails/actionpack/lib/sprockets/assets.rake and find one huge smell: the compilation is run twice! It's run once with a digest appended to each asset's filename, and then again without the digest.

Doing a bit of digging, I find this isn't necessary in our case: everything in public/assets is referenced via one of the asset helpers, which read manifest.yml and determine the correct (digested) path for each asset. Unless we start using assets from a 'static' source (eg, from an email) we're safe to turn off the 'nondigest' run. In fact, it looks like there's a movement to make this the default.
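Roughly speaking, the helpers just look each logical path up in the manifest (an illustrative sketch of the idea, not the actual Sprockets code):

require 'yaml'

# Illustrative: map a logical path to its digested filename, which is
# what the asset helpers do for us at request time.
manifest = YAML.load_file(Rails.root.join('public/assets/manifest.yml'))
manifest['application.js'] # => e.g. "application-1a2b3c4d.js"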

Unfortunately, running rake assets:precompile:primary gives us an error about a missing bootstrap-dropdown.js. Investigating further, I find that rake assets:precompile does some setup for us. It:

  • sets the Rails environment to production,
  • ensures that Bundler loads the assets group and
  • re-invokes rake to apply these changes.

Appending RAILS_ENV=production RAILS_GROUPS=assets to our rake assets:precompile:primary command fixes the missing file error.
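That is, the full invocation becomes (rake's key=value arguments end up in ENV):

$ rake assets:precompile:primary RAILS_ENV=production RAILS_GROUPS=assets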

This halves our precompilation time:

real  2m0.908s
user  2m51.986s
sys   0m14.973s

Now, let's see where Sprockets is spending all of its time. I overrode the Sprockets::StaticCompiler#compile method to print out how long Sprockets spent with each file:

if ENV['TRACK_PRECOMPILE_ASSETS']
  module Sprockets
    class StaticCompiler
      def compile
        manifest = {}
        start_process = Time.now
        total_time_taken = 0
        env.each_logical_path do |logical_path|
          next unless compile_path?(logical_path)
          start_file = Time.now
          if asset = env.find_asset(logical_path)
            # find_asset is where compilation happens; write_asset just
            # dumps the result to disk. Time both to see where each
            # file's cost lies.
            post_find = Time.now
            manifest[logical_path] = write_asset(asset)
            post_write = Time.now
            time_taken = post_write - start_file
            percent_in_write = (post_write - post_find) / time_taken
            total_time_taken += time_taken
            time_taken_s = sprintf "%0.2f", time_taken * 1000
            percent_in_write_s = sprintf "%0.2f", percent_in_write * 100
            # Per-file: name, milliseconds taken, and the share of that
            # spent writing rather than compiling.
            $stderr.puts "#{logical_path}: #{time_taken_s} #{percent_in_write_s}%"
          end
        end
        write_manifest(manifest) if @manifest
        $stderr.puts "TOTAL: #{Time.now - start_process} #{total_time_taken}"
      end
    end
  end
end

Messy as hell, but it's done the job for me.
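To use it, load the override somewhere the rake task will see it (an initializer, say) and set the variable for the instrumented run:

$ TRACK_PRECOMPILE_ASSETS=1 rake assets:precompile:primary RAILS_ENV=production RAILS_GROUPS=assets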

It showed me something interesting: each file that actually needs compiling (rather than simple copying, as is the case for images) takes between 0.5 and 3 seconds, depending on how complex it is.

The ckeditor_rails gem has around 100 JavaScript files, each of which is compiled individually. Removing this gem reduces our precompilation time significantly:

real  0m35.728s
user  0m45.032s
sys   0m2.792s

So, I've taken the time from over four minutes down to around thirty-five seconds. Let's see how that affects our deployment time.

Firstly, we'll set a baseline by running chef without a pending deploy:

real  0m24.555s
user  0m1.940s
sys   0m0.560s

Next a deploy without the above optimisations:

real  4m6.257s
user  3m46.870s
sys   0m24.190s

Finally, a deploy using assets:precompile:primary and with ckeditor_rails removed:

real  1m30.535s
user  0m49.960s
sys   0m5.690s

Four minutes down to a much more reasonable one minute thirty! There's still work to be done, but this is much better than where we started.

Discover Un-Pushed git Repositories

A small script to discover any repositories in the current directory with an un-pushed current branch:

#!/usr/bin/env ruby

pwd = ARGV[0] || '.'
debug = ARGV[1] == '-d'

# Find every .git directory below pwd, then ask git whether the
# current branch is ahead of its upstream remote.
Dir[File.join(pwd, '**', '.git')].each do |repo|
  repo.gsub!(/\.git$/, '')
  $stderr.puts "in #{repo}" if debug
  if `cd #{repo}; git status`.include?("Your branch is ahead of")
    puts repo
  end
end
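Usage looks something like this (hypothetical script name, paths and output):

$ ruby unpushed.rb ~/code
/Users/me/code/babushka-deps/
/Users/me/code/widget-app/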