Andrew Ingram

Common Pitfalls with Django and South

Posted on .

South is a database migration library for Django and it's become an indispensable part of my toolkit. But I've had to learn some best practices the hard way.

For the most part, using South in my projects has been a huge success. I use it for everything from simple blog websites (like this one) with a single developer, to large multi-database e-commerce projects built by bigger teams. Along the way I've made and witnessed a number of mistakes that are easy to make. Hopefully these tips will help you avoid some of them.

Beware of differences between databases

Sometimes you might be using different database engines for local development and production. I can't personally recommend this, but it does happen. It's quite easy to forget the differences as you're writing migrations. MySQL has issues with indexes on big columns and with non-transactional schema alterations, while SQLite is just annoyingly forgiving about everything. PostgreSQL is strict by default (a very good thing!), so if you're using it in production but not in development you might have some nasty surprises when you do a build.

Manage your dependencies

In each migration you can declare a depends_on field which tells South what other migrations must be run beforehand. When you create a relationship to another app (for example from basket to catalogue), it's important that you tell South to first run the migration which creates the database table at the target end of the relation.

Often you'll find you can get by without doing this; it depends on the way your database handles foreign keys (PostgreSQL is nice and strict about this by default, so you'll see the errors). The order of your INSTALLED_APPS has an impact too: when running migrate, your apps will be migrated in the order they're defined in your settings. You may find your dependencies are implicitly taken care of, but my recommendation is to always declare them explicitly. You're less likely to have a problem if someone re-orders your apps (for whatever reason), and it's more Pythonic to be explicit.
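As a rough illustration of declaring a dependency explicitly, a migration in the basket app might look something like the sketch below. The app label and the migration name 0001_initial are placeholders, and the ImportError fallback is not part of South; it is only there so the sketch can be imported on a machine without South installed.

```python
try:
    from south.v2 import SchemaMigration
except ImportError:
    # Fallback stub (not part of South) so the sketch imports without it.
    class SchemaMigration(object):
        pass


class Migration(SchemaMigration):
    # Ask South to run catalogue's initial migration before this one,
    # regardless of the order of INSTALLED_APPS.
    # "catalogue" and "0001_initial" are placeholder names.
    depends_on = (
        ("catalogue", "0001_initial"),
    )

    def forwards(self, orm):
        # The basket tables that reference catalogue would be created here.
        pass

    def backwards(self, orm):
        pass
```

Each entry in depends_on is an (app label, migration name) pair, so a migration can list several if it touches more than one other app.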

Check your migrations run from scratch

It can be easy to fall into the trap of assuming that just because a migration runs perfectly at the time of authoring (and perhaps for a while longer) it'll always work. In reality this is true only if your migrations have been authored correctly; my other tips will help with this. I have a habit of making sure I can bootstrap the entire project from scratch from time to time, or sometimes just the database. Your continuous integration and test suite will also help with this, but be aware that the test suite running successfully doesn't always mean migrate will work.

Watch out when importing

On occasion you may find yourself importing functionality from elsewhere in your codebase for use in a migration; you'll usually see this in a complex data migration. This can go wrong because it's easy to neglect migrations when refactoring part of your project: a handy function that existed when the migration was written may no longer exist a few months later, causing a broken migration. Your test suite and CI builds should prevent this one from causing too many problems, but minimising the use of imports in migrations will stop this happening in the first place.
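One way to keep imports out of a migration is to copy the small piece of logic you need into the migration file itself. The sketch below assumes a hypothetical accounts.User model with full_name, first_name and last_name fields, and a hypothetical project helper called split_full_name; none of these come from a real project. The forwards function is written as a plain function here, whereas in a real South data migration it would be the forwards(self, orm) method.

```python
# Avoided on purpose: from accounts.utils import split_full_name
# (a hypothetical helper that could be renamed or removed later).


def split_full_name(full_name):
    """Local copy of the helper, frozen together with the migration."""
    parts = full_name.strip().split(None, 1)
    first = parts[0] if parts else ""
    last = parts[1] if len(parts) > 1 else ""
    return first, last


def forwards(orm):
    # "accounts.User" and its fields are placeholder names.
    for user in orm["accounts.User"].objects.all():
        user.first_name, user.last_name = split_full_name(user.full_name)
        user.save()
```

The copy will drift from the original over time, which is the point: the migration keeps doing what it did on the day it was written.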

Always use South's ORM

This is related to the problem of importing, but I see this one all the time so it's worth highlighting. I regularly see (and have been guilty of authoring) data migrations that import models rather than use South's own ORM and its frozen models. There are two main reasons for this:

  1. The model has methods that you want to use as part of the migration. In my experience this usually happens with libraries like treebeard and django-mptt, which give you a number of methods on models for managing tree structures. I'm sure there are numerous other examples.
  2. The model you want to use isn't in South's frozen ORM.

The simplest workaround to the first problem is to re-implement some of the model's algorithms within your migration, though this might not always be practical. The other thing you can do is manually edit the frozen ORM data to add an additional base class for South to use when it generates the model class; see this article for details.

For the second problem the solution is nice and simple: when creating the data migration you can use the --freeze argument to include additional apps in the frozen ORM.

./ datamigration foo --freeze bar

Custom save methods

Continuing from the last tip: when South creates the model class from the frozen ORM data, it won't include any custom save methods you may have defined (unless your method is on a parent class which you've added to the bases list). People often add a slugify operation or other processing to their save methods, so if you want a data migration to perform the same processing, you'll have to do it explicitly rather than depending on save().
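A minimal sketch of doing that work explicitly is shown below. It assumes a hypothetical blog.Post model with title and slug fields, and make_slug is a deliberately simple stand-in for whatever your real save() does (it is not Django's slugify). As before, forwards is shown as a plain function; in a South data migration it would be the forwards(self, orm) method.

```python
import re


def make_slug(title):
    # Simple ASCII-only slug rule, used here as a stand-in for the
    # processing that normally lives in the model's custom save().
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def forwards(orm):
    # "blog.Post" and its fields are placeholder names.
    for post in orm["blog.Post"].objects.all():
        if not post.slug:
            # The frozen model's save() will not fill this in for us.
            post.slug = make_slug(post.title)
            post.save()
```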

Check for conflicts

When multiple people are working on a project, you'll often get conflicting migrations being pushed into version control. It's important to address these as soon as they happen, otherwise the situation will get worse over time. Quite often there'll be no visible symptoms because the functionality of the migrations doesn't conflict, but you'll often end up with South's ORM getting a little confused about what the right state is.

The South tutorial has some guidelines on how to manage teams and workflow.

ContentTypes and Permissions

If you write a migration that depends on a ContentType or Permission existing, be aware that these won't automatically be created (via the post_syncdb hook) until after South has finished running the migrations. This means that if you're starting from a fresh database, they won't exist for use in a data migration unless you create them explicitly. You can do it like this:

from import update_contenttypes
from import create_permissions
from django.db.models import get_app, get_models

update_contenttypes(get_app('your_app_label'), get_models())
create_permissions(get_app('your_app_label'), get_models(), 0)

You should also be aware that South doesn't freeze the state of the permissions tuple from a model's Meta class. This means you might have some problems if you change it over time and a data migration depends on certain permissions existing (or not).

I'm not a fan of data being automatically introduced into a system outside of migrations (too many mechanisms for introducing state is risky), and I worry that Django's mechanism for creating Permissions is problematic for this reason. With the exception of the 3 basic permissions that Django always creates for each model, you might want to consider skipping the permissions tuple entirely and creating them in migrations.
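If you do go down the route of creating permissions in migrations, the data migration could look roughly like this. It is a sketch under several assumptions: the migration was created with the contenttypes and auth apps frozen, ContentType still has a name field (as it did in the Django versions South targets), and the catalogue app, product model and can_publish codename are all placeholders. The forwards function stands in for the forwards(self, orm) method of a data migration.

```python
def forwards(orm):
    # Look up (or create) the content type for the placeholder model,
    # since it may not exist yet on a fresh database.
    content_type, _ = orm["contenttypes.ContentType"].objects.get_or_create(
        app_label="catalogue",
        model="product",
        defaults={"name": "product"},
    )
    # Create the custom permission only if it is not already there,
    # so the migration is safe to run against existing databases too.
    orm["auth.Permission"].objects.get_or_create(
        content_type=content_type,
        codename="can_publish",
        defaults={"name": "Can publish product"},
    )
```

Using get_or_create on both lookups means it doesn't matter whether Django's own hook has already created the rows.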

Be careful with fixtures

South makes it possible to make significant structural changes to your database quickly. Unfortunately, this means it's very easy for fixtures to get left behind. This is especially problematic for initial_data fixtures, which automatically get loaded whenever you use syncdb or migrate: it quickly becomes a pain to keep them up-to-date, and you'll often find that your database is incompatible with the fixture at certain points in the migration history.

You have a couple of options here. One is to keep your fixtures up-to-date, but accept that they'll only work with the latest version of your database schema. The other is to load fixtures as part of a data migration, which lets you avoid the problem of having to keep fixtures updated (the subsequent migrations take care of any necessary transformations). If you wish to use Django's loaddata command in a migration you'll need some trickery to make it aware of the frozen state of the ORM (thanks to jwpeddle on Hacker News for pointing out this necessity). Fortunately, I've stumbled across a little snippet on Stack Overflow which seems to solve the problem.