In this chapter, we will see what strong migrations are and how they can help in
building a reliable application. We will first learn about dangerous migrations
and the detrimental effects they can have on business operations. Eventually, we
will look at a gem called strong_migrations
that will enable us to write
reliable migrations.
Consequences of Dangerous migrations
At this point, you might already know what migrations are. In a nutshell,
migrations are a feature of Active Record that allows you to evolve your
database schema over time. Developers often tend to overlook how dangerous
migrations can be if they are not planned and executed carefully. There are
several potential risks and challenges that can arise during a migration,
including:
- Data loss: If data is not properly backed up and transferred during a
migration, it can be lost or corrupted.
- Downtime: They can cause disruptions to business operations if they are
not properly managed.
- Compatibility issues: Migrations can sometimes result in compatibility
issues with existing systems and processes.
- Security risks: Migrations can expose an organization to security risks if
they are not properly secured.
The above facts reinforce the need for well-written migrations which can
potentially save the organization a large amount of time and money during
maintenance.
Downtime
Among the risks associated with migrations, let us take a look at the concept of
downtime. Downtime refers to a period of time during which a system, service, or
application is unavailable or not functioning correctly. Downtime can have
significant consequences for businesses, as it can result in lost revenue,
reduced productivity, and decreased customer satisfaction. It is important for
businesses to minimize downtime and ensure that their systems and services are
as reliable and available as possible. So it stands to reason for an
organization to enforce a minimal downtime policy.
Now, let us assume we have a PR that consists of some hefty migrations. Consider
the PR is merged and we are set to run our migrations in the production
environment. While running these time-consuming migrations in the production
environment, the tables associated with each operation can be rendered
irresponsive to read and write queries, hence contributing to the downtime of
the application.
But why are read and write queries blocked in the first place? The answer boils
down to two concepts in Database Management Systems: transactions
and locks
.
Transactions refer to a set of operations that are executed as a single unit,
either all completed successfully or all being rolled back if any of the
operations fail. Transactions use locks to control concurrent access to data.
Locks in a database are used to protect data integrity and consistency by
preventing multiple transactions from accessing and potentially modifying the
same data simultaneously. This is why some operations will be blocked while
running migrations. There are several types of locks that can be used in a
database, including exclusive locks
, shared locks
, update locks
, and
intent locks
.
It should be noted that Rails will run your migrations inside a transaction if
your database supports it. So while running the migrations, tables might be
locked down by the operations within your migrations hence rendering other
operations, using the same table, unprocessable.
This is just one type of dangerous migration. We will see more of these
dangerous migrations in a later section.
Detecting unsafe migrations
We just saw a case where a dangerous migration can prove to be harmful to our
application. As a workaround, we could use the techniques that we will discuss
in the next section. But it can be time-consuming and often require the
developers to self-inspect whether the migration they've written is dangerous or
not.
Wouldn’t it be easier if there was someone to tell you whether a migration is
dangerous or not each time you try to migrate? That is exactly what the
strong_migrations
gem does. Its purpose is to let you know that you’re running
an unsafe migration and instruct you on what you can do to make it better.
To use this gem, you can add the following line to your application’s Gemfile:
Afterward, you can run the below commands to complete the setup.
Now every time you try to run an unsafe migration, you will be warned of the
same. Please note that this does not happen when you try to run raw SQL queries.
Strong Migrations cannot ensure safety for raw SQL statements. In case you have
to execute raw SQL statements in your migrations, then it's better to explicitly
assure its safety, like this:
Dangerous migrations and their corresponding checks
In this section, we will see examples of some unsafe migrations and explore the
reason why they are unsafe. We will also see how we can make them safe.
Adding a column with a default value
Let us see how we normally tend to perform migration for this operation.
The problem with the above code is that when a column is added, an
access exclusive
lock is made on the table preventing it from being used by
other operations. Now while adding a column with a default value, the latter
will have to be propagated across all the rows in the table which can be
time-consuming. This means that during the entire time, read and write queries
on the table will be blocked.
In earlier versions of Postgres, MySQL, and MariaDB, adding a column with a
default value to an existing table caused the entire table to be rewritten.
However, In Postgres 11+, MySQL 8.0.12+, and MariaDB 10.3.2+, this no longer
requires a table rewrite and is safe, except for volatile functions like
gen_random_uuid().
A safe approach to this migration would be to add the column without a default
value and then change the default.
We already saw that Active Record creates a transaction around each migration.
So one should avoid back-filling, like using the update
command, in the same
transaction that alters a table since it keeps the table locked for the duration
of the backfill. If necessary you can use disable_ddl_transaction!
within the
migration to prevent it from being wrapped by a transaction.
Changing the type of the column
This one is similar to adding a column with a default value, i.e, during the
entirety of the operation, the lock on the table prevents read and/or write
operations on that table.
So a code like the one given below would not suffice.
A better way would be to:
- Create a new column
- Write to both columns
- Backfill data from the old column to the new column
- Move reads from the old column to the new column
- Stop writing to the old column
- Drop the old column
Removing a column
The migration for removing a column might be straightforward.
While it seems tempting to use the above code, it in fact is a dangerous
migration. This is because Active Record caches database columns at runtime. So
if a user tries to insert a tuple into that table while the migrations are being
run, it would throw an exception since the column has actually been removed.
This error will persist until the application is rebooted after the migrations’
completion.
A better version for removing a column would be to:
- Tell Active Record to ignore the column from its cache
- Deploy the code
- Write a migration to remove the column (wrap in safety_assured block)
- Deploy and run the migration
- Remove the line added in step 1
A safer approach is to ensure that the column we wish to drop is not in use in
the application.
Renaming a column
Renaming a column is similar to the case of removing a column in the sense that
both face issue of the app not being able to reload during migrations. As a
result, the queries raised during the migration end up using the cached values
which contradict the actual database state, raising exceptions. But the solution
here is not similar to the former case since we cannot just deploy the code that
uses a new column name that is not available yet.
A safer approach is to:
- Create a new column
- Write to both columns
- Backfill data from the old column to the new column
- Move reads from the old column to the new column
- Stop writing to the old column
- Drop the old column
Renaming a table
The conventional code to rename a table might look like this:
Renaming a table is something we must try to avoid entirely. However, if it is
absolutely necessary you might run into the same problems one might face while
renaming a column.
A safer approach is to:
- Create a new table
- Backfill data from the old table to the new table
- Move reads from the old table to the new table
- Stop writing to the old table
- Drop the old table
Adding a check constraint
For adding a check constraint one might most likely resort to the below
migration.
But as you might have guessed, this is a dangerous migration. Let us see why in
the context of the Postgresql database.
For the above code snippet, the operation performed by Postgresql can be divided
into two steps:
- Adding a check constraint and enforcing it for all new
insert/update
statements. This operation requires an access exclusive
lock. No validation
is done in this step.
- Checking all existing rows to verify they conform to the specified
constraint. Validation requires a
row share
lock on the reference table
which will block any operations requiring exclusive locks while validating
the constraint.
As you can see, the validation can make the tables associated with the check
constraint irresponsive to certain queries for a reasonable amount of time.
An obvious better approach would be to add the check constraint without
validating existing rows in the migration.
The validation can then be done in a separate migration.
Note that this approach is recommended only for Postgresql. Let us explore some
more dangerous migrations pertaining to the Postgresql database.
- Adding an index: In Postgres, adding an index non-concurrently blocks
writes. Hence it is always advised to add indexes concurrently.
- Adding a reference: Rails adds an index non-concurrently to references by
default. This means that it will run into the same problems as in the case of
adding an index. So make sure to add the index concurrently while adding a
reference.
- Adding a foreign key: This is similar to the case of adding a check
constraint. It prevents writes when validation is performed as a part of
adding a foreign key. Hence a safer approach would be to add the foreign key
without validating existing rows and then validate them in a separate
migration.
- Setting
NOT NULL
on an existing column: In Postgres, setting NOT NULL
on an existing column blocks reads and writes while every row is checked. A
safer approach is to resort to check constraint to perform this operation.
- Keeping non-unique indexes to three columns or less: Adding a non-unique
index with more than three columns rarely improves performance. Instead, start
an index with columns that narrow down the results the most.
There are more concepts related to strong migrations in the context of the
Postgresql database. You can refer to the official
docs to learn more about
it.
Backfilling data
We already saw that Active Record creates a transaction around each migration.
So one should avoid backfilling, like using the update
command, in the same
transaction that alters a table since it keeps the table locked for the duration
of the backfill. If necessary you can use disable_ddl_transaction!
within the
migration to prevent it from being wrapped by a transaction.
Creating a table with the force option
It is strongly advised not to create tables with the force
option since it can
drop an existing table. If you intend to drop an existing table, run
drop_table
first.
In summary, we saw why strong migrations are important and how you can replace
potentially dangerous migrations with their corresponding checks. You can read
more about the checks that the gem strong_migrations
offers in their official
docs.
References