What is Rake and why do we need it?
In the previous chapter, we manually created some demo users for reviewing and
testing the changes we have made. What if we need to share the code with someone
else and they need to review it too? For example, a peer trying to review your
application.
We need some way to automate setting up the project with all required
configurations and necessary data for testing.
Can't we use migration scripts for creating necessary data? No! We can't.
Why? Because the test data shouldn't be created on our production server when we
deploy the application.
When we deploy applications, all migrations are automatically run as part of our
pipeline.
In the case of our previous chapter, these test users are purely for setting up
the staging environment.
We can solve this problem using Rake tasks.
Rake stands for Ruby Make. It's a standalone Ruby utility that "replaces the
Unix utility 'make', and uses a Rakefile
and .rake
files to build up a list
of tasks".
Basically, it is a task runner for Ruby. Rails uses Rake Proxy to delegate some
of its tasks to Rake.
We have used rails db:migrate
in the previous chapters. When
rails db:migrate
is run, what happens internally is that Rails checks if
db:migrate
is supported natively. In this case db:migrate
is not natively
supported by Rails, so Rails delegates the execution to Rake via Rake Proxy.
Setting up Rake tasks
We can also write our custom Rake tasks in Rails environment by creating files
with .rake
extension in ./lib/tasks
.
Often when creating a new project, we need to setup some defaults, like say
populating the user database with default users etc. For such cases we can write
those tasks in ./lib/tasks/setup.rake
. Let's add the code below in our
setup.rake
:
It's a common practice after cloning a repository for the first time, to run
./bin/setup
, to automatically fetch all the libraries, create db, seed data
etc. Therefore it makes sense to invoke our setup.rake
from ./bin/setup
since it also plays a role in bootstrapping the project.
Remove the following lines, including the comments, from the ./bin/setup
under
APP_ROOT
block:
Then add the highlighted lines to ./bin/setup
file:
Note that, if you run ./bin/setup
, then most probably all your DB data will be
wiped off and new seed data will be added.
It's recommended to run this setup, only for the first time that we clone a
repo.
There are valid cases when we need to rerun this setup.
Example:
Let's say as a team we decided we need to modify a migration file. We shouldn't
modify a migration in the first place. But let's say it happened.
Then one of the easiest ways to rerun the updated migration is by running this
setup. Or you can rollback the migrations and manually commit it once again.
Executing the Rake task
Run this command to execute our Rake task:
You will get an error saying something like:
This is because the Rake has no way of knowing the models and classes we have
defined in our Rails environment. It isn't able to find a reference to our
User
without loading Rails environment.
To fix this problem, we need to add our Rails environment into the Rake task.
Update setup.rake
with the below code:
Now, we can run the command again:
You can see our previous error has disappeared. You might see another error that
might look like this:
This is perfectly fine and our Rake task is running as expected. We see this
error because we already have some demo users in our table with the same
email
. Adding a new User
with the same email
would violate the unique
constraint we have enforced on our database.
If you get the email validation error, then it means that you can safely move
forward to the next section.
If you want to see the validation pass, then change the emails of the sample
users in the Rake setup file and run the command once again.
Finalizing the Rake setup
Let us add another task, that destructively recreates the database. This will
avoid problems like the one we have encountered in the previous section, which
is caused by already existing data.
One more key thing to take care is to ensure that we don't drop the database in
production environment. We will be uploading this file to production and staging
environment.
But we will be running the rake task only in the staging environment. The reason
is that staging
environment is where we test out stuff and thus it acts as a
simulated production environment. While testing, it's useful for us to have some
defaults like the default user logins.
Update the setup.rake
file with the following lines of code:
Running this Rake task will drop our database and recreate it from scratch with
the demo data.
If you need to preserve your old data, backup the database file
db/development.sqlite3
and db/test.sqlite3
. But if you're using PostgreSQL,
then this trick won't work. You'd have to dump the DB manually and reuse it.
Run this command to execute our Rake setup:
The above command should output something similar in the console:
The Rails server should be restarted so that the latest data will be loaded.
Importance of ./bin/setup file
The ./bin/setup
file helps in setting up the project for development env in
one go. When we are setting up a new project in the machine then a lot of steps
like creating a database, performing migrations, seeding the database,
installing dependencies, etc. need to be taken care of. We define all these
processes in the bin/setup
file and now we just need to execute the
bin/setup
file to setup the project's base.
The bin/setup
file performs the following operations:
-
Install all the dependencies
-
Create the database.yml
file, if not already exists
-
Execute the rake setup
task which creates the db, migrates all the
migrations, and populates the sample data.
-
Clear the logs and the temp files
-
Restart the Rails server
We can add more operations in the setup file if needed to setup the project.
At BigBinary, the base of all projects should get setup if we run ./bin/setup
file. The developer should not be forced to manually perform migrations or add
seed data while setting up the development env. All they have to do is run
./bin/setup
file.
Dealing with stale Active Record cache
Consider a scenario where a new column or a table is created using a migration
and right after the new column or table is populated with some seed data. When
the migration runs to add the seed data, Rails will throw an exception that the
new column or the table doesn't exist. This is because Rails will use the schema
from the Active Record cache which does not get updated before all the
migrations finish running.
In the above example, Rails will throw an exception when new records are created
in the JobLevels
table because when Rails tries to create the records, it will
use the stale Active Record data which doesn't contain the JobLevel
table.
This issue can also occur when migrations are run during a chained task. For
example, when running something like ./bin/setup
or rails db:migrate db:seed
the schema is cached and sometimes the newly added columns or tables are not
reflected in the cached schema.
According to the
official Rails guide,
we should invoke reset_column_information
method to reset all the cached
information about columns, which will cause them to be reloaded on the next
request.
You can invoke the reset_column_information
method in the up
method of the
migration where a new column or a new table is created and simultaneously data
is being added to that column. This will ensure that the latest schema is loaded
before the request to add data to the new column is processed and any errors due
to stale Active Record cache can be avoided.
The example above can be fixed like this:
Note that the CreateJobLevels
migration is only illustrated to explain the use
case of resets_column_information
method. Please do not add this migration to
the Granite application.
The schema cache can also be cleared by running the
rails db:schema:cache:clear
or rails db:schema:cache:dump
command from the
console. But in a production environment it would not be feasible to do so
because you cannot pause the migrations to update the cache before the next
request is processed. Hence using the reset_column_information
method is the
right way to fix this issue.
We have successfully setup the Rake tasks for this project.
At BigBinary, we write code in a self-explanatory way. This means the code is
more or less like reading an English sentence. Thus it would be self-explanatory
if the sentence makes sense. That's why we don't use comments for explaining the
code. More than that comments don't scale. If we add a detailed comment pointing
out how a utility function works and how it can impact other functions based on
top of it, then once there is a change to this utility function, we would have
to take the effort to change all the comments which utilize this function. Thus
it doesn't make sense to add comments to explain the working of the code, unless
and until there's no other way to convey the same.
But there are cases where comments make sense. Example: We can add some comments
for the work that needs to be worked upon in the future. Such comments should be
prefixed with the TODO:
keyword. This helps in easily jumping into the
comments that need to be fixed.
For example:
There are many ways for navigating through TODO:
comments like the
rake notes
task as mentioned in this
thread. This
command will return the list of all TODO
comments from the Ruby files. We can
easily navigate to each of the TODO
comments from our text editors too. In
VSCode we can install the
Todo Tree
extension to jump through the TODO
comments.
Let's commit the changes: