92. Generating a PDF report

The point of listing the features and the technical design up front is to give you an opportunity to think about how you would implement this feature. So make sure to research a bit and think about the different pieces required to complete it. If you are stuck or unsure, refer to this chapter in full to understand how to build the feature.

Features

Let us introduce a new feature to let users download a report for the tasks in PDF format.

These are the requirements of the feature.

A download button should be present in the NavBar.
The user should only need to click that button; the PDF file should then be saved to their system automatically, without any further action.
The PDF report should contain the list of tasks that the currently logged-in user has either created or been assigned, with each task's status (pending or completed) shown using a checkbox.
The filename should be something sensible like granite_task_report.

Technical design

To implement this feature, we need to introduce the following changes:

On the backend

Use the wicked_pdf gem to generate the PDF file, and create the views and layouts that the gem needs.
Create a resource called report, namespaced within the tasks module, so that the route looks like tasks/report.
Have two separate actions: one that triggers PDF generation through a Sidekiq job, and one that sends the generated file back as a blob in the API response.

On the frontend

A NavBar item to download the report.
A DownloadReport component, namespaced within the Tasks component, to show loaders while the report is downloading.
Logic to save a file from the PDF blob that we receive in the API response.

We are now ready to start coding. Let us dive in.

Add the wicked_pdf gem

Wicked PDF uses the shell utility wkhtmltopdf to serve a PDF file to a user from HTML. In other words, rather than dealing with a PDF generation DSL of some sort, we write an HTML view as we would normally, then let Wicked PDF take care of the hard stuff.

Add the following lines to the end of your Gemfile:
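A minimal sketch of those Gemfile entries. The wicked_pdf gem is the one named above; supplying the wkhtmltopdf executable through the wkhtmltopdf-binary gem is an assumption, and a system-wide install of wkhtmltopdf would work as well:

```ruby
# Gemfile (fragment)

# PDF generation from HTML views.
gem "wicked_pdf"

# Assumption: ship the wkhtmltopdf executable as a gem instead of
# installing it on the host system.
gem "wkhtmltopdf-binary"
```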

Once the gems are added, install them by running bundle install.

Now generate the initializer by running rails generate wicked_pdf.

This will generate the configuration file config/initializers/wicked_pdf.rb, which can be used to provide options to wicked_pdf at the application level.

Add the PDF layout

The wicked_pdf gem makes use of a specific PDF layout into which the application views or content are embedded. So first let's define this layout.

Create the layout file, pdf.html.erb, in the app/views/layouts folder.

Now add the following, as is, to the layout:

See the yield section in the above code? That's where the content of each of our application views will be embedded.

Add the report routes

It's a common practice to start building the routes first before touching any other application logic.

As discussed in the technical design, we need to handle the report of the tasks within the tasks namespace, and we have already established that these will be API endpoints. So we could add a nested resource under tasks, like this:

But this has a few issues:

It generates a lot of unnecessary actions, like update, destroy, etc, that we don't need.
Rails would expect the controller for the report to be in the root of the controllers folder. But we want to keep the namespace tasks/report, meaning the reports controller should live within a tasks folder.
The report is currently scoped to a task_slug. But that's unnecessary. We need to download a report containing all the tasks of the currently logged-in user.

Before moving on to fixing these issues with the route, let's first see how to inspect all the routes that Rails will generate.

Viewing Rails routes

Rails has a routes command, rails routes, that shows all the routes Rails will use based on the routes.rb file. Since the output is often large, it's a good idea to pipe it to the less command in Unix so that we can scroll through it: rails routes | less.

If we run that command with the routes added in the last section, we'd see routes like this:

Prefix            Verb    URI Pattern                              Controller#Action
new_task_report   GET     /tasks/:task_slug/report/new(.:format)   reports#new
edit_task_report  GET     /tasks/:task_slug/report/edit(.:format)  reports#edit
task_report       GET     /tasks/:task_slug/report(.:format)       reports#show
                  PATCH   /tasks/:task_slug/report(.:format)       reports#update
                  PUT     /tasks/:task_slug/report(.:format)       reports#update
                  DELETE  /tasks/:task_slug/report(.:format)       reports#destroy
                  POST    /tasks/:task_slug/report(.:format)       reports#create

The above table looks well formatted and easy to read. That may not be the case in your terminal. So make sure to switch to full screen and reduce the terminal's font size before running the command, so that each route fits on one line without wrapping.

Nested routes with namespacing

Let's first list the actions we need for the report. We need:

create: This is for initiating the process of generating a report.
download: This is for downloading the generated report as a blob.

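Now we have a mental map of the routes that we require and where each should be handled, that is, POST /tasks/report handled by tasks/reports#create and GET /tasks/report/download handled by tasks/reports#download.

A few things we can deduce from the above routes: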

report is a singular resource. We don't need an index action for report, so we should declare it with resource and NOT resources.
report is a collection route under tasks. If it were a regular nested route, it would become a member route, scoped to a single task. But we don't want that.
download is a collection under report.
report is scoped under tasks module.

Thus let's update the routes to take into account above requirements, like this:
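A sketch of routes that satisfy the points above. The param: :slug option on the outer resources :tasks line is an assumption about the existing routes file, inferred from the :task_slug segment in the earlier table; the nested part follows the description in this section:

```ruby
# config/routes.rb (fragment)
Rails.application.routes.draw do
  # Assumption: tasks are already routed by slug.
  resources :tasks, param: :slug do
    collection do
      # Singular resource limited to the create action:
      #   POST /tasks/report           -> tasks/reports#create
      resource :report, only: %i[create], module: :tasks do
        #   GET /tasks/report/download -> tasks/reports#download
        get :download, on: :collection
      end
    end
  end
end
```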

Some new things to notice:

We have kept the report resource inside a collection block rather than passing on: :collection to that resource. That's because resource and resources generate multiple routes, and all of them have to be made collection routes.
The module: :tasks option is for placing the reports_controller within a tasks folder. If we omit module: :tasks, then Rails expects the reports_controller to be in the root of the controllers folder. But we don't want that.

At this stage of the book, you should be able to comprehend the rest of the routes on your own.

Generating nested controller

So we need to do the following:

Create the tasks folder.
Under that tasks folder we need to create the reports_controller and fill in the default controller code.
In that controller we have to add the necessary actions.

Phew! That's a lot of manual work. Let's take the smart way and automate it by running the following:

That should generate the following template code for us in tasks/reports_controller:

PDF generation job

The time required to generate the PDF depends on the number of tasks the user has and the calculations we perform. It's safe to say we can't let this logic hog our request-response cycle. The main aim of a controller should be to respond to the client as quickly as possible. Thus let's create a Sidekiq job to take care of the PDF report generation logic.

We will first add the necessary logic to our codebase and then walk through what we have added.

Create the job file:

Now add the following content into that file:

In the above code, you might have noticed that the hash passed to assigns is missing a value. This is an instance of value omission in Ruby hash literals, where the value is taken from the variable or method with the same name as the key.
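Value omission was introduced in Ruby 3.1. A small standalone sketch of the equivalence follows; the method names explicit_assigns and shorthand_assigns are hypothetical and exist only for this illustration:

```ruby
# Requires Ruby 3.1 or newer.

# Explicit form: key and value are both written out.
def explicit_assigns(tasks)
  { tasks: tasks }
end

# Shorthand form: the value is looked up from the local variable
# (or method) that has the same name as the key.
def shorthand_assigns(tasks)
  { tasks: }
end

tasks = ["Draft proposal", "Review pull request"]
puts explicit_assigns(tasks) == shorthand_assigns(tasks)
```

Both methods return the same hash, so the shorthand is purely a matter of brevity.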

Create a task model scope

Let's update our Task model and add the following model scope into it:

Now check whether you added the above statement at some random line within the Task model. If so, first go through the macros section of the Rubocop Rails style guide and move the scope to its appropriate place. Ideally, it should be right after the model constants.

The scope should be self-explanatory by its name itself. We want to get the tasks created by or assigned to a user, since that's the data we will be showing in our report.

Create PDF content view

Let's create the view that will be used by our PDF generator:

Add the following lines to the view:

As you can see, this view expects the instance variable @tasks to be present.
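To make the role of @tasks concrete, here is a standalone sketch that uses Ruby's standard ERB library rather than a Rails view. The template markup, the Task struct with its title and status fields, and the ReportView class are all assumptions made for illustration; a completed task is rendered as a checked checkbox:

```ruby
require "erb"

# Hypothetical stand-in for the Task model.
Task = Struct.new(:title, :status, keyword_init: true)

# Hypothetical template, similar in spirit to a Rails view.
REPORT_TEMPLATE = <<~HTML
  <h1>Task report</h1>
  <ul>
  <% @tasks.each do |task| %>
    <li>
      <input type="checkbox" <%= "checked" if task.status == "completed" %>>
      <%= ERB::Util.html_escape(task.title) %>
    </li>
  <% end %>
  </ul>
HTML

# Holds @tasks so that the template can read it as an instance variable.
class ReportView
  def initialize(tasks)
    @tasks = tasks
  end

  # Evaluates the template against this object and returns an HTML string.
  def render
    ERB.new(REPORT_TEMPLATE).result(binding)
  end
end

tasks = [
  Task.new(title: "Write docs", status: "completed"),
  Task.new(title: "Fix bug", status: "pending")
]
puts ReportView.new(tasks).render
```

In the Rails application the framework performs this evaluation for us; the sketch only shows that the view is a template filled in from whatever @tasks holds.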

Rendering views outside controller

ActionController::Renderer allows us to render arbitrary templates without being in a controller action. You get a concrete renderer class by invoking ActionController::Base#renderer, for example ApplicationController.renderer.

It allows you to call the render method directly, as in ApplicationController.renderer.render(template: "...").

You can use this shortcut in a controller, instead of the previous example: ApplicationController.render(template: "...").

The render method allows us to use the same options that we can use when rendering in a controller.

If you'd like to dig deeper, refer to the official docs.

So in our case, we've specified three things:

The instance variable tasks via assigns.
The template to be rendered.
The layout (layouts/pdf.html.erb) into which this view should be rendered.

This produces the report content as a string. After that, we create the PDF blob using the wicked_pdf gem and save it to a file in binary mode. The report path is passed in from wherever the job is invoked.
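The file-writing step can be sketched in plain Ruby. The bytes below are a placeholder rather than a real PDF produced by wicked_pdf, and write_report is a hypothetical helper name; the point is the "wb" mode, which writes the data without any newline or encoding conversion:

```ruby
require "tmpdir"

# Hypothetical helper: writes the PDF bytes to report_path in binary mode.
def write_report(report_path, pdf_blob)
  File.open(report_path, "wb") do |file|
    file.write(pdf_blob)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "granite_task_report.pdf")
  blob = "%PDF-1.4\n\xFF\xFE placeholder".b # placeholder bytes, not a valid PDF

  write_report(path, blob)

  # Reading the file back in binary mode yields the same bytes.
  puts File.binread(path) == blob
end
```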

That pretty much wraps up the job.

Report controller

Now it's time to make use of this job and generate the PDF report. Let's add the necessary content for our controller first and then talk about each section. Add the following to app/controllers/tasks/reports_controller.rb:

The logic is pretty straightforward:

On the frontend, when we click Download report, we invoke the create action, which starts generating the report in the background.
We store the report temporarily in the tmp folder within our project. This has some flaws, which we will discuss later.
The frontend then polls the download action after a delay of, say, 5 seconds.
The download action checks whether the file has been generated and, if so, sends it back to the client as an attachment, which the frontend reads as a blob.
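The generate-then-poll flow in the steps above can be mimicked in plain Ruby. In this sketch a Thread stands in for the Sidekiq job and a loop stands in for the frontend; generate_report_async, fetch_report, poll_for_report, and the short delays are all hypothetical and are not the controller code itself:

```ruby
require "tmpdir"

# Stand-in for the create action: start the work and return immediately.
def generate_report_async(report_path, delay: 0.2)
  Thread.new do
    sleep delay # pretend that PDF generation is slow
    # Write to a temporary name first, then rename, so that a reader
    # never observes a half-written report.
    partial_path = "#{report_path}.part"
    File.binwrite(partial_path, "%PDF placeholder")
    File.rename(partial_path, report_path)
  end
end

# Stand-in for the download action: return the bytes once the file exists.
def fetch_report(report_path)
  File.exist?(report_path) ? File.binread(report_path) : nil
end

# Stand-in for the client: ask repeatedly until the report is ready.
def poll_for_report(report_path, interval: 0.1, attempts: 50)
  attempts.times do
    report = fetch_report(report_path)
    return report if report

    sleep interval
  end
  nil
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "granite_task_report.pdf")
  worker = generate_report_async(path)
  puts fetch_report(path).nil?         # not ready right after starting
  puts poll_for_report(path).bytesize  # size of the report once it is ready
  worker.join
end
```

The caller never blocks on generation itself; it only checks whether the result has appeared, which is the same trade-off the controller makes.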

That pretty much wraps up our backend side.

Frontend logic for downloading a file

Let's have a mental map of what we need on the frontend:

A Download report button in the NavBar which the user can click to generate the report.
On clicking that button, we should redirect the user to a page that shows that the report is being generated.
API connectors to hook into the backend APIs.

Add the following route to App.jsx:

Now, we need to add a Download Report option to the dropdown menu in the NavBar. Add the following lines before the Log out link in app/javascript/src/components/NavBar/index.jsx:

Create the DownloadReport component:

Add the content for the DownloadReport component:

We also need to add the API connectors. Add the following to apis/tasks:

Notice how we have passed responseType as blob in the axios request? That part is important given that we are sending the PDF report as a blob from the backend.

Now let's talk about the logic used within the DownloadReport component.

Whenever we visit this page, we initiate the report generation in the backend via the useEffect. That's what generatePdf does.
Then, after a timeout of 5 seconds, we try to download the report from the backend.
The downloading part involves fetching the PDF blob from the backend and then saving it to the client's system as a PDF file.

Before we talk about the saveAs method, let's understand certain concepts in the upcoming sections.

The Content-Disposition header

To inform the client that the content of the resource is not meant to be displayed, the server must include an additional header in the response. The Content-Disposition header is the right header for specifying this kind of information.

The Content-Disposition header was originally intended for mail user agents, since emails are multipart documents that may contain several file attachments. However, it can be interpreted by several HTTP clients, including web browsers. This header provides information on the disposition type and disposition parameters.

The disposition type is usually one of the following:

inline: The body part is intended to be displayed automatically when the message content is displayed.
attachment: The body part is separate from the main content of the message and should not be displayed automatically except when prompted by the user.

The disposition parameters are additional parameters that specify information about the body part or file, such as filename, creation date, modification date, read date, size, etc.

Most HTTP clients will prompt the user to download the resource content when they receive a response from a server having an attachment disposition.
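As an illustration, the header value for our report could be assembled like this in plain Ruby. The content_disposition helper is hypothetical; in a Rails controller, send_file and send_data build this header from their disposition and filename options, so you would not normally write it by hand:

```ruby
# Hypothetical helper: builds a Content-Disposition header value.
def content_disposition(type, filename)
  # Escape double quotes and backslashes so that the quoted string stays valid.
  escaped = filename.gsub(/["\\]/) { |char| "\\#{char}" }
  %(#{type}; filename="#{escaped}")
end

headers = {
  "Content-Type" => "application/pdf",
  "Content-Disposition" => content_disposition("attachment", "granite_task_report.pdf")
}

puts headers["Content-Disposition"]
# prints: attachment; filename="granite_task_report.pdf"
```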

What are Blobs?

Blobs are objects that are used to represent raw immutable data. Blob objects store information about the type and size of the data they contain, making them very useful for storing and working with file contents in the browser. In fact, the File object is a special extension of the Blob interface.

Object URLs

The URL interface allows for creating special kinds of URLs called object URLs, which are used for representing blob objects or files in a very concise format. A typical object URL consists of the blob: scheme, the page's origin, and a unique identifier, in the form blob:<origin>/<uuid>.

Creating and releasing object URLs

The URL.createObjectURL() static method makes it possible to create an object URL that represents a blob object or file. It takes a blob object as its argument and returns a DOMString, which is the URL representing the passed blob object. A call looks like const url = URL.createObjectURL(blob).

It is important to note that this method returns a new object URL each time it is called, even if it is called with the same blob object.

Whenever an object URL is created, it stays around for the lifetime of the document on which it was created. Usually, the browser will release all object URLs when the document is being unloaded. However, it is important that we release object URLs whenever they are no longer needed to improve performance and minimize memory usage.

The URL.revokeObjectURL() static method can be used to release an object URL. It takes the object URL to be released as its argument.

The saveAs method

The saveAs method defined in the DownloadReport component boils down to the following steps:

Get the blob data.
Create the object URL.
Create an anchor tag whose href is our object URL.
Set the anchor tag's download attribute with the file name we want.
Attach the anchor tag to the DOM.
Simulate clicking on this anchor tag.
Given that the download attribute is set, the browser will download the target as a file.
Remove the anchor tag from DOM.
Release the object URL after a delay.

Limitations of this logic

This logic works just fine in development. But it won't work in production if the application is hosted on a platform like Heroku. To understand why, we first need to understand that the Heroku filesystem is ephemeral: any changes to the filesystem while the dyno is running last only until that dyno is shut down or restarted. Each dyno boots with a clean copy of the filesystem from the most recent deploy. This is similar to how many container-based systems, such as Docker, operate.

In addition, under normal operations dynos will restart every day in a process known as "Cycling".

These two facts mean that the filesystem on Heroku is not suitable for persistent storage of data. In cases where we need to store data, we should be using a database addon such as Postgres (for data) or a dedicated file storage service such as AWS S3 (for static files).

Let's say we are going to take the risk and push this code to production under the assumption that until Heroku cleans up the system we will be able to generate the reports. But that won't work either because each dyno in Heroku has its own file system. Thus the file generated in our worker dyno won't be accessible by the web dyno.

A better way to handle files

Well, like always, Rails also has a solution for this. There's a module called Active Storage in Rails that can save us a lot of pain and handle the file uploads.

Active Storage facilitates uploading files to a cloud storage service like Amazon S3, Google Cloud Storage, or Microsoft Azure Storage and attaching those files to Active Record objects. It comes with a local disk-based service for development and testing and supports mirroring files to subordinate services for backups and migrations.

Using Active Storage, an application can transform image uploads or generate image representations of non-image uploads like PDFs and videos, and extract metadata from arbitrary files.

We won't be diving deep into Active Storage yet. But feel free to look into the official Active Storage docs to get a feel for it.

Things you can try out on your own

Write tests verifying this logic. Give this a shot and try to apply what you've learnt so far.
Have dynamic report names, say in the format {current_user}_{today}_report.pdf.
Use Active Storage with a platform like S3 or Google Cloud to handle this logic in production.
