PDF Generation with AWS Lambda and SQS
GM Binder is a tool that allows users to create and manage documents for their favorite tabletop role-playing games. Its core feature is a Markdown editor that lets users write in a very simple syntax and produce amazing-looking documents. These documents are rendered in a web browser using basic HTML and CSS.
The challenge
Documents in GM Binder are produced using raw HTML and CSS. Users can generate PDF versions of their documents, but the current system relies on the user's web browser to do so, specifically the browser's print-to-PDF feature. This poses two problems:
- GM Binder has no control over which browser a user is using. Some browsers and versions do a much better job than others, and not all of them render HTML and CSS according to the same W3C standards.
- While most documents are under 10 pages long, some run to more than 100. Generating PDFs for large documents in the browser takes a long time, which leads to a poor user experience.
The solution
Generating PDFs consistently and economically is tough. How much of the work do we offload onto the client? How much processing do we do on the server? Where are the generated files stored, and how are they served to our users? These are just a few of the questions our solution needed to answer.
After exploring a number of options, we settled on an approach that would leverage three AWS services: Lambda functions, SQS (Simple Queue Service), and S3 for PDF storage. By using these services we were able to remove the burden of PDF generation from GM Binder’s EC2 instances and take advantage of the scalability of SQS and Lambda functions.
Breaking down the process
The process kicks off when a user requests a PDF version of their document. GM Binder's backend then determines how many pages the document contains, splits it into multi-page fragments, and sends one SQS message per fragment to a Generate PDF queue. An additional message, which records the order of the document fragments, is sent to a Processing queue.
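The fan-out step described above can be sketched as follows. This is a minimal TypeScript sketch, not GM Binder's implementation: the message fields (`documentId`, `fragmentIndex`, `pageRanges`, `s3Key`), the fixed fragment size, and the `fragments/<id>/<n>.pdf` key layout are all hypothetical. It splits a page count into fixed-size ranges, builds one Generate PDF message per range, and builds the single Processing message that records the fragment keys in document order.

```typescript
// Hypothetical message shapes; field names are illustrative, not GM Binder's schema.
interface GeneratePdfMessage {
  documentId: string;
  fragmentIndex: number;
  pageRanges: string; // page range for this fragment, e.g. "1-10"
  s3Key: string;      // where the fragment PDF will be written (hypothetical layout)
}

interface ProcessingMessage {
  documentId: string;
  fragmentKeys: string[]; // S3 keys of all fragments, in document order
}

// Split a document of `pageCount` pages into fragments of at most `pagesPerFragment` pages.
function planFragments(
  documentId: string,
  pageCount: number,
  pagesPerFragment: number,
): { fragments: GeneratePdfMessage[]; processing: ProcessingMessage } {
  if (pageCount < 1 || pagesPerFragment < 1) {
    throw new RangeError("pageCount and pagesPerFragment must be positive");
  }
  const fragments: GeneratePdfMessage[] = [];
  for (let start = 1, i = 0; start <= pageCount; start += pagesPerFragment, i++) {
    // The last fragment may be shorter than pagesPerFragment.
    const end = Math.min(start + pagesPerFragment - 1, pageCount);
    fragments.push({
      documentId,
      fragmentIndex: i,
      pageRanges: `${start}-${end}`,
      s3Key: `fragments/${documentId}/${i}.pdf`,
    });
  }
  // The Processing message lists every expected key so a later step can verify completion.
  return {
    fragments,
    processing: { documentId, fragmentKeys: fragments.map((f) => f.s3Key) },
  };
}
```

For example, `planFragments("doc-123", 25, 10)` yields three fragment messages with page ranges `1-10`, `11-20`, and `21-25`. Each message would then be serialized with `JSON.stringify` and sent to its queue (for example with the SQS `SendMessage` API); that call is omitted here.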
When a message arrives in the Generate PDF queue, it triggers an instance of the Generate PDF Lambda function. Using the information in the SQS message, this function launches a headless Chromium browser and, through Puppeteer, loads a URL that renders the document's HTML and CSS. It then generates a PDF buffer for the fragment specified in the message and saves that PDF file to S3.
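A Generate PDF handler might look roughly like the sketch below. To keep it self-contained, the browser and storage are injected as hypothetical `Renderer` and `ObjectStore` interfaces; in a real function, `renderPdf` would launch Chromium through Puppeteer, call `page.goto(url)`, and then `page.pdf({ pageRanges })`, while `putObject` would upload the bytes to S3. The render URL and the message shape are assumptions, since the article does not specify them.

```typescript
// Minimal subset of the event shape Lambda passes for SQS triggers.
interface SqsRecord { messageId: string; body: string; }
interface SqsEvent { Records: SqsRecord[]; }

// Hypothetical message shape for one fragment.
interface GeneratePdfMessage {
  documentId: string;
  fragmentIndex: number;
  pageRanges: string;
  s3Key: string;
}

// Hypothetical abstractions over Puppeteer (rendering) and S3 (storage).
interface Renderer { renderPdf(url: string, pageRanges: string): Promise<Uint8Array>; }
interface ObjectStore { putObject(key: string, body: Uint8Array): Promise<void>; }

// Hypothetical render endpoint; the real URL is not described in the article.
const renderUrl = (documentId: string): string =>
  `https://example.com/render/${encodeURIComponent(documentId)}`;

function makeGeneratePdfHandler(renderer: Renderer, store: ObjectStore) {
  // Any error thrown here fails the invocation, so SQS can redeliver the message.
  return async (event: SqsEvent): Promise<void> => {
    for (const record of event.Records) {
      const msg = JSON.parse(record.body) as GeneratePdfMessage;
      // Render only the pages belonging to this fragment.
      const pdf = await renderer.renderPdf(renderUrl(msg.documentId), msg.pageRanges);
      // Store the fragment under the key the Processing step will look for.
      await store.putObject(msg.s3Key, pdf);
    }
  };
}
```

Injecting the renderer and store keeps the SQS-handling logic testable without a browser or an AWS account.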
While the Generate PDF functions are running, the Processing queue receives the message from GM Binder's backend containing an array of S3 paths that point to the expected document fragments. This message triggers a second Lambda function, which checks S3 to see whether all PDF fragments have been generated. If they have, the function adds a message to another SQS queue called Merge PDFs. If any fragments are still processing, the function exits with an error and the SQS message returns to its queue, where it will be processed again shortly.
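The Processing check reduces to "are all expected keys present?". The sketch below assumes hypothetical `exists` (for example, backed by an S3 `HeadObject` request) and `sendMerge` (a send to the Merge PDFs queue) dependencies. When any fragment is missing it throws, because with Lambda's SQS integration a failed invocation leaves the message on the queue to be received again once its visibility timeout expires.

```typescript
// Hypothetical message shape: S3 keys of all fragments, in document order.
interface ProcessingMessage { documentId: string; fragmentKeys: string[]; }

// Hypothetical dependencies wrapping S3 and SQS calls.
interface ProcessingDeps {
  exists(key: string): Promise<boolean>;
  sendMerge(msg: ProcessingMessage): Promise<void>;
}

class FragmentsNotReadyError extends Error {
  constructor(public readonly missing: string[]) {
    super(`${missing.length} fragment(s) not ready`);
    this.name = "FragmentsNotReadyError";
  }
}

async function checkFragments(msg: ProcessingMessage, deps: ProcessingDeps): Promise<void> {
  // Check every expected fragment in parallel.
  const present = await Promise.all(msg.fragmentKeys.map((k) => deps.exists(k)));
  const missing = msg.fragmentKeys.filter((_, i) => !present[i]);
  if (missing.length > 0) {
    // Throwing fails the invocation; the message becomes visible again
    // after the visibility timeout and the check is retried.
    throw new FragmentsNotReadyError(missing);
  }
  // All fragments exist: hand the same ordered key list to the merge step.
  await deps.sendMerge(msg);
}
```

One consequence worth keeping in mind: each of these retries counts toward the queue's receive limit, so if the Processing queue has a dead-letter queue, its `maxReceiveCount` needs to allow for the slowest expected document.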
The final step, handled similarly to the previous one, is to merge the PDF fragments together. When a message arrives in the Merge PDFs queue from the Processing function, a final Lambda function, PDF Merge, is triggered. It retrieves the fragments from S3 and uses the npm package pdf-lib to stitch them together. Once the final document is generated, it is saved back to S3 and the fragments are deleted.
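The merge step can be sketched the same way. In this hypothetical version, `merge` is injected; with pdf-lib it would typically create an empty document with `PDFDocument.create()`, load each fragment with `PDFDocument.load()`, copy its pages with `copyPages(src, src.getPageIndices())`, append them with `addPage()`, and return the bytes from `save()`. The `documents/<id>.pdf` output key is an assumption. The point of the sketch is ordering: fragments are fetched in the order recorded by the backend, and they are deleted only after the merged file has been stored.

```typescript
// Hypothetical message shape: S3 keys of all fragments, in document order.
interface ProcessingMessage { documentId: string; fragmentKeys: string[]; }

// Hypothetical dependencies wrapping S3 calls and the PDF merge itself.
interface MergeDeps {
  getObject(key: string): Promise<Uint8Array>;
  putObject(key: string, body: Uint8Array): Promise<void>;
  deleteObject(key: string): Promise<void>;
  merge(fragments: Uint8Array[]): Promise<Uint8Array>;
}

async function mergeFragments(msg: ProcessingMessage, deps: MergeDeps): Promise<string> {
  // Promise.all preserves input order, so fragments stay in document order.
  const fragments = await Promise.all(msg.fragmentKeys.map((k) => deps.getObject(k)));
  const merged = await deps.merge(fragments);
  const finalKey = `documents/${msg.documentId}.pdf`; // hypothetical key layout
  await deps.putObject(finalKey, merged);
  // Delete fragments only after the final PDF is safely stored.
  await Promise.all(msg.fragmentKeys.map((k) => deps.deleteObject(k)));
  return finalKey;
}
```

Deleting last means a failure partway through leaves the fragments in place, so a redelivered Merge PDFs message can simply run the merge again.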
The benefits
Offloading PDF generation provides multiple benefits to both the user and GM Binder:
- PDF generation can be CPU-intensive. By offloading it to microservices, we keep that load off our core infrastructure entirely.
- The SQS and Lambda combination lets us generate large documents more quickly. Rather than using one process to generate a 100-page document, we can fire off multiple Lambda instances simultaneously, each running its own Chromium browser and rendering a different fragment of the document in parallel.
- SQS makes error handling straightforward through retries and dead-letter queues. If a process fails, SQS retries it up to a configured number of times. If every attempt fails, the message is sent to a separate dead-letter queue, where our application can handle the error and notify the user.
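The retry and dead-letter behavior is configured on the source queue through its `RedrivePolicy` attribute, a JSON string naming the dead-letter queue's ARN and a `maxReceiveCount`. The sketch below builds that attribute value and adds a simplified, synchronous model of what SQS does with it: a message is delivered until it succeeds or has been received `maxReceiveCount` times, after which it is moved to the dead-letter queue. The model ignores visibility timeouts and delays, and the ARN and retry count in the test are placeholders; the article does not say how many retries GM Binder uses.

```typescript
// Builds the value of the SQS RedrivePolicy queue attribute (a JSON string).
function redrivePolicy(deadLetterTargetArn: string, maxReceiveCount: number): string {
  return JSON.stringify({ deadLetterTargetArn, maxReceiveCount: String(maxReceiveCount) });
}

type Outcome = { status: "succeeded" | "dead-lettered"; attempts: number };

// Simplified model of redrive: deliver until `handle` succeeds or the
// receive limit is reached, then move the message to the dead-letter queue.
function simulateDelivery(handle: (attempt: number) => boolean, maxReceiveCount: number): Outcome {
  for (let attempt = 1; attempt <= maxReceiveCount; attempt++) {
    if (handle(attempt)) return { status: "succeeded", attempts: attempt };
  }
  return { status: "dead-lettered", attempts: maxReceiveCount };
}
```

The resulting string would be applied when the queue is created or updated (for example through `SetQueueAttributes` or infrastructure-as-code), and the application then consumes the dead-letter queue to notify affected users.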
With over 100,000 documents and millions of users around the world, GM Binder now has a PDF generation process that produces consistent results and no longer depends on the user's browser. After running multiple test documents through the new microservice-based process, we found that large PDFs were generated 40% to 45% faster than with the old client-side process. Additionally, because generation now happens in the cloud, users no longer have to wait to generate files at all. They simply download them!