Batch Fill PDF Forms from Python or Bash

When working with PDF forms, it can be time-consuming to manually fill out each form individually. Luckily, Python provides several ways to automate this process and fill out multiple forms in a batch. In addition, if you prefer working with the command line, there is also a solution using bash. In this article, we will explore three different options to batch fill PDF forms using Python or bash.

Option 1: Using the PyPDF2 Library

The PyPDF2 library is a popular choice for working with PDF files in Python. It provides a simple and intuitive API for manipulating PDFs, including filling out form fields. To use this library, you will need to install it first by running the following command:

pip install PyPDF2

Once installed, you can use the following code to batch fill PDF forms:

import os
# PdfReader/PdfWriter are the current class names;
# PdfFileReader/PdfFileWriter were removed in PyPDF2 3.0
from PyPDF2 import PdfReader, PdfWriter

def fill_pdf_form(input_path, output_path, form_data):
    """Write a copy of input_path with matching form fields filled in."""
    reader = PdfReader(input_path)
    writer = PdfWriter()

    # Copy every page into the writer before touching any field
    for page in reader.pages:
        writer.add_page(page)

    # Form fields live in each page's /Annots array;
    # pages without one have nothing to fill
    for page in writer.pages:
        if "/Annots" in page:
            writer.update_page_form_field_values(page, form_data)

    with open(output_path, "wb") as output_file:
        writer.write(output_file)

# Example usage
form_data = {
    "name": "John Doe",
    "email": "johndoe@example.com",
    "phone": "1234567890"
}

input_folder = "input_forms"
output_folder = "output_forms"
os.makedirs(output_folder, exist_ok=True)  # create the output folder if it is missing

for filename in os.listdir(input_folder):
    if filename.lower().endswith(".pdf"):
        input_pdf = os.path.join(input_folder, filename)
        output_pdf = os.path.join(output_folder, filename)
        fill_pdf_form(input_pdf, output_pdf, form_data)

This code uses the PyPDF2 library to iterate over each page of the input PDF, check if it contains a form, and fill out the specified form fields using the provided data. The filled forms are then saved to the output folder.
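The loop above writes the same data into every form. If each record should instead produce its own filled copy of one template, the records can come from a CSV file. The following is a minimal sketch using only the standard library; it assumes the CSV header names match the PDF field names, and load_records, plan_jobs, and the "template.pdf" path are hypothetical names chosen for illustration.

```python
import csv
import io
import os

def load_records(csv_file):
    """Read one dict of field values per CSV row; the header row supplies the keys."""
    return list(csv.DictReader(csv_file))

def plan_jobs(records, output_folder, name_field="name"):
    """Pair each record with an output path built from the row number and one field."""
    jobs = []
    for index, record in enumerate(records, start=1):
        raw = record.get(name_field) or ""
        # Keep only letters and digits so the value is safe to use in a file name
        stem = "".join(c if c.isalnum() else "_" for c in raw) or "record"
        # The row number prefix keeps two records with the same name from colliding
        jobs.append((os.path.join(output_folder, f"{index:03d}_{stem}.pdf"), record))
    return jobs

# In-memory sample data; a real run would pass an open CSV file instead
sample = io.StringIO("name,email,phone\nJohn Doe,johndoe@example.com,1234567890\n")
for output_path, form_data in plan_jobs(load_records(sample), "output_forms"):
    print(output_path, form_data)
    # fill_pdf_form("template.pdf", output_path, form_data)  # assumed template path
```

Each job pairs an output path with the data for one form, so the same plan can be handed to any of the fill_pdf_form variants in this article.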

Option 2: Using the pdfrw Library

If you prefer a lightweight solution, you can use the pdfrw library. It provides a simple interface for reading and writing PDF files, including filling out form fields. To install pdfrw, run the following command:

pip install pdfrw

Here’s an example code snippet that demonstrates how to batch fill PDF forms using pdfrw:

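import os
import pdfrw

def fill_pdf_form(input_path, output_path, form_data):
    """Write a copy of input_path with matching form fields filled in."""
    template_pdf = pdfrw.PdfReader(input_path)

    for page in template_pdf.pages:
        # Form fields appear as /Widget annotations in the page's /Annots array
        for annotation in page.Annots or []:
            if annotation.Subtype != "/Widget" or not annotation.T:
                continue
            key = annotation.T.to_unicode()  # field name without PDF string delimiters
            if key in form_data:
                value = pdfrw.PdfString.encode(str(form_data[key]))
                annotation.update(pdfrw.PdfDict(V=value))

    # Ask the viewer to redraw field appearances so the new values are visible
    if template_pdf.Root.AcroForm:
        template_pdf.Root.AcroForm.update(
            pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject("true")))

    pdfrw.PdfWriter().write(output_path, template_pdf)

# Example usage
form_data = {
    "name": "John Doe",
    "email": "johndoe@example.com",
    "phone": "1234567890"
}

input_folder = "input_forms"
output_folder = "output_forms"
os.makedirs(output_folder, exist_ok=True)  # create the output folder if it is missing

for filename in os.listdir(input_folder):
    if filename.lower().endswith(".pdf"):
        input_pdf = os.path.join(input_folder, filename)
        output_pdf = os.path.join(output_folder, filename)
        fill_pdf_form(input_pdf, output_pdf, form_data)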

This code uses the pdfrw library to read the input PDF, iterate over each page, and fill out the specified form fields using the provided data. The filled forms are then saved to the output folder.
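A key in form_data that matches no field in the PDF is skipped without any error, so a typo in a field name can go unnoticed. A small check before filling can catch this. The sketch below is only an illustration: check_form_data is a hypothetical helper, and it assumes you already have the list of field names from somewhere (for example from the output of pdftk's dump_data_fields operation).

```python
def check_form_data(field_names, form_data):
    """Compare form_data keys with the PDF's field names.

    Returns (unknown_keys, unfilled_fields), both as sorted lists.
    """
    fields = set(field_names)
    keys = set(form_data)
    return sorted(keys - fields), sorted(fields - keys)

# Assumed field names for illustration; note the misspelled "e-mail" key
unknown, unfilled = check_form_data(
    ["name", "email", "phone"],
    {"name": "John Doe", "e-mail": "johndoe@example.com"},
)
print("Keys with no matching field:", unknown)
print("Fields left empty:", unfilled)
```

Whether an unfilled field is a problem depends on the form, so the helper only reports both lists and leaves the decision to the caller.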

Option 3: Using pdftk with Bash

If you prefer working with the command line, you can use the pdftk tool to batch fill PDF forms. pdftk is a command-line tool for manipulating PDF files and is packaged for most Linux distributions. On Debian or Ubuntu, install it with the following command:

sudo apt-get install pdftk

Once installed, you can use the following bash script to batch fill PDF forms:

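#!/bin/bash

input_folder="input_forms"
output_folder="output_forms"
mkdir -p "$output_folder"

# pdftk's fill_form reads an FDF (or XFDF) file, not key=value pairs
fdf_file="$(mktemp)"
cat > "$fdf_file" <<'FDF_END'
%FDF-1.2
1 0 obj
<< /FDF << /Fields [
<< /T (name) /V (John Doe) >>
<< /T (email) /V (johndoe@example.com) >>
<< /T (phone) /V (1234567890) >>
] >> >>
endobj
trailer
<< /Root 1 0 R >>
%%EOF
FDF_END

for file in "$input_folder"/*.pdf; do
    output_file="$output_folder/$(basename "$file")"
    pdftk "$file" fill_form "$fdf_file" output "$output_file" flatten
done
rm -f "$fdf_file"

This bash script writes the form data to a temporary FDF file, loops over each PDF file in the input folder, fills out the form fields with pdftk, and saves the filled forms to the output folder. The flatten option merges the values into the page so the fields can no longer be edited. In the FDF file, each /T entry must match a field name in the PDF, and the /V entry next to it holds that field's value.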
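pdftk's fill_form operation takes its field values from an FDF or XFDF file. When the data already lives in a Python dict, the FDF text can be generated instead of written by hand. The following is a minimal sketch, assuming plain ASCII field names and values; escape_pdf_string and build_fdf are hypothetical helper names, not part of pdftk or any library.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def build_fdf(form_data):
    """Return FDF text with one /T (name) and /V (value) pair per field."""
    fields = "\n".join(
        f"<< /T ({escape_pdf_string(name)}) /V ({escape_pdf_string(str(value))}) >>"
        for name, value in form_data.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{fields}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )

form_data = {
    "name": "John Doe",
    "email": "johndoe@example.com",
    "phone": "1234567890",
}
print(build_fdf(form_data))
```

Saving the returned text to a file such as data.fdf (an assumed name) gives something that can be passed to pdftk in place of a hand-written FDF file. Values containing non-ASCII characters need a different string encoding and are outside the scope of this sketch.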

After exploring these three options, it is clear that the best choice depends on your specific requirements and preferences. If you prefer working with Python and want more control over the PDF manipulation process, options 1 and 2 using the PyPDF2 or pdfrw libraries are excellent choices. On the other hand, if you prefer a command-line solution or need to integrate the process into a bash script, option 3 using pdftk is a convenient and efficient choice. Ultimately, the best option is the one that aligns with your workflow and meets your needs.
