When working with PDF forms, it can be time-consuming to manually fill out each form individually. Luckily, Python provides several ways to automate this process and fill out multiple forms in a batch. In addition, if you prefer working with the command line, there is also a solution using bash. In this article, we will explore three different options to batch fill PDF forms using Python or bash.
Option 1: Using the PyPDF2 Library
The PyPDF2 library is a popular choice for working with PDF files in Python. It provides a simple API for manipulating PDFs, including filling out form fields. Note that PyPDF2 is no longer actively developed; its maintainers continue the project under the name pypdf. To use PyPDF2, install it first by running the following command:
pip install PyPDF2
Once installed, you can use the following code to batch fill PDF forms:
import os
from PyPDF2 import PdfReader, PdfWriter  # PyPDF2 3.x names; PdfFileReader/PdfFileWriter were removed in 3.0

def fill_pdf_form(input_path, output_path, form_data):
    reader = PdfReader(input_path)
    writer = PdfWriter()
    # Copy every page into the writer so the output keeps the full document
    for page in reader.pages:
        writer.add_page(page)
    # Form fields are stored as annotations on each page; skip pages without any
    for page in writer.pages:
        if "/Annots" in page:
            writer.update_page_form_field_values(page, form_data)
    with open(output_path, "wb") as output_file:
        writer.write(output_file)
# Example usage
form_data = {
    "name": "John Doe",
    "email": "johndoe@example.com",
    "phone": "1234567890"
}
input_folder = "input_forms"
output_folder = "output_forms"
os.makedirs(output_folder, exist_ok=True)  # make sure the output folder exists
for filename in os.listdir(input_folder):
    if filename.endswith(".pdf"):
        input_pdf = os.path.join(input_folder, filename)
        output_pdf = os.path.join(output_folder, filename)
        fill_pdf_form(input_pdf, output_pdf, form_data)
This code uses the PyPDF2 library to go through each page of the input PDF and fill the form fields whose names match the keys of form_data. Keys that do not match a field name exactly are ignored. Each filled form is then saved under its original filename in the output folder.
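The batch loop above can also be factored into a small helper that pairs each input PDF with its output path and creates the output folder if it is missing. This is only a minimal sketch using the standard library; the name iter_pdf_jobs is hypothetical and not part of PyPDF2.

```python
from pathlib import Path


def iter_pdf_jobs(input_folder, output_folder):
    """Yield (input_path, output_path) pairs for every PDF in input_folder.

    Hypothetical helper: creates output_folder if needed and matches the
    .pdf extension case-insensitively.
    """
    in_dir = Path(input_folder)
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(in_dir.iterdir()):
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path, out_dir / path.name
```

Each pair can then be handed to a fill function, for example fill_pdf_form(str(src), str(dst), form_data).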
Option 2: Using the pdfrw Library
If you prefer a lightweight solution, you can use the pdfrw library. It provides a simple interface for reading and writing PDF files, including filling out form fields. To install pdfrw, run the following command:
pip install pdfrw
Here’s an example code snippet that demonstrates how to batch fill PDF forms using pdfrw:
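import os
import pdfrw

def fill_pdf_form(input_pdf, output_pdf, form_data):
    template_pdf = pdfrw.PdfReader(input_pdf)
    for page in template_pdf.pages:
        # Form fields appear as widget annotations on each page
        for annotation in page["/Annots"] or []:
            if annotation["/Subtype"] != "/Widget" or not annotation["/T"]:
                continue
            key = annotation["/T"].to_unicode()  # field name without the PDF string delimiters
            if key in form_data:
                annotation.update(pdfrw.PdfDict(V=pdfrw.PdfString.encode(form_data[key])))
    # Ask the viewer to regenerate field appearances so the new values are displayed
    if template_pdf.Root.AcroForm:
        template_pdf.Root.AcroForm.update(
            pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject("true")))
    pdfrw.PdfWriter().write(output_pdf, template_pdf)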
# Example usage
form_data = {
    "name": "John Doe",
    "email": "johndoe@example.com",
    "phone": "1234567890"
}
input_folder = "input_forms"
output_folder = "output_forms"
os.makedirs(output_folder, exist_ok=True)  # make sure the output folder exists
for filename in os.listdir(input_folder):
    if filename.endswith(".pdf"):
        input_pdf = os.path.join(input_folder, filename)
        output_pdf = os.path.join(output_folder, filename)
        fill_pdf_form(input_pdf, output_pdf, form_data)
This code uses the pdfrw library to read the input PDF, iterate over each page, and fill out the specified form fields using the provided data. The filled forms are then saved to the output folder.
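Both Python examples fill every form with the same form_data dictionary. If each output needs different values, one possible approach is to load one dictionary per row from a CSV file whose column headers match the PDF field names. The sketch below assumes such a file exists; the function name load_form_rows and the CSV layout are illustrative assumptions, not part of pdfrw or PyPDF2.

```python
import csv


def load_form_rows(csv_path):
    """Return one form_data dict per CSV row.

    Assumes the header row holds the PDF field names (hypothetical layout).
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [dict(row) for row in csv.DictReader(handle)]
```

Each returned dictionary can be passed as the form_data argument of fill_pdf_form.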
Option 3: Using pdftk with Bash
If you prefer working with the command line, you can use the pdftk tool to batch fill PDF forms. pdftk is a command-line tool for manipulating PDF files and is available on most Linux distributions. On Debian or Ubuntu, install it with the following command (on some releases the package is named pdftk-java):
sudo apt-get install pdftk
Once installed, you can use the following bash script to batch fill PDF forms:
#!/bin/bash
input_folder="input_forms"
output_folder="output_forms"
# FDF file holding the field values; fill_form expects FDF or XFDF data, not key=value pairs
form_data="form_data.fdf"
mkdir -p "$output_folder"
for file in "$input_folder"/*.pdf; do
  output_file="$output_folder/$(basename "$file")"
  pdftk "$file" fill_form "$form_data" output "$output_file" flatten
done
This bash script loops over each PDF file in the input folder, fills the form fields from the FDF file, flattens the result so the fields can no longer be edited, and saves it to the output folder. Note that fill_form reads the form data from an FDF or XFDF file rather than from a "field1=value1 field2=value2" string. An FDF template for a given form can be produced with pdftk form.pdf generate_fdf output form_data.fdf and then edited to contain the desired values.
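pdftk's fill_form operation reads field values from an FDF (or XFDF) file. As a rough illustration, an FDF file for the example data could be generated from a Python dictionary like this. The helper names escape_fdf_text and build_fdf are hypothetical, and the sketch assumes plain ASCII or Latin-1 values.

```python
def escape_fdf_text(value):
    """Escape backslashes and parentheses for a PDF literal string."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )


def build_fdf(form_data):
    """Return minimal FDF text for a dict of field names to values.

    Assumes ASCII/Latin-1 text; other characters would need an encoding
    step that this sketch does not handle.
    """
    fields = "\n".join(
        f"<< /T ({escape_fdf_text(name)}) /V ({escape_fdf_text(value)}) >>"
        for name, value in form_data.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{fields}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
```

Writing the result of build_fdf(form_data) to a file such as form_data.fdf gives pdftk an input it can pass to fill_form.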
The best choice among these three options depends on your requirements. If you prefer working in Python and want more control over the PDF manipulation process, options 1 and 2 (PyPDF2 or pdfrw) are good choices. If you prefer a command-line solution or need to integrate the process into a shell script, option 3 with pdftk is convenient and efficient. Ultimately, the best option is the one that fits your workflow.
3 Responses
Option 1 seems cool, but why not use a library that's more versatile and user-friendly?
I totally disagree! Option 1 is the bomb! It's simple, efficient, and gets the job done. Who needs a fancy library when you can get the same results with less hassle? Don't overcomplicate things, my friend.
Option 1 seems like a solid choice, but I'm curious about the pros and cons of Option 2. Any thoughts?