Python | Filling in PDF content with reportlab

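Recently, for work, I needed to generate the signing documents for a commitment letter, which meant filling in the department, name, and date on a fixed PDF template. I had used Foxit PDF Editor before and liked it, but this time there were 300 PDFs to produce, and handling them one by one by hand was clearly impractical. In "Python | Splitting PDFs automatically with PyPDF2" I wrote about using PyPDF2 to split PDF files; this time we try Python's reportlab library for generating the PDF content.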

PDF is the global standard for electronic documents. It supports high-quality printing yet is totally portable across platforms, thanks to the freely available Acrobat Reader. Any application which previously generated hard copy reports or driving a printer can benefit from making PDF documents instead; these can be archived, emailed, placed on the web, or printed out the old-fashioned way. However, the PDF file format is a complex indexed binary format which is impossible to type directly. The PDF format specification is more than 600 pages long and PDF files must provide precise byte offsets – a single extra character placed anywhere in a valid PDF document can render it invalid. This makes it harder to generate than HTML.

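The PDF file format is a complex indexed binary format that cannot be typed out by hand. reportlab's approach is to create a canvas, place vector content on it using Cartesian coordinates, and then build a PDF from the finished pages.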
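Because every placement on the canvas is a coordinate, it helps to be explicit about units. By default the reportlab canvas measures in points (1/72 inch) with the origin at the bottom-left corner of the page, while positions on a template are often measured in millimetres from the top-left. The following is a minimal sketch of that conversion; the helper names are hypothetical and not part of reportlab.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

# US Letter is 8.5 x 11 inches, i.e. 612 x 792 points
LETTER_HEIGHT_PT = 792.0


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 point = 1/72 inch)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def top_left_mm_to_pdf_points(x_mm: float, y_from_top_mm: float,
                              page_height_pt: float) -> tuple:
    """Map a position measured from the top-left corner (in mm) to
    bottom-left-origin PDF coordinates (in points)."""
    x_pt = mm_to_points(x_mm)
    y_pt = page_height_pt - mm_to_points(y_from_top_mm)
    return x_pt, y_pt


# One inch in from the left edge and one inch down from the top edge
x, y = top_left_mm_to_pdf_points(25.4, 25.4, LETTER_HEIGHT_PT)
print(round(x, 2), round(y, 2))
```

A point one inch from the left and top edges of a Letter page comes out at roughly (72, 720), which is the kind of value that would be passed to `drawString`.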

Approach

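Prepare the list of signers and the PDF template; analyze the template (which page is to be signed, and where the signing area is); generate the signing content and merge it onto the signing page; then assemble the complete PDF and write it out.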
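The first step, reading the signer list, can be sketched on its own. This assumes each line holds five whitespace-separated fields (department, name, year, month, day), which is how the script in this post splits them; the `Signer` class, the `parse_signers` helper, and the sample rows are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Signer:
    dept: str
    username: str
    year: str
    month: str
    day: str


def parse_signers(lines: Iterable[str]) -> Iterator[Signer]:
    """Yield one Signer per non-blank line; reject malformed lines."""
    for line_no, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        if len(fields) != 5:
            raise ValueError(
                "line {}: expected 5 fields, got {}".format(line_no, len(fields))
            )
        yield Signer(*fields)


# Made-up sample rows standing in for the real list file
sample = ["Finance Alice 2023 04 27\n", "\n", "Sales Bob 2023 04 28\n"]
signers = list(parse_signers(sample))
print(signers)
```

Failing loudly on a malformed line is a design choice here: with 300 documents, a silently skipped signer is harder to notice than an early error.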

The source code is as follows

import io
import os

from PyPDF2 import PdfWriter, PdfReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase import pdfmetrics

# Register the FangSong font (Windows font path) so that Chinese text can be drawn
pdfmetrics.registerFont(TTFont('仿宋', r'C:\Windows\Fonts\simfang.ttf'))


def PdfProcess(pdfInPath, pdfInName, alterPageNumber, dept, username, yearv, monthv, daysv):
    # Draw the overlay text into an in-memory PDF
    packet = io.BytesIO()
    # Note: the page size here should match the template page
    can = canvas.Canvas(packet, pagesize=letter)
    can.setFillColorRGB(0, 0, 0)  # black text
    can.setFont("仿宋", 16)  # font and size
    can.drawString(328, 307, dept)  # department
    can.drawString(343, 249, username)  # name
    can.drawString(325, 192, yearv)  # year
    can.drawString(380, 192, monthv)  # month
    can.drawString(420, 192, daysv)  # day
    can.save()

    # Merge the overlay onto the page to be signed
    packet.seek(0)
    new_pdf = PdfReader(packet)
    # Keep the template open until the output has been written
    with open(os.path.join(pdfInPath, pdfInName), 'rb') as template:
        existing_pdf = PdfReader(template)
        output = PdfWriter()
        page = existing_pdf.pages[alterPageNumber - 1]
        page.merge_page(new_pdf.pages[0])

        # Assemble the final PDF
        existing_numPages = len(existing_pdf.pages)
        for i in range(existing_numPages):
            if i == alterPageNumber - 1:
                output.add_page(page)
            else:
                output.add_page(existing_pdf.pages[i])

        # Write the final PDF
        out_path = r'C:\Users\admin\Desktop\签署文件生成\{}-{}.pdf'.format(dept, username)
        with open(out_path, 'wb') as outputStream:
            output.write(outputStream)


if __name__ == '__main__':
    pdfInPath = r'C:\Users\admin\Desktop\签署文件生成'
    pdfInName = '模板.pdf'
    alterPageNumber = 5  # 1-based number of the page to be signed
    list_path = r'C:\Users\admin\Desktop\签署文件生成\签署清单.txt'
    with open(list_path, encoding='utf-8') as s:
        for line in s:
            fields = line.split()
            if len(fields) < 5:
                continue  # skip blank or incomplete lines
            dept, username, yearv, monthv, daysv = fields[:5]
            PdfProcess(pdfInPath, pdfInName, alterPageNumber, dept, username, yearv, monthv, daysv)
            print('Generated {}-{}.pdf'.format(dept, username))
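One detail worth guarding against: the output file name is built directly from the department and name in the list, and Windows rejects file names containing characters such as `/` or `:`. The following is a minimal sketch of sanitizing those values before building the path; `safe_component`, `output_path`, and the example directory are hypothetical and not part of the script above.

```python
import re
from pathlib import PureWindowsPath

# Characters that Windows does not allow in a file name
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def safe_component(text: str) -> str:
    """Replace characters that are illegal in Windows file names."""
    cleaned = _ILLEGAL.sub("_", text).strip()
    return cleaned or "unnamed"


def output_path(out_dir: str, dept: str, username: str) -> PureWindowsPath:
    """Build '<out_dir>\\<dept>-<username>.pdf' from sanitized parts."""
    name = "{}-{}.pdf".format(safe_component(dept), safe_component(username))
    return PureWindowsPath(out_dir) / name


# A department name containing a slash would otherwise be read as a subfolder
print(output_path(r"C:\out", "R&D/Platform", "Alice"))
```

`PureWindowsPath` is used so the sketch behaves the same on any operating system; in the actual script the resulting path would be passed to `open(..., 'wb')`.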

Summary

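reportlab was pleasant to use: composing a PDF through Cartesian coordinates lets you place content exactly where you want it. Afterwards I skimmed the official documentation (the reportlab PDF library) and found that the library is a really powerful tool for composing PDFs, with solid support for text (color, style), images, paragraphs, tables, and more.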


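For commercial reposting, please contact the author for authorization; for non-commercial reposting, please credit the source.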

Python | Filling in PDF content with reportlab
http://hncd1024.github.io/2023/04/27/Python_reportlab/
Author
CHEN DI
Published
2023-04-27
License