Python | Automatically Generating a Company Weekly Report

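  If you had to send management a company weekly report every week, how would you go about it? For the sake of appearance and consistency, I would define a format specification covering headers, footers, margins, fonts, paragraphs, table borders and other styles, and then write each report to that specification. Done by hand, this works well: collect each department's content, paste it into the template file, and you have a polished report. But what if such a complex Word document has to be generated automatically?
  This requirement was to generate the company weekly report (a Word document) automatically. The requirements analysis showed that it is harder than it looks. My notes follow.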

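Current situation

  Every week, each department and business headquarters emails its weekly report to a designated coordinator. The coordinator downloads the attachments and merges them into one Word document in a fixed department/headquarters order. Because the submitted documents differ in format, the coordinator also has to adjust the merged document until it meets the weekly report specification.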

Analysis

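  • Report specification (section breaks divide the report into three sections)
    1. Section 1:
      Header: none;
      Footer: fixed text (彩虹粗仿宋 font, 10.5 pt);
      Orientation: portrait;
      Margins: top 2.54 cm, bottom 2.54 cm, left 3.17 cm, right 3 cm;
      Content: company logo; cover image; main title of the operations weekly report (彩虹小标宋 font, size 二号 (22 pt), bold); issue number and date (彩虹粗仿宋 font, size 四号 (14 pt))
    2. Section 2:
      Header: the company operations weekly report text including issue number and date, plus the company logo;
      Footer: page number;
      Orientation: portrait;
      Margins: top 2.54 cm, bottom 2.54 cm, left 3.17 cm, right 3 cm;
      Content: level-1 headings (彩虹黑体 font, size 四号 (14 pt), first-line indent of 2 characters, 1 line of spacing before, numbered with Chinese numerals); level-2 headings (彩虹楷体 font, size 四号, bold, first-line indent of 2 characters, numbered with Chinese numerals in parentheses); body text (彩虹粗仿宋 font, size 四号, first-line indent of 2 characters)
    3. Section 3:
      Header: the company operations weekly report text including issue number and date, plus the company logo;
      Footer: page number;
      Orientation: landscape;
      Margins: top 3.17 cm, bottom 3 cm, left 2.54 cm, right 2.54 cm;
      Content: table content (彩虹粗仿宋 font, 11 pt)
  • Business headquarters content
    Content from each business headquarters is collected in section 2. It normally consists of level-1 headings and body text, and occasionally includes level-2 headings or tables.
  • Functional department content
    Content from each functional department is collected in section 3. These entries are short and have four fixed fields: department, work item name, progress this week, and plan for next week.
  • Other findings
    The analysis showed that the headquarters do not submit content in a fully uniform format, so the coordinator has to adjust formatting by hand. The headquarters reports must also be merged in a required order.
    The functional department entries are short, but their format is not uniform either: numbering conventions differ, closing punctuation differs, and so on. These entries must also be merged in a required order, and some departments must appear under a specific display name (for example, one department that operates under two names).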
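
The clean-up that these inconsistencies call for (one numbering form, one kind of line ending) can be captured in a small pure function. This is a minimal sketch rather than the exact rules used later in the main file: the name `normalize_entry`, the set of accepted endings and the numbering variants are all assumptions made for illustration.

```python
import re

# Endings accepted as-is (assumed set: full stop, semicolons, colons).
ACCEPTED_ENDINGS = ("。", ";", ";", ":", ":")

# A leading one- or two-digit item number such as "1", "1.", "1、" or "1 ".
# The (?!\d) guard keeps longer numbers such as "2023" from being treated as numbering.
LEADING_NUMBER = re.compile(r"^(\d{1,2})(?!\d)[\s.、。,]*")


def normalize_entry(text: str) -> str:
    """Normalize numbering and closing punctuation, line by line."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines
        # Rewrite any leading item number to the form "N."
        line = LEADING_NUMBER.sub(r"\1.", line, count=1)
        # Make sure every line ends with an accepted punctuation mark
        if not line.endswith(ACCEPTED_ENDINGS):
            line += "。"
        cleaned.append(line)
    return "\n".join(cleaned)


if __name__ == "__main__":
    print(normalize_entry("1、完成系统升级\n2 开展培训;\n\n3.总结"))
```

A line that starts with a one- or two-digit quantity (for example "5个项目") would still be read as a numbered item, so a production version would need a stricter rule.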

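Implementation

  1. The department structure and display order may change, public holidays mean that issue numbers do not follow a strict weekly cycle, and the coordinator role rotates. To handle this, three database tables were created to maintain the issue numbers, the merge order of each headquarters/department (plus department display names), and the email addresses of the people to notify when a report is generated.
  2. Create a Word module file (the template) with numbering, paragraph and font styles predefined for level-1 headings, level-2 headings and body text, and restrict the editable regions of the document so that contributors can only edit in the preset format.
  3. Trigger the weekly report submission workflow on a schedule. Each business headquarters uploads a report file prepared from the module file; each functional department fills in its report by adding detail rows in the workflow form.
  4. Use Python to read the tables and download the attachments mentioned above, generate the Word document according to the report format, and then notify the coordinator by email.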
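
The configuration tables from step 1 can be sketched with the standard-library `sqlite3` module. This is only an illustration: the original runs on SQL Server, and the table names, column names and sample rows below are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical configuration tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE weekly_issue (num INTEGER PRIMARY KEY, createdate TEXT, flag INTEGER);
        CREATE TABLE weekly_order (dept TEXT PRIMARY KEY, display_name TEXT, sort_order INTEGER);
        CREATE TABLE weekly_notify (email TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO weekly_issue VALUES (?, ?, ?)",
        [(23, "2023-06-16", 0), (24, "2023-06-25", None), (25, "2023-07-07", None)],
    )
    conn.executemany(
        "INSERT INTO weekly_order VALUES (?, ?, ?)",
        [("finance", None, 2), ("office", "General Office (Party Office)", 1)],
    )
    conn.commit()
    return conn


def next_issue(conn):
    """Return (num, yyyymmdd) for the lowest-numbered issue not yet generated."""
    return conn.execute(
        "SELECT num, replace(createdate, '-', '') FROM weekly_issue "
        "WHERE ifnull(flag, 1) != 0 ORDER BY num LIMIT 1"
    ).fetchone()


def mark_generated(conn, num):
    """Flag an issue as generated, binding the value instead of formatting it into the SQL."""
    conn.execute("UPDATE weekly_issue SET flag = 0 WHERE num = ?", (num,))
    conn.commit()


def department_names(conn):
    """Return display names in merge order, falling back to the department key."""
    rows = conn.execute(
        "SELECT coalesce(display_name, dept) FROM weekly_order ORDER BY sort_order"
    )
    return [name for (name,) in rows]
```

Because issue numbers are stored explicitly, a week skipped for a public holiday simply has no row, and the generator always picks the lowest issue that has not been flagged yet.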

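Source code for assembling the Word document

  This requirement was implemented with Python's docx library (python-docx). Because the report involves so many details, testing ended up exercising the library in depth. I read and relied heavily on the official docx documentation, and consulted articles by several CSDN bloggers when overriding individual classes. My understanding of the underlying docx file format (XML) is incomplete, so I also used ChatGPT for some parts, which noticeably sped up development. The source code follows: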

extract_content_from_docx (reading Word content element by element)

'''
Read paragraphs and tables from a .docx file in document order.
For each paragraph, the style and font are read so that later steps can tell
headings from body text.
'''
import docx
from docx.enum.style import WD_STYLE_TYPE

def extract_content_from_docx(file_path):
    doc = docx.Document(file_path)
    content = []  # extracted items, in document order
    tableNum = 0  # index of the next table in doc.tables

    for element in doc.element.body:
        if isinstance(element, docx.oxml.text.paragraph.CT_P):
            paragraph = docx.text.paragraph.Paragraph(element, doc)
            # Determine the paragraph's style name and font name
            style_name = None
            font_name = None
            if paragraph.style.name.startswith('Normal'):
                style_name = 'Normal'
                font_name = paragraph.style.font.name
            else:
                for style in doc.styles:
                    if style.type == WD_STYLE_TYPE.PARAGRAPH and \
                            paragraph.style.base_style == style.base_style and \
                            paragraph.style.font == style.font and \
                            paragraph.style.paragraph_format == style.paragraph_format:
                        style_name = style.name
                        font_name = style.font.name
                        break

            if font_name == '彩虹小标宋':
                pass  # cover title font: skipped
            elif font_name == '彩虹黑体' or style_name == 'Heading 1':
                content.append(('paragraph', 'L1', paragraph.text))
            elif font_name == '彩虹楷体' or style_name == 'Heading 2':
                content.append(('paragraph', 'L2', paragraph.text))
            elif font_name == '彩虹粗仿宋' or style_name == 'Normal':
                content.append(('paragraph', 'L0', paragraph.text))
            # edit by chendi 20230721: also keep paragraphs that match none of the cases above
            else:
                content.append(('paragraph', 'L0', paragraph.text))

        elif isinstance(element, docx.oxml.table.CT_Tbl):
            # Collect the non-empty cell text of each row
            table_content = []
            table = doc.tables[tableNum]
            tableNum += 1
            for row in table.rows:
                row_content = []
                for cell in row.cells:
                    cell_text = cell.text.strip()
                    if cell_text:
                        row_content.append(cell_text)
                if row_content:
                    table_content.append(row_content)
            if table_content:
                content.append(('table', 'null', table_content))
    return content
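
The if/elif chain in the function above could be factored into a pure function, which makes the heading rules testable without opening a Word file. A minimal sketch: `classify_paragraph` is a hypothetical name, and the rules simply mirror the branches above.

```python
from typing import Optional


def classify_paragraph(font_name: Optional[str], style_name: Optional[str]) -> Optional[str]:
    """Map a paragraph's font/style to 'L1', 'L2' or 'L0'; None means skip it."""
    if font_name == "彩虹小标宋":
        return None  # cover title font: not copied into the merged report
    if font_name == "彩虹黑体" or style_name == "Heading 1":
        return "L1"  # level-1 heading
    if font_name == "彩虹楷体" or style_name == "Heading 2":
        return "L2"  # level-2 heading
    return "L0"  # body text, including anything unrecognized


if __name__ == "__main__":
    print(classify_paragraph("彩虹黑体", None))
    print(classify_paragraph(None, "Heading 2"))
```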


add_float_picture (adding a picture that floats in front of text)

# -*- coding: utf-8 -*-

# filename: add_float_picture.py

'''
Implement floating image based on python-docx.
- Text wrapping style: IN FRONT OF TEXT <wp:anchor behindDoc="0">
  (set behindDoc="1" for BEHIND TEXT)
- Picture position: top-left corner of PAGE `<wp:positionH relativeFrom="page">`.
Create a docx sample (Layout | Positions | More Layout Options) and explore the
source xml (Open as a zip | word | document.xml) to implement other text wrapping
styles and position modes per `CT_Anchor._anchor_xml()`.
'''

from docx.oxml import parse_xml, register_element_cls
from docx.oxml.ns import nsdecls
from docx.oxml.shape import CT_Picture
from docx.oxml.xmlchemy import BaseOxmlElement, OneAndOnlyOne


# refer to docx.oxml.shape.CT_Inline
class CT_Anchor(BaseOxmlElement):
    """
    ``<wp:anchor>`` element, container for a floating image.
    """
    extent = OneAndOnlyOne('wp:extent')
    docPr = OneAndOnlyOne('wp:docPr')
    graphic = OneAndOnlyOne('a:graphic')

    @classmethod
    def new(cls, cx, cy, shape_id, pic, pos_x, pos_y):
        """
        Return a new ``<wp:anchor>`` element populated with the values passed
        as parameters.
        """
        anchor = parse_xml(cls._anchor_xml(pos_x, pos_y))
        anchor.extent.cx = cx
        anchor.extent.cy = cy
        anchor.docPr.id = shape_id
        anchor.docPr.name = 'Picture %d' % shape_id
        anchor.graphic.graphicData.uri = (
            'http://schemas.openxmlformats.org/drawingml/2006/picture'
        )
        anchor.graphic.graphicData._insert_pic(pic)
        return anchor

    @classmethod
    def new_pic_anchor(cls, shape_id, rId, filename, cx, cy, pos_x, pos_y):
        """
        Return a new `wp:anchor` element containing the `pic:pic` element
        specified by the argument values.
        """
        pic_id = 0  # Word doesn't seem to use this, but does not omit it
        pic = CT_Picture.new(pic_id, filename, rId, cx, cy)
        anchor = cls.new(cx, cy, shape_id, pic, pos_x, pos_y)
        anchor.graphic.graphicData._insert_pic(pic)
        return anchor

    @classmethod
    def _anchor_xml(cls, pos_x, pos_y):
        return (
            '<wp:anchor distT="0" distB="0" distL="0" distR="0" simplePos="0" relativeHeight="0" \n'
            # ' behindDoc="1" locked="0" layoutInCell="1" allowOverlap="1" \n'
            ' behindDoc="0" locked="0" layoutInCell="1" allowOverlap="1" \n'
            ' %s>\n'
            ' <wp:simplePos x="0" y="0"/>\n'
            ' <wp:positionH relativeFrom="page">\n'
            ' <wp:posOffset>%d</wp:posOffset>\n'
            ' </wp:positionH>\n'
            ' <wp:positionV relativeFrom="page">\n'
            ' <wp:posOffset>%d</wp:posOffset>\n'
            ' </wp:positionV>\n'
            ' <wp:extent cx="914400" cy="914400"/>\n'
            ' <wp:wrapNone/>\n'
            ' <wp:docPr id="37" name="直接连接符 5"/>\n'
            ' <wp:cNvGraphicFramePr>\n'
            ' <a:graphicFrameLocks noChangeAspect="1"/>\n'
            ' </wp:cNvGraphicFramePr>\n'
            ' <a:graphic>\n'
            ' <a:graphicData uri="URI not set"/>\n'
            ' </a:graphic>\n'
            '</wp:anchor>' % (nsdecls('wp', 'a', 'pic', 'r'), int(pos_x), int(pos_y))
        )


# refer to docx.parts.story.BaseStoryPart.new_pic_inline
def new_pic_anchor(part, image_descriptor, width, height, pos_x, pos_y):
    """Return a newly-created `wp:anchor` element.
    The element contains the image specified by *image_descriptor* and is scaled
    based on the values of *width* and *height*.
    """
    rId, image = part.get_or_add_image(image_descriptor)
    cx, cy = image.scaled_dimensions(width, height)
    shape_id, filename = part.next_id, image.filename
    return CT_Anchor.new_pic_anchor(shape_id, rId, filename, cx, cy, pos_x, pos_y)


# refer to docx.text.run.add_picture
def add_float_picture(p, image_path_or_stream, width=None, height=None, pos_x=0, pos_y=0):
    """Add a floating picture to paragraph `p`, offset by `pos_x` and `pos_y`
    from the top-left corner of the page.
    """
    run = p.add_run()
    anchor = new_pic_anchor(run.part, image_path_or_stream, width, height, pos_x, pos_y)
    run._r.add_drawing(anchor)


# refer to docx.oxml.__init__.py
register_element_cls('wp:anchor', CT_Anchor)
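
The `<wp:posOffset>` and `<wp:extent>` values in the XML template above are expressed in EMU (English Metric Units), which is why the positions are passed through `int()` and why the placeholder extent is 914400 (one inch). The conversion can be sketched as below; `cm_to_emu` is a hypothetical helper written for illustration, while in the main file the same conversion is done by python-docx's `Cm`.

```python
EMU_PER_INCH = 914400   # EMU in one inch
EMU_PER_CM = 360000     # 914400 / 2.54


def cm_to_emu(cm: float) -> int:
    """Convert centimetres to EMU, rounding to the nearest whole unit."""
    return round(cm * EMU_PER_CM)


if __name__ == "__main__":
    # Offsets of the kind used for the cover logo and banner in the main file
    print(cm_to_emu(3.17), cm_to_emu(-3.23))
```

A negative offset, such as the banner's `pos_x` of -3.23 cm, places the left edge of the picture outside the page so that the image bleeds past the page edge.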

mssql (database operations)

import pymssql

def connectMSsql(params):
    """
    Connect to the database and return the connection.
    """
    required = ('host', 'port', 'username', 'password', 'database_name')
    if any(key not in params for key in required):
        raise Exception('missing connection parameter')
    try:
        port = int(params.get('port'))  # the port is normally 1433
    except (TypeError, ValueError):
        # Raise instead of returning an error string, so that callers never
        # receive a str where a connection object is expected.
        raise ValueError('invalid port')
    try:
        return pymssql.connect(host=params.get('host'), port=port,
                               user=params.get('username'),
                               password=params.get('password'),
                               database=params.get('database_name'),
                               charset=params.get('encoding'))
    except pymssql.OperationalError as e:
        raise ConnectionError('database connection failed or timed out') from e

def executeSQL(conn, sql):
    """
    Execute a SELECT, UPDATE or INSERT statement.
    SELECT returns the fetched rows; UPDATE and INSERT commit and return {}.
    Any other statement is ignored and {} is returned.
    """
    cursor = conn.cursor()
    statement = sql.lstrip().lower()
    if statement.startswith("select"):
        cursor.execute(sql)
        return cursor.fetchall()
    if statement.startswith(("update", "insert")):
        cursor.execute(sql)
        conn.commit()
    return {}

def closeConn(conn):
    conn.close()

if __name__ == '__main__':
    pass

main file

from docx import Document
from docx.oxml.ns import qn
from docx.shared import Pt,Cm,RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH,WD_PARAGRAPH_ALIGNMENT
from docx.enum.table import WD_ALIGN_VERTICAL
from docx.enum.text import WD_UNDERLINE
from docx import shared
from docx.enum.section import WD_ORIENT
from docx.enum.text import WD_TAB_ALIGNMENT
from add_float_picture import add_float_picture
from extract_content_from_docx import extract_content_from_docx
from docx.enum.style import WD_STYLE_TYPE
from docx.oxml import OxmlElement
from docx.oxml import parse_xml
from docx.oxml.ns import nsdecls
import datetime
import os
from mssql import closeConn, connectMSsql, executeSQL
from loguru import logger
import re

# Set up the log file
FileLog = 'log{}.log'.format(datetime.date.today())
logger.add(FileLog,rotation="500MB", encoding="utf-8", enqueue=True)

# Cover page
def P0Content(num,date):
    content1 = document.add_paragraph('\n\n\n\n\n\n\n\n\n\n\n')
    content1Run = content1.add_run('公 司 经 营 情 况 周 报\n')  # report title text
    content1Run.font.name = '彩虹小标宋'
    content1Run._element.rPr.rFonts.set(qn('w:eastAsia'),'彩虹小标宋')
    content1Run.font.size = Pt(22)
    content1Run.bold = True
    content1.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER
    add_float_picture(content1,'D:/workspace/weeklyNewspaper/img/P1_logo.png',height=shared.Cm(2.3),pos_x=shared.Cm(3.17),pos_y=shared.Cm(4.5))
    add_float_picture(content1,'D:/workspace/weeklyNewspaper/img/P1_banner1.png',height=shared.Cm(6.11),pos_x=shared.Cm(-3.23),pos_y=shared.Cm(8))
    numPara = document.add_paragraph('第{}期({})'.format(num,date))  # issue number and date
    numPara.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Level-1 heading
def L1(content):
    head1 = document.add_heading('', level = 1)
    head1.add_run(content)

# Level-2 heading
def L2(content):
    head2 = document.add_heading('', level = 2)
    head2.add_run(content)

# Body text
def L0(content):
    # Drop lines that contain only whitespace
    lines = content.split('\n')
    new_lines = [line for line in lines if line.strip()!='']
    # Add one paragraph per remaining line
    for i in new_lines:
        par = document.add_paragraph(i)
        par.paragraph_format.first_line_indent = Pt(28)

def set_table_boarder(table,**kwargs):
    borders = OxmlElement('w:tblBorders')
    for tag in ('bottom','top','left','right','insideV','insideH'):
        edge_data = kwargs.get(tag)
        if edge_data:
            any_border = OxmlElement(f'w:{tag}')
            for key in ['sz','val','color','space','shadow']:
                if key in edge_data:
                    any_border.set(qn(f'w:{key}'),str(edge_data[key]))
            borders.append(any_border)
    table._tbl.tblPr.append(borders)


# Insert a table with single-line borders on all edges
def insertTable(row,col):
    table = document.add_table(row,col)
    set_table_boarder(table,
        top = {'sz':4,'val':'single','color':'#000000'},
        bottom = {'sz':4,'val':'single','color':'#000000'},
        left = {'sz':4,'val':'single','color':'#000000'},
        right = {'sz':4,'val':'single','color':'#000000'},
        insideV = {'sz':4,'val':'single','color':'#000000'},
        insideH = {'sz':4,'val':'single','color':'#000000'})
    return table
# Build a page-number field in the given run
def AddFooterNumber(run):
    fldChar1 = OxmlElement('w:fldChar') # creates a new element
    fldChar1.set(qn('w:fldCharType'), 'begin') # sets attribute on element
    instrText = OxmlElement('w:instrText')
    instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
    instrText.text = 'Page'
    fldChar2 = OxmlElement('w:fldChar')
    fldChar2.set(qn('w:fldCharType'), 'separate')
    t = OxmlElement('w:t')
    t.text = "Seq"
    fldChar2.append(t)
    fldChar4 = OxmlElement('w:fldChar')
    fldChar4.set(qn('w:fldCharType'), 'end')
    r_element = run._r
    r_element.append(fldChar1)
    r_element.append(instrText)
    r_element.append(fldChar2)
    r_element.append(fldChar4)

# Insert the page number into a footer
def InsertPageNumber(footer):
    # footer = Doc.sections[1].footer  # get the footer of a section
    footer.is_linked_to_previous = True  # continue from the previous section
    paragraph = footer.paragraphs[0]  # first paragraph of the footer
    paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # center the footer
    run_footer = paragraph.add_run()  # run that will hold the page-number field
    AddFooterNumber(run_footer)
    font = run_footer.font
    font.name = 'Times New Roman'
    font.size = Pt(10)  # 10 pt
    font.bold = True  # bold

# Set the header, footer and margins of the cover page (section 1)
def headerParagraphs0(hNum,hDate):
    # Margins
    document.sections[0].top_margin = Cm(2.54)
    document.sections[0].bottom_margin = Cm(2.54)
    document.sections[0].left_margin = Cm(3.17)
    document.sections[0].right_margin = Cm(3)  # fixed typo: was "rigth_margin"
    # Section 1 footer (fixed note about the Asset Disposal headquarters report)
    run0 = document.sections[0].footer.paragraphs[0]
    run0Style = run0.add_run('注:资产处置总部的周报相关内容由资产处置总部另行每周报送。')
    run0Style.font.name = '彩虹粗仿宋'
    run0Style._element.rPr.rFonts.set(qn('w:eastAsia'),'彩虹粗仿宋')
    run0Style.font.size = Pt(10.5)

# Section 2 header and footer
def headerParagraphs1(hNum,hDate):
    document.add_section()  # add section 2
    # Margins
    document.sections[1].top_margin = Cm(2.54)
    document.sections[1].bottom_margin = Cm(2.54)
    document.sections[1].left_margin = Cm(3.17)
    document.sections[1].right_margin = Cm(3)
    # Section 2 header
    document.sections[1].header.is_linked_to_previous = False
    run1 = document.sections[1].header.paragraphs[0]
    run1Style = run1.add_run('公司经营情况周报[第{}期({})]'.format(hNum,hDate))
    run1Style.font.name = '彩虹粗仿宋'
    run1Style._element.rPr.rFonts.set(qn('w:eastAsia'),'彩虹粗仿宋')
    run1Style.font.size = Pt(10.5)
    # run1Style.add_picture(r'yemei_logo.png',height=shared.Cm(0.82) ,width=shared.Cm(12.19) )  # header logo
    add_float_picture(run1,'D:/workspace/weeklyNewspaper/img/yemei_logo.png',height=shared.Cm(0.8),pos_x=shared.Cm(15.19),pos_y=shared.Cm(0.95))
    run1.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER
    # Section 2 footer
    document.sections[1].footer.is_linked_to_previous = False
    # InsertPageNumber(document.sections[1].footer)

# Section 3 header and footer
def headerParagraphs2(hNum,hDate):
    document.add_section()  # add section 3
    document.sections[2].top_margin = Cm(3.17)
    document.sections[2].bottom_margin = Cm(3)
    document.sections[2].left_margin = Cm(2.54)
    document.sections[2].right_margin = Cm(2.54)
    # Section 3 header
    document.sections[2].header.is_linked_to_previous = False
    run2 = document.sections[2].header.paragraphs[0]
    run2Style = run2.add_run('\t公司经营情况周报[第{}期({})]\t'.format(hNum,hDate))
    run2Style.font.name = '彩虹粗仿宋'
    run2Style._element.rPr.rFonts.set(qn('w:eastAsia'),'彩虹粗仿宋')
    run2Style.font.size = Pt(10.5)
    # run2Style.add_picture(r'yemei_logo.png',height=shared.Cm(0.82) )  # header logo
    add_float_picture(run2,'D:/workspace/weeklyNewspaper/img/yemei_logo.png',height=shared.Cm(0.8),pos_x=shared.Cm(20),pos_y=shared.Cm(0.95))
    run2.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER
    # run2.runs[0].underline = WD_UNDERLINE.DOUBLE
    # Section 3 footer
    document.sections[2].footer.is_linked_to_previous = False
    # InsertPageNumber(document.sections[2].footer)

# Switch section number `num` (1-based) to landscape
def PaperOrientation(num):
    section = document.sections[num-1]
    # Swap width and height, then set the orientation flag
    new_width,new_height = section.page_height,section.page_width
    section.orientation = WD_ORIENT.LANDSCAPE
    section.page_width = new_width
    section.page_height = new_height

# Repeat the table's header row at the top of every page
def Table_Line_Repeat(doc,firstcell):
    # Walk through all tables in the document
    for table in doc.tables:
        # Pick the tables whose first cell matches `firstcell`
        if table.cell(0,0).text.strip() == firstcell:
            # Mark the first row as a header row that repeats on each page
            row = table.rows[0]
            tr = row._element
            trPr = tr.get_or_add_trPr()
            tblHeader = OxmlElement('w:tblHeader')
            tblHeader.set(qn('w:val'), "true")
            trPr.append(tblHeader)
    return doc


if __name__ == '__main__':
    logger.info('***Start processing the Word document***')
    # Connect to the database; fetch department entries, issue number and report date
    sqlConnectParams = {"database_name": "****", "username": "****", "password": "****", "host": "****", "port": "****", "encoding": "cp936"}
    deptSqlQuery = "select (case when c.xsmc is null then d.departmentmark else c.xsmc end) ,e.gzsxmc,e.bzjzqk,e.xzgzap from formtable_main_246 a left join workflow_requestbase b on a.requestid=b.requestid left join uf_comweeklyorder c on c.bm=a.ngbm left join HrmDepartment d on d.id=a.ngbm left join formtable_main_246_dt1 e on e.mainid=a.id where b.currentnodetype=3 and c.bm is not null and b.createdate>=CONVERT ( CHAR ( 10 ), getdate()-5, 120 ) order by c.scsx,e.id" # current date minus 5 days
    numSqlQuery = "select top 1 num,replace(createdate,'-','') from uf_comweeklyset where isnull(flag,1)!=0 order by num "
    conn = connectMSsql(sqlConnectParams)
    logger.info('***Connected to the database***')
    lista = executeSQL(conn,deptSqlQuery)
    numdetail = executeSQL(conn,numSqlQuery)
    logger.info('***Fetched the functional department entries***')
    # Issue number
    num = numdetail[0][0]
    # Report date, formatted yyyymmdd
    date = numdetail[0][1]
    logger.info('***Fetched the issue number and report date***')
    # Folder holding the files to merge
    path = 'D:/aidownload/gongsizhoubao/{}'.format(datetime.datetime.today().date())
    # Load the Word template
    document = Document('D:/workspace/weeklyNewspaper/module.docx')
    logger.info('***Word template loaded***')
    # Section 1 (cover page): header, footer and margins
    headerParagraphs0(num,date)
    # Cover page content
    P0Content(num,date)
    logger.info('***Section 1 done***')
    # Section 2: header and footer
    headerParagraphs1(num,date)
    # Section 2 content: walk the Word files in the folder, extract paragraphs and
    # tables, and rebuild them in the merged document according to paragraph level.
    for i in os.listdir(path):
        logger.info('***Processing: {}***'.format(i))
        document_content = extract_content_from_docx(path+'/'+i)
        for category,level, data in document_content:
            if category == 'paragraph':
                if level == 'L0':
                    L0(data)
                elif level == 'L1':
                    L1(data)
                elif level == 'L2':
                    L2(data)
            elif category == 'table':
                # edit by chendi 20230721: tables are no longer copied; a placeholder
                # ("insert table") is written instead and the table is inserted later
                # via VBA, which preserves the table's full styling.
                L0('【插入表格】')
                # table = insertTable(len(data),len(data[0]))
                # # 20230710 Check the extracted list for cells to merge in the first columns;
                # # merge them and blank out the repeated values
                # for i in range(0,len(data)-1):
                #     for j in range(i+1,len(data)):
                #         if data[i][0] == data[j][0] :
                #             data[j][0] = ''
                #             table.cell(i,0).merge(table.cell(j,0))
                #         if data[i][1] == data[j][1] :
                #             data[j][1] = ''
                #             table.cell(i,1).merge(table.cell(j,1))

                # for row in range(0,len(data)):
                #     for cell in range(0,len(data[row])):
                #         cellcontent = data[row][cell].replace(';\n','。\n') # 20230711 replace a semicolon at the end of a line
                #         cellcontent = cellcontent.replace(';\n','。\n') # 20230711 replace a semicolon at the end of a line
                #         run = table.cell(row,cell).paragraphs[0].add_run(cellcontent)
                #         run.font.size = Pt(12) # font size 小四 (12 pt)
                #         # center the header row
                #         if row == 0 :
                #             run.bold = True # bold header row
                #             table.cell(row,cell).paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
                #         else:
                #             table.cell(row,cell).paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.LEFT
                #         table.cell(row,cell).vertical_alignment = WD_ALIGN_VERTICAL.CENTER
    logger.info('***Section 2 done***')
    # Section 3: header, footer and page orientation
    headerParagraphs2(num,date)
    PaperOrientation(3)
    # Section 3 content
    L1('职能部门')  # heading text: "Functional departments"
    table = insertTable(len(lista)+1,4)
    # Clean the query result: replace <br> with newlines and add the header row
    # (department, work item name, progress this week, plan for next week)
    listb = [['填报部门','工作事项名称','本周进展情况','下周工作安排']]
    for i in lista:
        listc = []
        for j in i :
            b = j.replace('<br>','\n')
            listc.append(b)
        listb.append(listc)
    # Merge first-column cells that repeat the same department and blank the repeats
    for i in range(0,len(listb)-1):
        for j in range(i+1,len(listb)):
            if listb[i][0] == listb[j][0]:
                listb[j][0] = ''
                table.cell(i,0).merge(table.cell(j,0))
    # Column widths
    for rowa in table.rows:
        rowa.cells[0].width = Cm(2.55)
        rowa.cells[1].width = Cm(3)
        rowa.cells[2].width = Cm(9.94)
        rowa.cells[3].width = Cm(9.71)
    # Fill in the content and adjust styles
    for i,row in enumerate(table.rows):
        for j,cell in enumerate(row.cells):
            if listb[i][j] != '':
                # Append a full stop if the text does not end with a semicolon, full stop or colon
                if i!=0 and j>1 and listb[i][j][-1] not in [';','。',':']:
                    listb[i][j] = listb[i][j]+'。'
                # Insert a full stop before \n if it is not preceded by a semicolon, full stop or colon
                listb[i][j] = re.sub(r'(?<![;。:])\n','。\n',listb[i][j])
                # If a line starts with a single digit (not followed by another digit), add "." after it
                listb[i][j] = re.sub(r'\n(\d)(?!\d)',r'\n\1.',listb[i][j])
                # After the added ".", remove a following space, comma, full stop, 、 or dot
                listb[i][j] = re.sub(r'(?<=\n(\d\.))[ ,。、.]','',listb[i][j])
                table.cell(i,j).text = listb[i][j]  # fill in the report content
            try :
                cell.paragraphs[0].runs[0].font.size = Pt(11)
            except IndexError as e:
                pass
            if i == 0:
                cell.paragraphs[0].runs[0].bold = True
                cell.paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # centered horizontally
            else :
                cell.paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.LEFT  # left-aligned
            cell.vertical_alignment = WD_ALIGN_VERTICAL.CENTER

    InsertPageNumber(document.sections[2].footer)
    # Repeat the header row of the section 3 table on every page
    Table_Line_Repeat(document,"填报部门")
    # Output file name: "公司周报" (company weekly report) plus today's date
    document.save('{}/公司周报{}.docx'.format(path,datetime.datetime.today().date()))
    logger.info('***Section 3 done***')
    # Finished: update the generation status and close the database connection
    numSqlUpdate = "update uf_comweeklyset set flag=0,ts={} where num={}".format(date,num)
    executeSQL(conn,numSqlUpdate)
    closeConn(conn)
    logger.info('***Finished; generation status updated***')
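
The first-column merge in the main file compares every pair of rows. Assuming rows for the same department are already adjacent (which the `order by` in the query suggests, but which is an assumption here), the spans to merge can be computed in a single pass. A sketch; `merge_spans` is a hypothetical helper, not part of the original code.

```python
from itertools import groupby


def merge_spans(first_column):
    """Return (start_row, end_row) pairs for runs of equal adjacent values."""
    spans = []
    row = 0
    for _, group in groupby(first_column):
        length = len(list(group))
        if length > 1:
            spans.append((row, row + length - 1))
        row += length
    return spans


if __name__ == "__main__":
    column = ["填报部门", "A", "A", "B", "C", "C", "C"]
    print(merge_spans(column))
```

Each span could then be applied with a single `table.cell(start, 0).merge(table.cell(end, 0))` call, after blanking the repeated values as the original loop does.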

For commercial reproduction, contact the author for permission; for non-commercial reproduction, credit the source.

Python | Automatically Generating a Company Weekly Report
http://hncd1024.github.io/2023/06/25/Python_docx/
Author: CHEN DI
Published: 2023-06-25
License