Python | Batch "Favorite" Downloads from the Official Document System

Our company's official document (OA) system has a "Favorite" feature: when you click the Favorite button, the system packages the body text, attachments, and approval form of the current incoming document, outgoing document, or internal memo into a .zip file and downloads it to your machine.
These past few days I have been migrating content into an archive system and needed to batch-download a full year of documents from the OA system. Time was tight, so there was no way to ask the vendor to build a dedicated export feature. Here is how I handled it:

  • Open any internal memo in the browser, click Favorite, and trace the request with F12 (DevTools). It turns out the button issues a POST request to the /api/onekeysave/save endpoint; the request body mainly carries the memo's document number, title, and id.
  • The response body of that POST contains a loadlink field, which is the download link. Requesting that link downloads the .zip file.
  • With the endpoint identified, all I needed to do was query the list of files to download from the database, copy the cookies out of the browser, and loop over the list calling the endpoint to download each file.
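The heart of the two-step flow above is parsing the save API's JSON response and turning loadlink into an absolute URL. A minimal sketch, using a fabricated response body and a hypothetical host (the real body comes back from POST /api/onekeysave/save):

```python
import json

def extract_download_url(body: str, base: str) -> str:
    """Pull the loadlink field out of the save API's JSON response
    and prepend the host to get an absolute download URL."""
    data = json.loads(body)
    return base + data["data"]["loadlink"]

# Fabricated response body, for illustration only
sample = '{"data": {"loadlink": "/download/1001.zip", "filename": "memo.zip"}}'
print(extract_download_url(sample, "http://portal.example.com"))
# → http://portal.example.com/download/1001.zip
```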

In the end, more than seven hundred files (about 1.5 GB in total) finished downloading in just over ten minutes.

Source code

Download list

select  '签报' as 类别,a.fwzh as fwzh,a.fwtm as fwtm,a.requestid as requestid ,a.ngrq as ngrq from formtable_main_5 a left join workflow_requestbase b on a.requestid=b.requestid where a.ngrq<='2022-12-31' and b.status in ('部门综合','结束') 
union all
select '发文'as 类别,c.fwzh as fwzh,c.fwtm as fwtm,c.requestid as requestid,c.ngrq as ngrq from formtable_main_4 c left join workflow_requestbase d on c.requestid=d.requestid where c.ngrq<='2022-12-31' and d.status in ('部门综合','结束','强制归档')
union all
select '收文' as 类别,e.fwzh as fwzh,e.fwtm as fwtm,e.requestid as requestid ,e.cwrq as ngrq from formtable_main_3 e where e.cwrq<='2022-12-31' and djr=102
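The query above runs against the OA database (the result was then exported to an Excel file, which the script below reads). The same pattern can be demonstrated with the standard library's sqlite3 as a stand-in database, borrowing the table and column names from the query:

```python
import sqlite3

# In-memory stand-in for the OA database, with sample data
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE formtable_main_5 (fwzh TEXT, fwtm TEXT, requestid INTEGER, ngrq TEXT)")
con.execute("INSERT INTO formtable_main_5 VALUES ('签发〔2022〕1号', 'Sample memo', 1001, '2022-01-15')")
con.execute("INSERT INTO formtable_main_5 VALUES ('签发〔2023〕9号', 'Too recent', 1002, '2023-03-01')")

# Same shape as the production query: one row per document to download
rows = con.execute(
    "SELECT '签报' AS 类别, fwzh, fwtm, requestid, ngrq "
    "FROM formtable_main_5 WHERE ngrq <= '2022-12-31'"
).fetchall()
print(rows)
# → [('签报', '签发〔2022〕1号', 'Sample memo', 1001, '2022-01-15')]
```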

Python code

import requests
import json
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Host': 'portal.jictrust.cn',
    'Pragma': 'no-cache',
}

# Convert a raw cookie string (copied from the browser) into a dict
def getCookies(cookie_str):
    cookie_dict = {}
    for item in cookie_str.split(';'):
        key, value = item.strip().split('=', 1)
        cookie_dict[key] = value
    return cookie_dict

# Call the save API to get the file's download link
def getDownloadPath(cookies, params):
    url = r'http://xxx/api/onekeysave/save'
    result = requests.post(url=url, headers=headers, cookies=cookies, json=params)
    return result.text

# Fetch the file content
def getFiles(cookies, downloadPath):
    response = requests.get(downloadPath, headers=headers, cookies=cookies)
    return response

# Save to local disk
def downloadFiles(response, filetype, filename):
    path = r'D:\workspace\新公文系统\2022年\{}\{}'.format(filetype, filename)
    with open(path, 'wb') as f:
        f.write(response.content)
    print('File downloaded')

if __name__ == '__main__':
    df = pd.read_excel(r'C:\新公文系统.xlsx', sheet_name='Sheet1', engine='openpyxl')
    cookie_str = r'ecology_JSessionid=aaaxxx8Ok_L7Vy; JSESSIONID=aaaLxxxOk_L7Vy; loginxxxaver=xxx; languxxxaver=xx; __randcode__=c9165f7d-9xxxxxxdf62bb4b'
    cookie_dict = getCookies(cookie_str)
    for index, row in df.iterrows():
        print(index, '_', row.iloc[1], '_ download started:')
        # The document number (fwzh) may be empty (NaN) for some records
        fwzh = '' if pd.isnull(row.iloc[1]) else row.iloc[1]
        params = {"fwzh": fwzh, "fwtm": row.iloc[2], "requestid": str(row.iloc[3]),
                  "maintable": "formtable_main_233", "zw": "zw", "fj": "fj",
                  "path": "D:/weaver/ecology/page/resource/userfile/other/",
                  "configip_target": "80", "OA_APPID": "b81a21daxxxf2d5bd", "category": "1"}
        # Get the packaging result (contains loadlink and filename)
        download_path = json.loads(getDownloadPath(cookie_dict, params))
        downloadUrl = 'http://xxx' + download_path['data']['loadlink']
        response = getFiles(cookie_dict, downloadUrl)
        downloadFiles(response, row.iloc[0], str(row.iloc[3]) + '_' + download_path['data']['filename'])
        print('--------------------')
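One caveat with the script above: the zip filename comes back from the server and may contain characters Windows rejects in file paths. A small stdlib helper (hypothetical; not part of the original script) that could be applied to the filename before downloadFiles writes the file:

```python
import re

def safe_filename(name: str) -> str:
    # Windows forbids \ / : * ? " < > | in file names; swap them for underscores
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()

print(safe_filename('关于2022年预算的请示:附表?.zip'))
# → 关于2022年预算的请示_附表_.zip
```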

For commercial reprinting, please contact the author for authorization; for non-commercial reprinting, please credit the source.


Link: http://hncd1024.github.io/2023/12/07/Python_ecologyFileDownload2/
Author: CHEN DI
Published: 2023-12-07