Python | Batch "Favorite" Downloads from the Official Document System

Our company's official document (OA) system has a "Favorite" feature: when you click the Favorite button, the system packages the body text, attachments, and approval form of the current incoming document, outgoing document, or internal memo into a .zip file and downloads it to your machine.
These past few days I have been migrating content into an archive system and needed to batch-download a full year of documents from the OA system. Time was tight, so there was no way to ask the vendor to build a dedicated export feature. Here is how I handled it:

  • Open any internal memo in the browser, click Favorite, and trace the request with F12 (DevTools). It turns out the button issues a POST request to the /api/onekeysave/save endpoint; the request body mainly carries the memo's document number, title, and id.
  • The response body of that POST contains a loadlink field, which is the download link. Requesting that link downloads the .zip file.
  • With the endpoint identified, all I needed to do was query the list of files to download from the database, copy the cookies out of the browser, and loop over the list calling the endpoint to download each file.
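The heart of the two-step flow above is parsing the save API's JSON response and turning loadlink into an absolute URL. A minimal sketch, using a fabricated response body and a hypothetical host (the real body comes back from POST /api/onekeysave/save):

```python
import json

def extract_download_url(body: str, base: str) -> str:
    """Pull the loadlink field out of the save API's JSON response
    and prepend the host to get an absolute download URL."""
    data = json.loads(body)
    return base + data["data"]["loadlink"]

# Fabricated response body, for illustration only
sample = '{"data": {"loadlink": "/download/1001.zip", "filename": "memo.zip"}}'
print(extract_download_url(sample, "http://portal.example.com"))
# → http://portal.example.com/download/1001.zip
```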

In the end, more than seven hundred files (about 1.5 GB in total) finished downloading in just over ten minutes.

Source code

Download list

select  '签报' as 类别,a.fwzh as fwzh,a.fwtm as fwtm,a.requestid as requestid ,a.ngrq as ngrq from formtable_main_5 a left join workflow_requestbase b on a.requestid=b.requestid where a.ngrq<='2022-12-31' and b.status in ('部门综合','结束') 
union all
select '发文'as 类别,c.fwzh as fwzh,c.fwtm as fwtm,c.requestid as requestid,c.ngrq as ngrq from formtable_main_4 c left join workflow_requestbase d on c.requestid=d.requestid where c.ngrq<='2022-12-31' and d.status in ('部门综合','结束','强制归档')
union all
select '收文' as 类别,e.fwzh as fwzh,e.fwtm as fwtm,e.requestid as requestid ,e.cwrq as ngrq from formtable_main_3 e where e.cwrq<='2022-12-31' and djr=102
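The query above runs against the OA database (the result was then exported to an Excel file, which the script below reads). The same pattern can be demonstrated with the standard library's sqlite3 as a stand-in database, borrowing the table and column names from the query:

```python
import sqlite3

# In-memory stand-in for the OA database, with sample data
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE formtable_main_5 (fwzh TEXT, fwtm TEXT, requestid INTEGER, ngrq TEXT)")
con.execute("INSERT INTO formtable_main_5 VALUES ('签发〔2022〕1号', 'Sample memo', 1001, '2022-01-15')")
con.execute("INSERT INTO formtable_main_5 VALUES ('签发〔2023〕9号', 'Too recent', 1002, '2023-03-01')")

# Same shape as the production query: one row per document to download
rows = con.execute(
    "SELECT '签报' AS 类别, fwzh, fwtm, requestid, ngrq "
    "FROM formtable_main_5 WHERE ngrq <= '2022-12-31'"
).fetchall()
print(rows)
# → [('签报', '签发〔2022〕1号', 'Sample memo', 1001, '2022-01-15')]
```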

Python code

import requests
import json
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Host': 'portal.jictrust.cn',
    'Pragma': 'no-cache',
}

# Convert a raw cookie string (copied from the browser) into a dict
def getCookies(cookie_str):
    cookie_dict = {}
    for item in cookie_str.split(';'):
        key, value = item.strip().split('=', 1)
        cookie_dict[key] = value
    return cookie_dict

# Call the save API to get the file's download link
def getDownloadPath(cookies, params):
    url = r'http://xxx/api/onekeysave/save'
    result = requests.post(url=url, headers=headers, cookies=cookies, json=params)
    return result.text

# Fetch the file content
def getFiles(cookies, downloadPath):
    response = requests.get(downloadPath, headers=headers, cookies=cookies)
    return response

# Save to local disk
def downloadFiles(response, filetype, filename):
    path = r'D:\workspace\新公文系统\2022年\{}\{}'.format(filetype, filename)
    with open(path, 'wb') as f:
        f.write(response.content)
    print('File downloaded')

if __name__ == '__main__':
    df = pd.read_excel(r'C:\新公文系统.xlsx', sheet_name='Sheet1', engine='openpyxl')
    cookie_str = r'ecology_JSessionid=aaaxxx8Ok_L7Vy; JSESSIONID=aaaLxxxOk_L7Vy; loginxxxaver=xxx; languxxxaver=xx; __randcode__=c9165f7d-9xxxxxxdf62bb4b'
    cookie_dict = getCookies(cookie_str)
    for index, row in df.iterrows():
        print(index, '_', row.iloc[1], '_ download started:')
        # The document number (fwzh) may be empty (NaN) for some records
        fwzh = '' if pd.isnull(row.iloc[1]) else row.iloc[1]
        params = {"fwzh": fwzh, "fwtm": row.iloc[2], "requestid": str(row.iloc[3]),
                  "maintable": "formtable_main_233", "zw": "zw", "fj": "fj",
                  "path": "D:/weaver/ecology/page/resource/userfile/other/",
                  "configip_target": "80", "OA_APPID": "b81a21daxxxf2d5bd", "category": "1"}
        # Get the packaging result (contains loadlink and filename)
        download_path = json.loads(getDownloadPath(cookie_dict, params))
        downloadUrl = 'http://xxx' + download_path['data']['loadlink']
        response = getFiles(cookie_dict, downloadUrl)
        downloadFiles(response, row.iloc[0], str(row.iloc[3]) + '_' + download_path['data']['filename'])
        print('--------------------')
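One caveat with the script above: the zip filename comes back from the server and may contain characters Windows rejects in file paths. A small stdlib helper (hypothetical; not part of the original script) that could be applied to the filename before downloadFiles writes the file:

```python
import re

def safe_filename(name: str) -> str:
    # Windows forbids \ / : * ? " < > | in file names; swap them for underscores
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()

print(safe_filename('关于2022年预算的请示:附表?.zip'))
# → 关于2022年预算的请示_附表_.zip
```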

For commercial reprinting, please contact the author for authorization; for non-commercial reprinting, please credit the source.


Link: http://hncd1024.github.io/2023/12/07/Python_ecologyFileDownload2/
Author: CHEN DI
Published: 2023-12-07