Connecting Dify to TextIn to extract content from images, PDFs, and Excel files and output it as Markdown or Excel

The HTTP request node returns content such as the following:

{
  "status_code": 200,
  "body": "{\"x_request_id\":\"5d54bffb67988fff2f59e055a3b54ea1\",\"duration\":137,\"message\":\"Success\",\"result\":{\"markdown\":\"[\\/www\\/wwwroot\\/tz.ncncy.com】\\n\\n\",\"success_count\":1,\"pages\":[{\"angle\":0,\"page_id\":1,\"content\":[{\"pos\":[3,8,223,8,223,29,3,29],\"id\":0,\"score\":0.98100000619888,\"type\":\"line\",\"text\":\"[\\/www\\/wwwroot\\/tz.ncncy.com】\"}],\"status\":\"Success\",\"height\":35,\"structured\":[{\"pos\":[3,11,223,11,223,25,3,25],\"type\":\"textblock\",\"id\":0,\"content\":[0],\"text\":\"[\\/www\\/wwwroot\\/tz.ncncy.com】\",\"outline_level\":-1,\"sub_type\":\"text\"}],\"durations\":117.35350799561,\"image_id\":\"\",\"width\":225}],\"valid_page_number\":1,\"total_page_number\":1,\"total_count\":1,\"detail\":[{\"paragraph_id\":0,\"page_id\":1,\"tags\":[],\"outline_level\":-1,\"text\":\"[\\/www\\/wwwroot\\/tz.ncncy.com】\",\"type\":\"paragraph\",\"position\":[3,11,223,11,223,25,3,25],\"content\":0,\"sub_type\":\"text\"}]},\"metrics\":[{\"angle\":0,\"page_id\":1,\"status\":\"Success\",\"duration\":135.72903442383,\"page_image_width\":225,\"page_image_height\":35}],\"code\":200,\"version\":\"3.15.13\"}",
  "headers": {
    "date": "Mon, 28 Apr 2025 03:11:58 GMT",
    "content-type": "application/json;charset=utf-8",
    "content-length": "1002",
    "connection": "keep-alive",
    "access-control-max-age": "86400",
    "access-control-allow-origin": "*",
    "access-control-allow-headers": "Content-Type,token,No-Cache,Pragma,Cache-Control,X-Requested-With,x-ti-app-id,x-ti-secret-code",
    "access-control-expose-headers": "X-Request-Id",
    "server": "Intsig Web Server",
    "strict-transport-security": "max-age=3600; includeSubDomains; preload",
    "x-request-id": "5d54bffb67988fff2f59e055a3b54ea1"
  },
  "files": []
}

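The response is an object whose "body" field is a JSON-encoded string containing a "markdown" field. Use a Code node to extract the Markdown content, passing the "body" string in as arg1: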

import json

def main(arg1: str) -> dict:
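    # Parse arg1 (the JSON string from the HTTP node's "body") into a dict
    data = json.loads(arg1)

    # Extract the "markdown" field from the "result" object
    markdown_content = data.get("result", {}).get("markdown", "")

    # Return the extracted Markdown content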
    return {
        "result": markdown_content,
    }
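
If the request fails or the wrong variable is wired into the Code node, the short version above will either raise an exception or silently return an empty string. The following is a minimal sketch of a more defensive variant, not part of the original workflow. The function name extract_markdown and the extra "error" output field are hypothetical, and treating any "code" other than 200 as a failure is an assumption based only on the sample response shown earlier. It also accepts the whole HTTP node output and unwraps the nested "body" string if that is what it receives.

```python
import json


def extract_markdown(body: str) -> dict:
    """Return the Markdown text from a TextIn response body, plus an error message."""
    try:
        data = json.loads(body)
    except (TypeError, json.JSONDecodeError) as exc:
        return {"result": "", "error": f"body is not valid JSON: {exc}"}

    if not isinstance(data, dict):
        return {"result": "", "error": "unexpected top-level JSON type"}

    # If the whole HTTP node output was passed in, unwrap the nested "body" string.
    if isinstance(data.get("body"), str):
        return extract_markdown(data["body"])

    # Assumption: a "code" other than 200 means the TextIn call did not succeed.
    if data.get("code") != 200:
        return {
            "result": "",
            "error": f"TextIn returned code {data.get('code')}: {data.get('message')}",
        }

    result = data.get("result") or {}
    return {"result": result.get("markdown", ""), "error": ""}
```

In a Dify Code node the entry point would still need to be named main, so one option is to have main(arg1) simply return extract_markdown(arg1) and declare both "result" and "error" as output variables.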

Then use a Markdown-to-file node to convert the extracted content into the output format you need.
