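Support more document formats
As the title says: uploads are currently limited to txt, json, and md. Are even basic formats like Word and PDF not supported?
It is hard to extract content from Word and PDF files by parsing them directly.
Is all the world's literature in txt? If Tencent open-sources a project that cannot handle even the most basic Word and PDF documents, how is anyone supposed to use it?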
-
Please note that this open-source project focuses primarily on algorithms for knowledge graph construction and retrieval. Accurately adapting to and parsing various document formats is an inherently complex task, and it is not the main focus of the project.
-
The primary purpose of this open-source project is to attract peers to collaborate on it, keep improving it, and contribute to technological progress in this field. The project is certainly not perfect, so we will listen carefully to feedback from the community and make adjustments accordingly. We also warmly welcome everyone to join us in building and improving it.
-
We are already preparing a document parsing project, which is expected to be open-sourced in the second half of this year. Until then, if you need document parsing capabilities, we recommend the tool provided by Tencent Cloud: https://cloud.tencent.com/document/product/1772/115340
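No rush. This still feels like a paper-grade project, and a lot of it is not mature yet; it will have to rely on the community at large.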
An easy workaround: use the minerU2 API to convert the files first.
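For .docx files specifically, a convert-before-upload step can also be sketched with nothing but the Python standard library. This is not the minerU2 API and not part of this project; it relies only on the fact that a .docx file is a ZIP archive whose main body lives in `word/document.xml`. The function name `docx_to_text` and the file names in the usage note are hypothetical. It does nothing for PDF, and it ignores headers, footnotes, and layout, so treat it as a rough sketch rather than a real parser.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source) -> str:
    """Return the body text of a .docx file, one paragraph per line.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # w:p elements are paragraphs (including those inside table cells)
    for para in root.iter(W_NS + "p"):
        # w:t elements hold the actual runs of text within a paragraph
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

Assuming an input file named `paper.docx`, the result could be written out with `open("paper.txt", "w", encoding="utf-8").write(docx_to_text("paper.docx"))` and the resulting txt file uploaded as usual. Documents with text boxes may produce repeated lines, because paragraphs nested inside other paragraphs are visited twice.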
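I think there is an open-source project that builds a knowledge base by parsing documents directly as images rather than through OCR. Not sure whether it could be combined with this one.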