
Support more document formats

Open HZJprince opened this issue 3 months ago • 6 comments

As the title says: uploads currently support only txt, json, and md. Are even basic formats like Word and PDF not supported?

HZJprince avatar Sep 18 '25 01:09 HZJprince

It is hard to get the content of Word and PDF files by parsing the files directly.

Chen1303005809 avatar Sep 18 '25 03:09 Chen1303005809

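Is all the literature in the world in txt? If Tencent open-sources a project that cannot even handle the most basic Word and PDF documents, how is anyone supposed to use it?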

HZJprince avatar Sep 18 '25 03:09 HZJprince

  • Please note that this open-source project focuses primarily on algorithms for knowledge graph construction and retrieval. Accurately adapting to and parsing various document formats is an inherently complex task, and it is not the main focus of the project.

  • The primary purpose of open-sourcing this project is to attract peers to collaborate, keep improving it, and contribute to technical progress in this field. The project is certainly not perfect, so we will listen carefully to community feedback and adjust accordingly. We warmly welcome everyone to join us in building and improving it.

  • We are already preparing a document parsing project, which is expected to be open-sourced in the second half of this year. Until then, if you need document parsing capabilities, we recommend the tool provided by Tencent Cloud (https://cloud.tencent.com/document/product/1772/115340).

siyuan-youtu avatar Sep 18 '25 06:09 siyuan-youtu

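Is all the literature in the world in txt? If Tencent open-sources a project that cannot even handle the most basic Word and PDF documents, how is anyone supposed to use it?

No rush. This still feels like a paper-level project, and much of it is immature. It will need help from the wider community.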

YaomaWu avatar Sep 18 '25 14:09 YaomaWu

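Is all the literature in the world in txt? If Tencent open-sources a project that cannot even handle the most basic Word and PDF documents, how is anyone supposed to use it?

An easy way: use the minerU2 API to convert the documents first.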

samqin123 avatar Sep 19 '25 03:09 samqin123
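
To illustrate the "convert first, then upload" workaround suggested in this thread, here is a minimal sketch for the Word case only. It does not use minerU2 or the Tencent Cloud tool, and it is not part of youtu-graphrag; it relies on the fact that a .docx file is a ZIP archive whose main text lives in `word/document.xml`. The function names `docx_to_text` and `convert` are hypothetical, and the sketch ignores headers, footnotes, images, and table layout. PDF is not covered, since the Python standard library has no PDF parser.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the paragraph text of a .docx file, one paragraph per line.

    Hypothetical helper for illustration; `path` may be a file path or a
    binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each <w:t> element holds one run of text within the paragraph.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():  # skip empty paragraphs
            paragraphs.append(text)
    return "\n".join(paragraphs)


def convert(src_path, dst_path):
    """Write the extracted text to a UTF-8 .txt file that can then be uploaded."""
    with open(dst_path, "w", encoding="utf-8") as out:
        out.write(docx_to_text(src_path))
```

Calling `convert("report.docx", "report.txt")` (file names are placeholders) would produce a plain-text file in one of the formats the upload currently accepts. For documents with complex layout, a dedicated parser such as the ones mentioned above is likely a better fit.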

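There seems to be an open-source project that builds a knowledge base by parsing images directly rather than going through OCR. I wonder whether it could be considered for integration.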

samqin123 avatar Sep 19 '25 03:09 samqin123