
[Feature Request]: Could checkpoint saving functionality be implemented in GraphRAG parsing to allow resuming from the last saved state after failures?

Werewolf-Wu opened this issue 1 day ago · 6 comments

Is there an existing issue for the same feature request?

  • [x] I have checked the existing issues.

Is your feature request related to a problem?

  1. GraphRAG parsing is an extremely lengthy process, and many factors along the way can cause it to fail (e.g., repeated timeouts, or concurrency being throttled after hitting the token limit in a short window).

  2. When parsing fails, the only option is to start over completely from scratch, beginning again with entity extraction, which wastes an enormous number of tokens.

  3. The trigger summary step does not appear to run in parallel: it consumes a great deal of time and greatly increases the rate of network errors that lead to timeouts (a daily occurrence with Bailian). Once one occurs, the entire process must again be restarted from the beginning.

Describe the feature you'd like

  1. Add a concurrency-limiting option to prevent the surge in error rates caused by excessive concurrent requests during the entity-extraction phase (a minimal semaphore sketch follows this list).

  2. As mentioned in #4983, switch to chat_streamly for some of the calls to reduce timeout occurrences (see the streaming sketch below).

  3. If feasible, please consider adding checkpoint saving for GraphRAG parsing progress (see the checkpoint sketch below).

  4. Could the trigger summary step be parallelized (with concurrency control!), or could an export-for-batch-inference option be added?
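
For item 1 (and the concurrency-control part of item 4), a minimal sketch of what a concurrency-limiting option could look like, using an `asyncio.Semaphore` to cap in-flight LLM requests. `extract_entities` here is a hypothetical stand-in for RAGFlow's real per-chunk extraction call; only the bounding pattern is the point:

```python
import asyncio

# Hypothetical stand-in for the per-chunk entity-extraction LLM call.
async def extract_entities(chunk: str) -> str:
    await asyncio.sleep(0.1)  # simulate a network-bound LLM request
    return f"entities({chunk})"

async def extract_all(chunks: list[str], max_concurrency: int = 4) -> list[str]:
    # The semaphore caps in-flight requests, so a large document cannot
    # flood the provider and trip rate/token limits.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(chunk: str) -> str:
        async with sem:
            return await extract_entities(chunk)

    return await asyncio.gather(*(bounded(c) for c in chunks))

if __name__ == "__main__":
    print(asyncio.run(extract_all([f"chunk-{i}" for i in range(10)])))
```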
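
For item 2, the idea behind streaming is that consuming tokens as they arrive keeps data flowing on the connection, so a slow generation only has to beat the timeout between two deltas rather than for the whole answer. A generic sketch, assuming a client that yields text deltas (the actual chat_streamly signature in RAGFlow may differ):

```python
import time
from typing import Iterator

def stream_chat(prompt: str) -> Iterator[str]:
    # Hypothetical streaming client: yields text deltas as they are generated.
    for word in ["a ", "streamed ", "summary"]:
        time.sleep(0.05)  # simulate per-delta generation latency
        yield word

def chat_with_idle_timeout(prompt: str, idle_timeout: float = 30.0) -> str:
    # Measures the gap between consecutive deltas; a real client would
    # enforce this at the socket level, but the principle is the same:
    # the timeout covers one delta, not the entire generation.
    answer, last = [], time.monotonic()
    for delta in stream_chat(prompt):
        if time.monotonic() - last > idle_timeout:
            raise TimeoutError(f"no data received for {idle_timeout:.0f}s")
        answer.append(delta)
        last = time.monotonic()
    return "".join(answer)

print(chat_with_idle_timeout("summarize entity X"))
```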
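
For item 3, a minimal sketch of per-chunk checkpointing, assuming extraction results can be keyed by a stable chunk id: finished chunks are persisted after each LLM call, and a rerun skips anything already on disk. The file name and helpers are illustrative, not RAGFlow internals:

```python
import json
from pathlib import Path

CHECKPOINT = Path("graphrag_checkpoint.json")  # illustrative location

def extract_entities(text: str) -> str:
    # Placeholder for the expensive per-chunk entity-extraction LLM call.
    return f"entities({text})"

def load_checkpoint() -> dict:
    # Resume from whatever the previous (failed) run managed to finish.
    return json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}

def save_checkpoint(done: dict) -> None:
    # Write to a temp file and replace, so a crash mid-write cannot
    # corrupt the checkpoint itself.
    tmp = CHECKPOINT.with_suffix(".tmp")
    tmp.write_text(json.dumps(done, ensure_ascii=False))
    tmp.replace(CHECKPOINT)

def parse_with_checkpoints(chunks: dict[str, str]) -> dict:
    done = load_checkpoint()
    for chunk_id, text in chunks.items():
        if chunk_id in done:  # finished in an earlier run; no tokens spent
            continue
        done[chunk_id] = extract_entities(text)
        save_checkpoint(done)  # persist after every chunk
    return done

print(parse_with_checkpoints({"c1": "Alice met Bob.", "c2": "Bob runs ACME."}))
```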

Describe implementation you've considered

No response

Documentation, adoption, use case

No response
Additional information

No response

Werewolf-Wu · Feb 24 '25 13:02