[Bug]: The community formation step gets into an infinite loop after a failure message

Open LiorZ opened this issue 11 months ago • 3 comments

Is there an existing issue for the same bug?

  • [x] I have checked the existing issues.

RAGFlow workspace code commit ID

unknown

RAGFlow image version

g7b5d8312

Other environment information


Actual behavior

To test the community building step, I ran the knowledge graph construction on one document out of ~16,000. It looks like it keeps trying to build a community, fails, and then tries again, ad infinitum.

(see attached log for the repeated steps: log_short.txt)

Here is the failure line \\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", ",", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", ",", "\\\\"", ",", " ", "\\\\"", " ", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", " ", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", ",", "\\\\"", ",", " ", "\\\\"", " ", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\\\\\", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", ",", "\\\\"", ",", " ", "\\\\"", " ", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", "]", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", ",", "\\\\"", ",", " ", "\\\\"", " ", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\\\\\", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", ",", "\\\\"", ",", " ", "\\\\"", " ", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", "]", "\\\\"", ",", " ", "\\\\"", "\\\\\\\\", "\\\\"", "\\\\"", ",", " ", "\\\\"", "]", "\\\\"", "]"]\\\'\', \'caused_by\': {\'type\': \'illegal_argument_exception\', \'reason\': \'Document contains at least one immense term in field="source_id" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. 
The prefix of the first immense term is: \\\'[91, 34, 91, 34, 44, 32, 34, 92, 34, 34, 44, 32, 34, 91, 34, 44, 32, 34, 92, 34, 34, 44, 32, 34, 44, 34, 44, 32, 34, 32]...\\\'\'}}, \'status\': 400}]}')ctx._source.content_with_weight=params.pp_content_with_weight; ctx._source.knowledge_graph_kwd=params.pp_knowledge_graph_kwd; ctx._source.kb_id=params.pp_kb_id; ctx._source.source_id=params.pp_source_id; ctx._source.available_int=0; ctx._source.removed_kwd=params.pp_removed_kwd;

Expected behavior

The community building step should finish in a reasonable time rather than run forever.

Steps to reproduce

Enable GraphRAG with community and entities recognition

Additional information

No response

LiorZ avatar Feb 20 '25 14:02 LiorZ

Have you tried switching to another LLM?

KevinHuSh avatar Feb 21 '25 05:02 KevinHuSh

11:09:06 Page(217~229): [ERROR]Insert chunk error: ['853acf6486a4e08f:{'type': 'document_parsing_exception', 'reason': "[1:1768] failed to parse field [important_kwd] of type [keyword] in document with id '853acf6486a4e08f'. Preview of field's value: 'Zabbix?? MySQL???? ?? Zabbixagent?/data/sh MySQL???? Zabbix?? MySQL??????? MySQL???? Zabbix?? MySQL??????? ....... \n???????????????????????????'", 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'Document contains at least one immense term in field="important_kwd" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: \'[90, 97, 98, 98, 105, 120, -25, -101, -111, -26, -114, -89, 32, 77, 121, 83, 81, 76, -28, -72, -69, -28, -69, -114, -27, -92, -115, -27, -120, -74]...\''}}'], please check log file and Elasticsearch/Infinity status!

tianlala123 avatar May 22 '25 06:05 tianlala123

I was unable to find any existing issues or discussions in the repository that specifically address the infinite loop bug in the community formation step.

From the error message you provided, the root cause appears to be an illegal_argument_exception due to a term in the source_id field being too long (exceeding the UTF8 max length of 32,766). This failure seems to cause the community formation logic to retry indefinitely, resulting in an infinite loop instead of handling the error gracefully.
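To make the error easier to read, here is a minimal Python sketch that decodes the "prefix of the first immense term" byte lists quoted in the two logs above and checks a value against the 32,766-byte limit. The helper names `decode_term_prefix` and `exceeds_term_limit` are made up for illustration and are not part of RAGFlow. The byte lists are printed as signed bytes, so negative values are mapped back to 0..255 before decoding.

```python
# Limit quoted in the error message: max UTF-8 length of one indexed term.
MAX_TERM_BYTES = 32766


def decode_term_prefix(signed_bytes):
    """Decode a byte list as printed in the error message into text.

    Negative values are signed bytes, so mask them back into 0..255.
    """
    return bytes(b & 0xFF for b in signed_bytes).decode("utf-8", errors="replace")


def exceeds_term_limit(value):
    """Return True if the UTF-8 encoding of value is over the term limit."""
    return len(value.encode("utf-8")) > MAX_TERM_BYTES


# Prefix reported for field "source_id" in the original report.
source_id_prefix = [91, 34, 91, 34, 44, 32, 34, 92, 34, 34, 44, 32, 34, 91, 34,
                    44, 32, 34, 92, 34, 34, 44, 32, 34, 44, 34, 44, 32, 34, 32]

# Prefix reported for field "important_kwd" in the later comment.
important_kwd_prefix = [90, 97, 98, 98, 105, 120, -25, -101, -111, -26, -114,
                        -89, 32, 77, 121, 83, 81, 76, -28, -72, -69, -28, -69,
                        -114, -27, -92, -115, -27, -120, -74]

print(decode_term_prefix(source_id_prefix))
print(decode_term_prefix(important_kwd_prefix))
```

The `source_id` prefix decodes to a JSON list of single characters that themselves spell out the start of another JSON list. That pattern may indicate the value was split into characters and serialized more than once, which would make it grow on every pass, though this is an inference from the log and not something confirmed in the code.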

I was not able to locate the exact part of the codebase where the community formation step is implemented or any recent changes directly related to this logic. However, there is a file (rag/utils/infinity_conn.py) that handles conditions and connections, which could be related to the logic leading to the infinite loop, especially if it is involved in database operations during community formation. The function equivalent_condition_to_str in this file processes conditions and may play a role in this process, but there is no direct evidence linking it to the infinite loop from the information available so far.

Based on the error, a likely workaround would be to ensure that the source_id field does not exceed the allowed length before it is processed or indexed. You may also want to check if there is a retry or error-handling mechanism in the community formation logic that could be improved to avoid infinite retries on persistent errors.
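As a rough illustration of both suggestions, the sketch below validates field lengths before indexing and caps the number of retries. It is not RAGFlow's actual code: `PermanentIndexError`, `validate_keyword_fields`, `index_with_bounded_retry`, and the `index_fn` callback are all hypothetical names, and the field list is taken only from the two errors quoted in this thread.

```python
import time

# Limit quoted in the error message: max UTF-8 length of one indexed term.
MAX_TERM_BYTES = 32766


class PermanentIndexError(Exception):
    """Hypothetical error type for failures that a retry cannot fix."""


def validate_keyword_fields(doc, fields=("source_id", "important_kwd")):
    """Raise PermanentIndexError if any listed field holds an oversized term."""
    for field in fields:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, str) and len(item.encode("utf-8")) > MAX_TERM_BYTES:
                raise PermanentIndexError(
                    f"{field} contains a term longer than {MAX_TERM_BYTES} bytes"
                )


def index_with_bounded_retry(doc, index_fn, max_attempts=3, base_delay=0.5):
    """Call index_fn(doc), retrying transient failures at most max_attempts times."""
    validate_keyword_fields(doc)  # fail fast, before any request is sent
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return index_fn(doc)
        except PermanentIndexError:
            raise  # retrying cannot help, so surface it immediately
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff between attempts.
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"indexing failed after {max_attempts} attempts") from last_error
```

With a guard like this, an oversized `source_id` would stop the task with a clear error instead of being resubmitted indefinitely. Whether the value should be rejected, truncated, or stored differently is a design decision for the maintainers.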

If you can provide more details about the relevant code or configuration, I can help investigate further.


dosubot[bot] avatar May 22 '25 06:05 dosubot[bot]