
[Question]: Community reports generation fails 0.17.2

Open: vanalexgr opened this issue 9 months ago • 2 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Describe your problem

Parsing fails when the "Community reports generation" option is selected.

I have tried both the light and general knowledge graph options several times.

I am using 4o-mini from Azure with a 900k token limit.

OCR has been done with both Opendoc and mistral-ocr-latest.

```
Begin at: Sun, 16 Mar 2025 02:07:44 GMT
Duration: 3028.72 s
Progress:
02:07:44 Task has been received.
02:07:45 Page(1~65): OCR started
02:20:56 Page(1~65): OCR finished (790.46s)
02:23:02 Page(1~65): Layout analysis (126.11s)
02:23:07 Page(1~65): Table analysis (5.29s)
02:23:14 Page(1~65): Text merged (6.33s)
02:23:14 Page(1~65): Page 0~64: Text merging finished
02:24:14 Page(1~65): Start to generate keywords for every chunk ...
02:24:15 Page(1~65): Keywords generation 150 chunks completed in 0.47s
02:24:15 Page(1~65): Start to generate questions for every chunk ...
02:24:16 Page(1~65): Question generation 150 chunks completed in 1.43s
02:24:16 Page(1~65): Generate 150 chunks
02:24:26 Page(1~65): Embedding chunks (9.51s)
02:24:39 Page(1~65): Indexing done (13.20s). Task done (1015.10s)
02:24:42 created task raptor
02:24:42 Task has been received.
02:24:58 Cluster one layer: 150 -> 3
02:25:08 Cluster one layer: 3 -> 2
02:25:16 Cluster one layer: 2 -> 1
02:25:17 Indexing done (0.64s). Task done (35.08s)
02:30:17 Entities extraction of chunk 26 132/156 done, 11 nodes, 11 edges, 7076 tokens.
02:30:19 Entities extraction of chunk 32 133/156 done, 17 nodes, 14 edges, 8944 tokens.
02:30:20 Entities extraction of chunk 30 134/156 done, 15 nodes, 15 edges, 8933 tokens.
02:30:21 Entities extraction of chunk 7 135/156 done, 7 nodes, 9 edges, 6755 tokens.
02:30:22 Entities extraction of chunk 11 136/156 done, 11 nodes, 6 edges, 7197 tokens.
02:30:22 Entities extraction of chunk 14 137/156 done, 10 nodes, 8 edges, 7137 tokens.
02:30:23 Entities extraction of chunk 10 138/156 done, 10 nodes, 9 edges, 8109 tokens.
02:30:24 Entities extraction of chunk 17 139/156 done, 17 nodes, 15 edges, 9001 tokens.
02:30:24 Entities extraction of chunk 16 140/156 done, 11 nodes, 10 edges, 7628 tokens.
02:30:24 Entities extraction of chunk 22 141/156 done, 31 nodes, 6 edges, 11385 tokens.
02:30:26 Entities extraction of chunk 28 142/156 done, 19 nodes, 18 edges, 10771 tokens.
02:30:27 Entities extraction of chunk 0 143/156 done, 9 nodes, 4 edges, 5549 tokens.
02:30:27 Entities extraction of chunk 9 144/156 done, 12 nodes, 10 edges, 8701 tokens.
02:30:27 Entities extraction of chunk 4 145/156 done, 9 nodes, 2 edges, 5618 tokens.
02:30:28 Entities extraction of chunk 12 146/156 done, 18 nodes, 15 edges, 9790 tokens.
02:30:29 Entities extraction of chunk 25 147/156 done, 20 nodes, 18 edges, 10715 tokens.
02:30:30 Entities extraction of chunk 1 148/156 done, 11 nodes, 11 edges, 7875 tokens.
02:30:30 Entities extraction of chunk 21 149/156 done, 36 nodes, 13 edges, 19633 tokens.
02:30:31 Entities extraction of chunk 8 150/156 done, 18 nodes, 15 edges, 9703 tokens.
02:30:31 Entities extraction of chunk 6 151/156 done, 15 nodes, 15 edges, 9202 tokens.
02:30:31 Entities extraction of chunk 2 152/156 done, 23 nodes, 9 edges, 9031 tokens.
02:30:32 Entities extraction of chunk 13 153/156 done, 23 nodes, 22 edges, 10244 tokens.
02:30:33 Entities extraction of chunk 5 154/156 done, 25 nodes, 17 edges, 12178 tokens.
02:30:34 Entities extraction of chunk 15 155/156 done, 18 nodes, 13 edges, 9851 tokens.
02:30:37 Entities extraction of chunk 3 156/156 done, 19 nodes, 17 edges, 10021 tokens.
02:30:37 Entities and relationships extraction done, 1528 nodes, 1596 edges, 1426392 tokens, 317.11s.
02:36:34 Entities merging done, 357.48s.
02:45:09 Relationships merging done, 514.56s.
02:45:09 generated subgraph for doc 57dfb8ae01c811f0817f0242ac120006 in 1189.56 seconds.
02:45:23 merging subgraph for doc 57dfb8ae01c811f0817f0242ac120006 into the global graph done in 13.72 seconds.
02:45:29 Identified 192483 candidate pairs
02:58:08 [ERROR][Exception]: Exceptions from Trio nursery (6 sub-exceptions) -- ERROR: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 290246 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```

vanalexgr (Mar 15 '25)

Set the max tokens to 128000 when adding the model. [screenshot]
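For context on why this helps (my understanding of the mechanism, not necessarily RAGFlow's exact code): the 400 error comes from the entity-resolution step, which packs the 192483 candidate pairs into prompts sized against the max-token value configured for the model. With 900k configured, a single prompt can grow far past what Azure will accept (290246 vs. 128000 tokens here). The general remedy is to batch against the model's real context window. A minimal sketch of that idea, assuming tiktoken for counting; the function and names below are illustrative, not RAGFlow code:

```python
# Minimal sketch (illustrative, not RAGFlow's actual code): keep each
# entity-resolution prompt under the model's context window by batching
# candidate pairs against a token budget. Assumes tiktoken is installed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
CONTEXT_WINDOW = 128_000
HEADROOM = 4_000                      # reserve room for the model's reply
BUDGET = CONTEXT_WINDOW - HEADROOM

def batch_candidate_pairs(pairs, render):
    """Yield lists of candidate pairs whose rendered prompt text fits the budget."""
    batch, used = [], 0
    for pair in pairs:
        cost = len(enc.encode(render(pair)))
        if batch and used + cost > BUDGET:
            yield batch
            batch, used = [], 0
        batch.append(pair)
        used += cost
    if batch:
        yield batch
```

In other words, the configured max-token value has to match the model's real window so the prompt builder knows how much actually fits per request.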

KevinHuSh (Mar 17 '25)

Thank you. I have removed the model and re-added it as instructed, with max tokens set to 128000.

I now get this error:

```
Begin at: Mon, 17 Mar 2025 17:34:54 GMT
Duration: 6028.60 s
Progress:
17:34:54 Task has been received.
17:57:16 Page(1~23): OCR started
18:01:44 Page(1~23): OCR finished (268.19s)
18:02:27 Page(1~23): Layout analysis (42.26s)
18:02:28 Page(1~23): Table analysis (1.56s)
18:02:31 Page(1~23): Text merged (2.66s)
18:02:31 Page(1~23): Page 0~22: Text merging finished
18:03:07 Page(1~23): Start to generate keywords for every chunk ...
18:03:10 Page(1~23): Keywords generation 60 chunks completed in 2.22s
18:03:10 Page(1~23): Start to generate questions for every chunk ...
18:03:16 Page(1~23): Question generation 60 chunks completed in 6.06s
18:03:16 Page(1~23): Generate 60 chunks
18:03:20 Page(1~23): Embedding chunks (4.07s)
18:03:25 Page(1~23): Indexing done (5.44s). Task done (1711.07s)
17:34:54 Task has been received.
18:02:34 Page(23~45): OCR started
18:06:49 Page(23~45): OCR finished (255.26s)
18:07:30 Page(23~45): Layout analysis (41.22s)
18:07:32 Page(23~45): Table analysis (2.03s)
18:07:34 Page(23~45): Text merged (2.17s)
18:08:30 Page(23~45): Start to generate keywords for every chunk ...
18:08:32 Page(23~45): Keywords generation 76 chunks completed in 1.82s
18:08:32 Page(23~45): Start to generate questions for every chunk ...
18:08:37 Page(23~45): Question generation 76 chunks completed in 4.22s
18:08:37 Page(23~45): Generate 76 chunks
18:08:42 Page(23~45): Embedding chunks (5.01s)
18:08:49 Page(23~45): Indexing done (7.08s). Task done (2034.21s)
17:34:55 Task has been received.
18:07:37 Page(45~65): OCR started
18:12:14 Page(45~65): OCR finished (276.62s)
18:12:52 Page(45~65): Layout analysis (37.94s)
18:12:53 Page(45~65): Table analysis (1.06s)
18:12:53 Page(45~65): Text merged (0.75s)
18:13:03 Page(45~65): Start to generate keywords for every chunk ...
18:13:07 Page(45~65): Keywords generation 18 chunks completed in 4.54s
18:13:07 Page(45~65): Start to generate questions for every chunk ...
18:13:11 Page(45~65): Question generation 18 chunks completed in 4.02s
18:13:11 Page(45~65): Generate 18 chunks
18:13:13 Page(45~65): Embedding chunks (1.99s)
18:13:15 Page(45~65): Indexing done (1.71s). Task done (2299.79s)
18:13:17 created task raptor
18:13:17 Task has been received.
18:13:34 Cluster one layer: 154 -> 4
18:13:46 Cluster one layer: 4 -> 3
18:14:06 Cluster one layer: 3 -> 2
18:14:15 Cluster one layer: 2 -> 1
18:14:16 Indexing done (1.08s). Task done (59.48s)
18:25:31 Entities extraction of chunk 31 137/164 done, 18 nodes, 15 edges, 9328 tokens.
18:25:32 Entities extraction of chunk 15 138/164 done, 19 nodes, 7 edges, 8034 tokens.
18:25:32 Entities extraction of chunk 24 139/164 done, 14 nodes, 11 edges, 9610 tokens.
18:25:33 Entities extraction of chunk 26 140/164 done, 19 nodes, 11 edges, 11298 tokens.
18:25:34 Entities extraction of chunk 30 141/164 done, 22 nodes, 6 edges, 8673 tokens.
18:25:35 Entities extraction of chunk 22 142/164 done, 22 nodes, 15 edges, 10118 tokens.
18:25:38 Entities extraction of chunk 0 143/164 done, 8 nodes, 0 edges, 4733 tokens.
18:25:39 Entities extraction of chunk 29 144/164 done, 15 nodes, 14 edges, 9415 tokens.
18:25:39 Entities extraction of chunk 36 145/164 done, 28 nodes, 10 edges, 12125 tokens.
18:25:41 Entities extraction of chunk 14 146/164 done, 16 nodes, 14 edges, 9078 tokens.
18:25:41 Entities extraction of chunk 8 147/164 done, 11 nodes, 10 edges, 7550 tokens.
18:25:42 Entities extraction of chunk 23 148/164 done, 30 nodes, 12 edges, 12103 tokens.
18:25:46 Entities extraction of chunk 18 149/164 done, 21 nodes, 16 edges, 10444 tokens.
18:25:47 Entities extraction of chunk 19 150/164 done, 26 nodes, 14 edges, 11622 tokens.
18:25:47 Entities extraction of chunk 12 151/164 done, 17 nodes, 7 edges, 8960 tokens.
18:25:48 Entities extraction of chunk 5 152/164 done, 12 nodes, 10 edges, 7882 tokens.
18:25:52 Entities extraction of chunk 11 153/164 done, 20 nodes, 18 edges, 10590 tokens.
18:25:53 Entities extraction of chunk 13 154/164 done, 16 nodes, 12 edges, 9306 tokens.
18:25:53 Entities extraction of chunk 1 155/164 done, 9 nodes, 5 edges, 6071 tokens.
18:25:54 Entities extraction of chunk 7 156/164 done, 24 nodes, 7 edges, 10287 tokens.
18:25:55 Entities extraction of chunk 16 157/164 done, 20 nodes, 20 edges, 11135 tokens.
18:26:00 Entities extraction of chunk 3 158/164 done, 16 nodes, 14 edges, 10246 tokens.
18:26:02 Entities extraction of chunk 17 159/164 done, 24 nodes, 20 edges, 12207 tokens.
18:26:02 Entities extraction of chunk 9 160/164 done, 17 nodes, 15 edges, 9633 tokens.
18:26:06 Entities extraction of chunk 2 161/164 done, 15 nodes, 12 edges, 9151 tokens.
18:26:07 Entities extraction of chunk 20 162/164 done, 21 nodes, 20 edges, 10484 tokens.
18:26:14 Entities extraction of chunk 4 163/164 done, 12 nodes, 9 edges, 7984 tokens.
18:26:48 Entities extraction of chunk 6 164/164 done, 23 nodes, 19 edges, 11225 tokens.
18:26:48 Entities and relationships extraction done, 1692 nodes, 1682 edges, 1557715 tokens, 750.34s.
18:35:32 Entities merging done, 523.61s.
18:44:46 Relationships merging done, 554.13s.
18:44:46 generated subgraph for doc 0e31fb30031311f0a70a0242ac120006 in 1828.39 seconds.
18:45:02 merging subgraph for doc 0e31fb30031311f0a70a0242ac120006 into the global graph done in 15.95 seconds.
18:45:07 Identified 158185 candidate pairs
19:15:18 [ERROR][Exception]: Exceptions from Trio nursery (1 sub-exception) -- ERROR: Request timed out.
```

And this is the log from the server:

```
  + Exception Group Traceback (most recent call last):
  |   File "/ragflow/rag/svr/task_executor.py", line 594, in handle_task
  |     await do_handle_task(task)
  |   File "/ragflow/rag/svr/task_executor.py", line 522, in do_handle_task
  |     await run_graphrag(task, task_language, with_resolution, with_community, chat_model, embedding_model, progress_callback)
  |   File "/ragflow/graphrag/general/index.py", line 94, in run_graphrag
  |     await resolve_entities(
  |   File "/ragflow/graphrag/general/index.py", line 231, in resolve_entities
  |     reso = await er(graph, callback=callback)
  |   File "/ragflow/graphrag/entity_resolution.py", line 101, in __call__
  |     async with trio.open_nursery() as nursery:
  |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 1058, in __aexit__
  |     raise combined_error_from_nursery
  | exceptiongroup.ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in _resolve_candidate
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 447, in to_thread_run_sync
    |     return msg_from_thread.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 373, in do_release_then_return_result
    |     return result.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 392, in worker_fn
    |     ret = context.run(sync_fn, *args)
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in <lambda>
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/graphrag/general/extractor.py", line 66, in _chat
    |     raise Exception(response)
    | Exception: ERROR: Request timed out.
    +------------------------------------
```
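As far as I can tell from the traceback, entity resolution runs each candidate batch's blocking LLM call in a worker thread under a Trio nursery, so a single timed-out request surfaces as the ExceptionGroup above and aborts the whole step. A simplified sketch of that pattern (not the exact RAGFlow code; names are illustrative):

```python
import trio

async def resolve_all(batches, chat):
    """Resolve every candidate batch concurrently; `chat` is a blocking LLM call."""
    results = []

    async def resolve_one(batch):
        # Run the blocking HTTP call in a worker thread so the event loop stays free.
        # If the provider times out, the exception propagates into the nursery.
        results.append(await trio.to_thread.run_sync(lambda: chat(batch)))

    async with trio.open_nursery() as nursery:  # one failure fails the whole group
        for batch in batches:
            nursery.start_soon(resolve_one, batch)
    return results
```

That would explain why one slow provider response is enough to mark the whole parsing task as failed even though the graph itself was already written.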

The knowledge graph is generated, but without communities. The parsing process is marked as failed.

vanalexgr (Mar 17 '25)

The LLM invocation timed out. Set `export LM_TIMEOUT_SECONDS=3600`.
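A note on where to set it: the variable must be visible to the process that runs the task executor. For a Docker Compose deployment that means adding `LM_TIMEOUT_SECONDS=3600` to the ragflow service's environment (however you inject env vars into that container) and restarting; a plain `export` only helps if it is done in the shell that launches the server. Under the hood, a setting like this is typically read once at startup and passed to the LLM client as its request timeout, roughly as in this illustrative sketch (not RAGFlow's exact code path; the 600-second fallback is an assumption):

```python
import os
from openai import OpenAI  # Azure deployments would use AzureOpenAI instead

# Illustration only: read the timeout from the environment and hand it to the
# client so slow entity-resolution calls get the full hour instead of the
# library's default request timeout.
LM_TIMEOUT_SECONDS = float(os.environ.get("LM_TIMEOUT_SECONDS", "600"))

client = OpenAI(timeout=LM_TIMEOUT_SECONDS)  # seconds; applies to every request
```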

KevinHuSh (Mar 18 '25)

Thank you! Got it working now.

vanalexgr (Mar 18 '25)