ragflow [Question]: graphrag parse too slow

Describe your problem

I am trying to use GraphRAG, but parsing is too slow: a 2 MB, 68-page PDF has been parsing for 3 hours and is still not complete. The parser config is as follows:

"kb_parser_config": {"auto_keywords": 0, "auto_questions": 0, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": [entities I defined], "method": "light", "resolution": false, "community": false}}

Is this expected? Parsing the same file with LightRAG takes 1 to 2 hours to complete.

some logs:

Model factory: Xinference, Model: qwen2.5-32b-q5-k-m, Devices: A100 80G

Feb 18 '25 10:02 wanyuks

The speed depends mostly on the LLM. Every chunk needs to go through the LLM.

Feb 19 '25 04:02 KevinHuSh
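To make the point above concrete (every chunk goes through the LLM), here is a rough back-of-the-envelope estimate of wall-clock time. This is only a sketch: `estimate_parse_hours` and all of the numbers are hypothetical and are not taken from RAGFlow. The real number of LLM calls per chunk and the latency per call depend on the extraction method and the model server.

```python
import math


def estimate_parse_hours(num_chunks, calls_per_chunk, seconds_per_call, concurrency=1):
    """Rough wall-clock estimate when every chunk needs one or more LLM calls."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    total_calls = num_chunks * calls_per_chunk
    # With `concurrency` requests in flight, the calls complete in waves.
    waves = math.ceil(total_calls / concurrency)
    return waves * seconds_per_call / 3600


# Hypothetical workload: 400 chunks, 2 calls per chunk, 30 s per call.
print(round(estimate_parse_hours(400, 2, 30), 2))                  # 6.67
print(round(estimate_parse_hours(400, 2, 30, concurrency=4), 2))   # 1.67
```

Under these assumptions the total time scales linearly with the number of chunks and with per-call latency, so a slow or single-request model server dominates everything else.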

What is RAGFLOW doing and what stage is it currently in? It has been going on for a long time.

Feb 19 '25 06:02 wanyuks

It seems to keep repeating, is this normal?

Feb 19 '25 08:02 wanyuks

Same here: my file is only 1.3 MB, and it has been parsing for 24 hours and is still not complete.

Feb 19 '25 14:02 picassio

Upgrade to the nightly version of the Docker image.

Feb 20 '25 06:02 KevinHuSh

I upgraded to nightly and the file parsed successfully, but I can't see the knowledge graph.

Feb 21 '25 00:02 wanyuks

Press F12 to open the browser developer tools, and attach the JSON file here.

Feb 21 '25 06:02 KevinHuSh

Where is the generated JSON file?

Feb 25 '25 09:02 wanyuks

Pull the nightly again.

Feb 26 '25 06:02 KevinHuSh

I also have this issue with parsing entities for graphs: it takes days and often never finishes.

I also found this error in the log. I'm not sure whether it is the only reason for the slow processing. It still happens on the 0.17 release.

[ERROR] [2025-03-05 15:09:57,685] [entity_resolution.__call__] [line:129]: error entity resolution
Traceback (most recent call last):
  File "/ragflow/graphrag/entity_resolution.py", line 118, in __call__
    response = self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf)
  File "/ragflow/graphrag/general/extractor.py", line 65, in _chat
    raise Exception(response)
Exception: **ERROR**: Error code: 400 - {'error': {'message': "Invalid 'messages[0].content': string too long. Expected a string with maximum length 1048576, but got a string with length 6968243 instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content', 'code': 'string_above_max_length'}}

Thanks.

Mar 05 '25 15:03 alexff77
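The 400 error above reports a single message of 6,968,243 characters against a limit of 1,048,576. One generic way to stay under such a limit is to split the items that make up the prompt into several smaller requests. The sketch below is not RAGFlow's code and not a confirmed fix. `batch_by_length`, `PROMPT_OVERHEAD`, and the sample items are hypothetical names and values chosen for illustration; only the 1,048,576 limit comes from the error message.

```python
def batch_by_length(items, max_chars, sep="\n"):
    """Group strings into batches whose joined length stays within max_chars."""
    batches, current, current_len = [], [], 0
    for item in items:
        if len(item) > max_chars:
            raise ValueError("a single item exceeds max_chars")
        # A separator is only needed when the batch already has content.
        added = len(item) + (len(sep) if current else 0)
        if current and current_len + added > max_chars:
            # Adding this item would overflow: close the batch, start a new one.
            batches.append(sep.join(current))
            current, current_len = [item], len(item)
        else:
            current.append(item)
            current_len += added
    if current:
        batches.append(sep.join(current))
    return batches


API_LIMIT = 1_048_576     # limit quoted in the error message above
PROMPT_OVERHEAD = 8_192   # hypothetical headroom for the prompt template
budget = API_LIMIT - PROMPT_OVERHEAD

items = ["entity pair %d" % i for i in range(100_000)]  # placeholder data
batches = batch_by_length(items, budget)
print(len(batches), all(len(b) <= budget for b in batches))
```

Each batch would then be sent as its own request. The trade-off is more LLM calls in exchange for requests that the API actually accepts.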

"It seems to keep repeating, is this normal?"

Hello, it's May 7th. Have you solved this problem? After deploying from source I tried various parameters, but I couldn't solve it. My LLM server is very powerful, yet the extraction speed is still very slow.

May 07 '25 04:05 Water19C

It's July 15th, and this bug still persists. Tasks often get stuck at an unpredictable stage while the logs keep flickering. Even after a task has been running for several hours, the progress does not move. I often restart Docker, but that doesn't always work. What's more, re-parsing means spending a lot of tokens again.

Jul 15 '25 01:07 likebeans