
[Question]: graphrag parse too slow

Open wanyuks opened this issue 10 months ago • 10 comments

Describe your problem

I am trying to use GraphRAG, but parsing is too slow: a 2M, 68-page PDF file has been parsing for 3 hours and is still not complete. The parse arguments are as follows:

"kb_parser_config": {"auto_keywords": 0, "auto_questions": 0, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": [entities I defined], "method": "light", "resolution": false, "community": false}}

Is this right? Parsing the same file with LightRAG takes 1~2 hours to complete.

Some logs:

[2 screenshots]

Model factory: Xinference; Model: qwen2.5-32b-q5-k-m; Devices: A100 80G

wanyuks avatar Feb 18 '25 10:02 wanyuks

The speed depends mostly on the LLM. Every chunk needs to go through the LLM.
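A rough way to reason about this is a back-of-the-envelope estimate: total LLM calls divided by how many run at once, times the latency of one call. The sketch below is only an illustration. The function name and all the numbers are hypothetical, and it does not reflect how RAGFlow actually schedules its requests.

```python
import math


def estimate_parse_seconds(num_chunks: int, calls_per_chunk: int,
                           seconds_per_call: float, concurrency: int = 1) -> float:
    """Rough wall-clock estimate when every chunk needs LLM calls."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    total_calls = num_chunks * calls_per_chunk
    # Calls run in waves of `concurrency`; each wave takes about one call's latency.
    waves = math.ceil(total_calls / concurrency)
    return waves * seconds_per_call


# Hypothetical numbers: 200 chunks, 2 calls per chunk, 30 s per call.
sequential = estimate_parse_seconds(200, 2, 30.0)
parallel = estimate_parse_seconds(200, 2, 30.0, concurrency=4)
print(f"{sequential / 3600:.1f} h sequential, {parallel / 3600:.1f} h with 4 parallel calls")
```

Under this model, per-call latency and the number of concurrent requests the model server can handle dominate the total time, so a slow or single-stream LLM backend can stretch a small file into hours.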

KevinHuSh avatar Feb 19 '25 04:02 KevinHuSh

[2 screenshots]

What is RAGFlow doing, and what stage is it currently in? It has been running for a long time.

wanyuks avatar Feb 19 '25 06:02 wanyuks

[4 screenshots]

It seems to keep repeating, is this normal?

wanyuks avatar Feb 19 '25 08:02 wanyuks

Same here: my file is only 1.3M, yet it has been parsing for 24h and is still not complete.

picassio avatar Feb 19 '25 14:02 picassio

Upgrade to the nightly version of the Docker image.

KevinHuSh avatar Feb 20 '25 06:02 KevinHuSh

I upgraded to nightly and the file parsed successfully, but I can't see the knowledge graph.

[2 screenshots]

wanyuks avatar Feb 21 '25 00:02 wanyuks

Press F12 and attach the JSON file here.

[1 screenshot]

KevinHuSh avatar Feb 21 '25 06:02 KevinHuSh

Where is the generated JSON file?

wanyuks avatar Feb 25 '25 09:02 wanyuks

Pull the nightly again.

KevinHuSh avatar Feb 26 '25 06:02 KevinHuSh

I also have this issue with parsing entities for graphs: it takes days and often never finishes.

I also found this error in the log, though I'm not sure it is the only reason for the slow processing. It still happens on the 0.17 release.

[ERROR] [2025-03-05 15:09:57,685] [entity_resolution.__call__] [line:129]: error entity resolution
Traceback (most recent call last):
  File "/ragflow/graphrag/entity_resolution.py", line 118, in __call__
    response = self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf)
  File "/ragflow/graphrag/general/extractor.py", line 65, in _chat
    raise Exception(response)
Exception: **ERROR**: Error code: 400 - {'error': {'message': "Invalid 'messages[0].content': string too long. Expected a string with maximum length 1048576, but got a string with length 6968243 instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content', 'code': 'string_above_max_length'}}
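The error says a single message of 6,968,243 characters was sent against a limit of 1,048,576, so the content would have to be spread over at least seven requests. One generic way to stay under such a limit is to group the items going into the prompt into batches by length. The sketch below is not RAGFlow's code: `batch_by_length`, its parameters, and the sample items are hypothetical, and it assumes the server counts length the same way Python's `len()` does, which may not hold for every provider.

```python
from typing import Iterable, Iterator, List

MAX_CONTENT_CHARS = 1_048_576  # limit quoted in the error message above


def batch_by_length(items: Iterable[str], max_chars: int = MAX_CONTENT_CHARS,
                    overhead: int = 0, sep: str = "\n") -> Iterator[List[str]]:
    """Yield groups of items whose joined length plus `overhead` fits in max_chars."""
    budget = max_chars - overhead  # room left after the fixed prompt text
    if budget <= 0:
        raise ValueError("overhead leaves no room for content")
    batch: List[str] = []
    used = 0
    for item in items:
        if len(item) > budget:
            raise ValueError("a single item exceeds the per-request budget")
        # A separator is only needed when the batch already has an item.
        extra = len(item) + (len(sep) if batch else 0)
        if batch and used + extra > budget:
            yield batch
            batch, used = [], 0
            extra = len(item)
        batch.append(item)
        used += extra
    if batch:
        yield batch


# Hypothetical usage: each batch becomes the body of one LLM request.
candidates = [f"entity_{i} vs entity_{i + 1}" for i in range(5)]
for group in batch_by_length(candidates, max_chars=50, overhead=10):
    body = "\n".join(group)
    print(len(group), len(body))
```

Batching trades one oversized request that fails for several smaller ones that can succeed, at the cost of more calls.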

Thanks.

alexff77 avatar Mar 05 '25 15:03 alexff77

"It seems to keep repeating, is this normal?"

Hello sir, it's May 7th. Have you solved this problem? I deployed from source and tried various parameters, but I couldn't solve it. My LLM server is very powerful, yet the extraction speed is still very slow.

Water19C avatar May 07 '25 04:05 Water19C

It's July 15th, and this bug still persists. Tasks often get stuck at an unpredictable stage while the logs keep flickering. Even after a task has been running for several hours, the progress does not move. I often restart Docker, but that doesn't always work. What's more, re-parsing means spending a lot of tokens again.
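On the token cost of re-parsing: one general mitigation is to cache LLM responses on disk, keyed by a hash of the prompt, so a restarted task can reuse answers it already paid for. This is a minimal sketch of that idea and not a description of what RAGFlow does. `cached_llm_call`, `call_llm`, and `cache_dir` are hypothetical names, and a real key would likely also need the model name and generation settings.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_llm_call(prompt: str, call_llm: Callable[[str], str],
                    cache_dir: Path) -> str:
    """Return a stored response for `prompt` if present, else call and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Identical prompts map to the same file name.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["response"]
    response = call_llm(prompt)  # only reached on a cache miss
    path.write_text(json.dumps({"response": response}), encoding="utf-8")
    return response
```

With a cache like this, a rerun after a stuck task would only pay for the prompts that never completed, provided the prompts are byte-for-byte identical between runs.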

likebeans avatar Jul 15 '25 01:07 likebeans