[Question]: graphrag parse too slow
Describe your problem
I'm trying to use GraphRAG, but parsing is too slow: a 2 MB PDF with 68 pages has been parsing for 3 hours and is still not complete. The parser args are as follows:
"kb_parser_config": {"auto_keywords": 0, "auto_questions": 0, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": [entities I defined], "method": "light", "resolution": false, "community": false}}
Is this right? Parsing the same file with LightRAG takes 1-2 hours to complete.
Some logs:
Model factory: Xinference, Model: qwen2.5-32b-q5-k-m, Devices: A100 80G
The speed depends mostly on the LLM: every chunk needs to go through the LLM.
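To put that in perspective, here is a rough back-of-envelope estimate of the total LLM time. The chunk count, calls per chunk, and per-call latency below are illustrative assumptions, not measured RAGFlow values:

```python
# Back-of-envelope estimate of GraphRAG parse time.
# All numbers are assumptions for illustration, not measured RAGFlow values.

chunks = 200                 # assumed chunk count for a 68-page PDF
llm_calls_per_chunk = 2      # assumed: e.g. entity extraction plus one follow-up pass
seconds_per_call = 30        # assumed latency for a 32B model on a single A100
concurrency = 1              # assumed: calls issued sequentially

total_calls = chunks * llm_calls_per_chunk
total_seconds = total_calls * seconds_per_call / concurrency
print(f"{total_calls} LLM calls ~ {total_seconds / 3600:.1f} hours")
# With these assumptions: 400 calls ~ 3.3 hours, in the range reported above.
```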
What is RAGFlow doing and what stage is it currently in? It has been running for a long time.
It seems to keep repeating; is this normal?
Same here: my file is only 1.3 MB, and it has been parsing for 24 hours and is still not complete.
Upgrade to the nightly version of the Docker image.
I upgraded to nightly and the file parsed successfully, but I can't see the knowledge graph.
Press F12 (open the browser developer tools), and attach the JSON file here.
Where is the generated JSON file?
Pull the nightly again.
I also have this issue with Parsing Entities on Graphs; it takes days and often never finishes.
I also found this error in the log; I'm not sure whether it is the only reason for the slow processing. Still happening on the 0.17 release.
```
[ERROR] [2025-03-05 15:09:57,685] [entity_resolution.__call__] [line:129]: error entity resolution
Traceback (most recent call last):
  File "/ragflow/graphrag/entity_resolution.py", line 118, in __call__
    response = self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf)
  File "/ragflow/graphrag/general/extractor.py", line 65, in _chat
    raise Exception(response)
Exception: **ERROR**: Error code: 400 - {'error': {'message': "Invalid 'messages[0].content': string too long. Expected a string with maximum length 1048576, but got a string with length 6968243 instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content', 'code': 'string_above_max_length'}}
```
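That 400 error means the entity-resolution step built a single prompt of roughly 7 million characters, well above the provider's 1,048,576-character limit, so the call fails outright. Below is a minimal sketch of one way to keep such prompts under a size limit by batching the input; this is not RAGFlow's actual fix, and the `MAX_PROMPT_CHARS` value, the `batched_prompts` helper, and the `chat` call are illustrative assumptions:

```python
# Sketch: split a large entity-resolution input into batches so each prompt
# stays under the provider's per-message limit. Names and limits are assumed.

MAX_PROMPT_CHARS = 1_000_000  # assumed limit (the error above reports 1048576)

def batched_prompts(header: str, items: list[str], limit: int = MAX_PROMPT_CHARS):
    """Yield prompts of header + items, each kept under `limit` characters."""
    batch: list[str] = []
    size = len(header)
    for item in items:
        # Start a new prompt if adding this item would exceed the limit.
        if batch and size + len(item) + 1 > limit:
            yield "\n".join([header, *batch])
            batch, size = [], len(header)
        batch.append(item)
        size += len(item) + 1
    if batch:
        yield "\n".join([header, *batch])

# Usage (hypothetical `chat` standing in for the model call shown in the traceback):
# for prompt in batched_prompts(resolution_header, candidate_pairs):
#     response = chat(prompt, [{"role": "user", "content": "Output:"}], gen_conf)
```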
Thanks.
It seems to keep repeating; is this normal?
Hello, it's May 7th; have you solved this problem? I have tried various parameters after deploying from source, but I couldn't solve it. My LLM server is also very powerful, yet the extraction speed is still very slow.
It's July 15th, and this bug still persists. Tasks often get stuck at an unpredictable stage, with the logs flickering constantly. Even after the task has been running for several hours, the progress remains stuck. I often resort to restarting Docker, but that doesn't always work. What's more, re-parsing means spending a lot of tokens again.