[Question]: What type of GPU, and how many, are recommended to speed up file parsing?
Describe your problem
I use the CPU to parse a 55 MB PDF file, and it takes about an hour. That is too slow for creating a knowledge base in RAGFlow. What type of GPU is recommended to speed up file parsing, and how many GPUs? Would an RTX 4090 or an RTX A6000 be recommended? Thanks.
Please deploy an embedding service with Ollama/Xinference on your GPUs. That will accelerate things much more.
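For example, a minimal sketch with Ollama (bge-m3 is available in the Ollama model library; the port below is Ollama's default, so adjust to your setup):

```bash
# Pull an embedding model; Ollama uses the GPU automatically when one is available
ollama pull bge-m3
# Ollama serves on port 11434 by default; sanity-check the embedding endpoint
curl http://localhost:11434/api/embeddings \
  -d '{"model": "bge-m3", "prompt": "hello world"}'
```

Then add Ollama as a model provider in RAGFlow and select the model as the knowledge base's embedding model.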
I prefer vLLM. So you recommend deploying an embedding model and then adding it as the embedding model in RAGFlow? Which embedding model is recommended?
Creating a knowledge base is not time-consuming; parsing files is. Here are some tips: https://ragflow.io/docs/dev/accelerate_doc_indexing
Or, you can use docker-compose-gpu.yml to start your service. This accelerates DeepDoc tasks using the GPU and requires RAGFlow v0.16.0+.
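For reference, a minimal sketch of starting with the GPU compose file (run from the docker directory of the repo; this assumes the NVIDIA Container Toolkit is already installed on the host):

```bash
cd ragflow/docker
docker compose -f docker-compose-gpu.yml up -d
```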
@zengqingfu1442 Yes. The recommended embedding model is bge-m3.
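If you prefer vLLM, a minimal sketch, assuming a recent vLLM release that supports the embed task (check the flag name against your version's docs):

```bash
# Serve bge-m3 behind an OpenAI-compatible /v1/embeddings endpoint
vllm serve BAAI/bge-m3 --task embed --port 8000
# Sanity check
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-m3", "input": "hello world"}'
```

Then add it to RAGFlow as an OpenAI-API-compatible provider and select it as the knowledge base's embedding model.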
As shown below, this file, which is only 1 MB, took 13 minutes to parse. But I find that the OCR part took 3.31 s, layout analysis 1.80 s, and embedding 6.34 s, while the task waited about 12 minutes to be picked up (I used the built-in BAAI/bge-large-zh-v1.5 model). I understand this, because at the same time a big PDF of about 55 MB was being parsed on the same machine with the CPU. So what I want to say is that the main time cost is not only embedding, right?
You are right that embedding is not the only stage that takes time. You toggled on RAPTOR, and that is another time consumer.
BTW, embedding models are used for offline processing; they do not affect the performance of question answering, which happens in real time.
So the OCR runs on CPU by default?
Yes, you are right.
I built a Docker image from the source code to start. If OCR is to use the GPU, LIGHT=0 is required. docker-compose-gpu.yml does indeed use the GPU to accelerate OCR, but there are OOM issues, and with multiple graphics cards only the first one is used. Can you solve this problem? Thank you.
@said-what-sakula onnxruntime-gpu leaking memory is a known issue. We've added code according to that onnxruntime issue, but it doesn't help. v0.17.0 adds support for using an LLM to parse documents (experimental).
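As a partial workaround for the multi-GPU point, you can control which devices the container sees in the compose file; a minimal sketch (the service name here is an assumption, so check your docker-compose-gpu.yml):

```yaml
services:
  ragflow:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]   # expose both cards to the container
              capabilities: [gpu]
```

Whether onnxruntime actually spreads work across both devices is a separate question; this only controls which GPUs are visible inside the container.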
Can RAPTOR and knowledge graph extraction (GraphRAG) also run on the GPU to accelerate them?
Thanks @yuzhichang for sharing the story behind this. @said-what-sakula You can take a look at Feature 5 of RAGFlow's latest release notes: https://ragflow.io/docs/dev/release_notes#v0170
The GPU can't be used to accelerate RAPTOR or knowledge graph extraction.
I didn't expect the update to be so fast. I will go check the release documentation and try it out.
It seems that the diagrams in the PDF file cannot be parsed.
Could you use the keywords in the diagram to search for the related chunk?
Perhaps I should choose the "Book" chunk method to parse the PDF file, because it is an e-book.
You don't have to change the chunk method. Check whether the diagram is in the created chunk first.
How fast is your PDF parsing? Did you use GPU acceleration?
When I parse PDFs with the official code, I find it also doesn't work, and there is a size limit on uploaded files. How can I remove the file size limit? Scanned PDFs over 100 MB cannot be uploaded.
You need to alter the nginx configuration.
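For reference, a minimal sketch of the relevant nginx directive (in the Docker setup the config is mounted from docker/nginx/, as far as I can tell; the directive itself is standard nginx and is valid in the http, server, or location context):

```nginx
http {
    # Allow request bodies up to 1 GB; a value of 0 disables the size check entirely
    client_max_body_size 1024m;
}
```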
@KevinHuSh Thanks, I tried DOC_MAXIMUM_SIZE, but it didn't work. I'm going to try your scheme.
@KevinHuSh That doesn't work either. Here is the error:
```
2025-03-10 17:17:53,496 ERROR 16 413 Request Entity Too Large: The data value transmitted exceeds the capacity limit.
Traceback (most recent call last):
  File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/ragflow/.venv/lib/python3.10/site-packages/flask_login/utils.py", line 290, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
  File "/ragflow/api/utils/api_utils.py", line 145, in decorated_function
    input_arguments = flask_request.json or flask_request.form.to_dict()
  File "/ragflow/api/apps/__init__.py", line 40, in <lambda>
    Request.json = property(lambda self: self.get_json(force=True, silent=True))
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/wrappers/request.py", line 605, in get_json
    data = self.get_data(cache=cache)
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/wrappers/request.py", line 419, in get_data
    rv = self.stream.read()
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/utils.py", line 107, in __get__
    value = self.fget(obj)  # type: ignore
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/wrappers/request.py", line 348, in stream
    return get_input_stream(
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/wsgi.py", line 173, in get_input_stream
    raise RequestEntityTooLarge()
werkzeug.exceptions.RequestEntityTooLarge: 413 Request Entity Too Large: The data value transmitted exceeds the capacity limit.
```
export MAX_CONTENT_LENGTH=100000000000
In the .env file?
Yes.
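To spell that out, a minimal sketch (the value is in bytes; 1073741824 is 1 GB, so pick whatever fits your files):

```bash
# docker/.env
MAX_CONTENT_LENGTH=1073741824
```

The nginx client_max_body_size limit mentioned above may also need to be raised to match; otherwise nginx can still return a 413 before the request ever reaches Flask.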
After making this change, do I need to re-run docker compose? Mainly for ragflow-server, or also for ragflow-mysql and ragflow-minio? And will recreating the Docker containers lose the database that was already built?
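For what it's worth, a hedged note on that last question: MAX_CONTENT_LENGTH is read by the server container, so that is the one that needs recreating. As far as I can tell from docker-compose-base.yml, the MySQL, MinIO, and Elasticsearch data live in named volumes, so recreating containers does not delete them:

```bash
# Recreate containers so the new .env is picked up; named volumes survive this
docker compose -f docker-compose.yml up -d
# Avoid `docker compose down -v`: the -v flag deletes the data volumes
```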