[Question]: What type of GPU, and how many, are recommended to speed up file parsing?
Describe your problem
I use the CPU to parse a 55 MB PDF file, and it takes about an hour. That is too slow for creating a knowledge base in RAGFlow. What type of GPU is recommended to speed up file parsing, and how many GPUs? Would an RTX 4090 or an RTX A6000 be recommended? Thanks.
Please deploy an embedding service with Ollama/Xinference on your GPUs. That will accelerate things much more.
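For example, a minimal sketch with Ollama (bge-m3 is available in the Ollama model library; the port below is Ollama's default, so adjust to your setup):

```bash
# Pull an embedding model; Ollama uses the GPU automatically when one is available
ollama pull bge-m3
# Ollama serves on port 11434 by default; sanity-check the embedding endpoint
curl http://localhost:11434/api/embeddings \
  -d '{"model": "bge-m3", "prompt": "hello world"}'
```

Then add Ollama as a model provider in RAGFlow and select the model as the knowledge base's embedding model.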
I prefer vLLM. So you recommend deploying an embedding model and then adding it as the embedding model in RAGFlow? Which embedding model is recommended?
Creating a knowledge base is not time-consuming; parsing files is. Here are some tips: https://ragflow.io/docs/dev/accelerate_doc_indexing
Or, you can use docker-compose-gpu.yml to start your service. This accelerates DeepDoc tasks using the GPU and requires RAGFlow v0.16.0+.
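For reference, a minimal sketch of starting with the GPU compose file (run from the docker directory of the repo; this assumes the NVIDIA Container Toolkit is already installed on the host):

```bash
cd ragflow/docker
docker compose -f docker-compose-gpu.yml up -d
```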
@zengqingfu1442 Yes. The recommended embedding model is bge-m3.
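If you prefer vLLM, a minimal sketch, assuming a recent vLLM release that supports the embed task (check the flag name against your version's docs):

```bash
# Serve bge-m3 behind an OpenAI-compatible /v1/embeddings endpoint
vllm serve BAAI/bge-m3 --task embed --port 8000
# Sanity check
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-m3", "input": "hello world"}'
```

Then add it to RAGFlow as an OpenAI-API-compatible provider and select it as the knowledge base's embedding model.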
As shown below, this file, which is only 1 MB, took 13 minutes to parse. But I find that the OCR part took 3.31 s, layout analysis 1.80 s, and embedding 6.34 s, while the task waited about 12 minutes to be picked up (I used the built-in BAAI/bge-large-zh-v1.5 model). I understand this, because at the same time a big PDF of about 55 MB was being parsed on the same machine with the CPU. So what I want to say is that the main time cost is not only embedding, right?
You are right that embedding is not the only stage that takes time. You toggled on RAPTOR, and that is another time consumer.
BTW, embedding models are used for offline processing; they do not affect the performance of question answering, which happens in real time.
So the OCR runs on CPU by default?
Yes, you are right.
I built a Docker image from the source code to start. If OCR is to use the GPU, LIGHT=0 is required. docker-compose-gpu.yml does indeed use the GPU to accelerate OCR, but there are OOM issues, and with multiple graphics cards only the first one is used. Can you solve this problem? Thank you.
@said-what-sakula onnxruntime-gpu leaking memory is a known issue. We've added code according to that onnxruntime issue, but it doesn't help. v0.17.0 adds support for using an LLM to parse documents (experimental).
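As a partial workaround for the multi-GPU point, you can control which devices the container sees in the compose file; a minimal sketch (the service name here is an assumption, so check your docker-compose-gpu.yml):

```yaml
services:
  ragflow:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]   # expose both cards to the container
              capabilities: [gpu]
```

Whether onnxruntime actually spreads work across both devices is a separate question; this only controls which GPUs are visible inside the container.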
Can RAPTOR and knowledge graph extraction (GraphRAG) also run on the GPU to accelerate them?
Thanks @yuzhichang for sharing the story behind this. @said-what-sakula You can take a look at Feature 5 of RAGFlow's latest release notes: https://ragflow.io/docs/dev/release_notes#v0170
The GPU can't be used to accelerate RAPTOR or knowledge graph extraction.
I didn't expect the update to be so fast. I will go check the release documentation and try it out.
It seems that the diagrams in the PDF file cannot be parsed.
Could you use the keywords in the diagram to search for the related chunk?
Perhaps I should choose the "Book" chunk method to parse the PDF file, because it is an e-book.
You don't have to change the chunk method. Check whether the diagram is in the created chunk first.
How fast is your PDF parsing? Did you use GPU acceleration?
When I parse PDFs with the official code, I find it also doesn't work, and there is a size limit on uploaded files. How can I remove the file size limit? Scanned PDFs over 100 MB cannot be uploaded.
You need to alter the nginx configuration.
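For reference, a minimal sketch of the relevant nginx directive (in the Docker setup the config is mounted from docker/nginx/, as far as I can tell; the directive itself is standard nginx and is valid in the http, server, or location context):

```nginx
http {
    # Allow request bodies up to 1 GB; a value of 0 disables the size check entirely
    client_max_body_size 1024m;
}
```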
@KevinHuSh Thanks, I tried DOC_MAXIMUM_SIZE, but it didn't work. I'm going to try your scheme.
@KevinHuSh That doesn't work either. Here is the error:
```
2025-03-10 17:17:53,496 ERROR 16 413 Request Entity Too Large: The data value transmitted exceeds the capacity limit.
Traceback (most recent call last):
  File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/ragflow/.venv/lib/python3.10/site-packages/flask_login/utils.py", line 290, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
  File "/ragflow/api/utils/api_utils.py", line 145, in decorated_function
    input_arguments = flask_request.json or flask_request.form.to_dict()
  File "/ragflow/api/apps/__init__.py", line 40, in <lambda>
    Request.json = property(lambda self: self.get_json(force=True, silent=True))
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/wrappers/request.py", line 605, in get_json
    data = self.get_data(cache=cache)
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/wrappers/request.py", line 419, in get_data
    rv = self.stream.read()
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/utils.py", line 107, in __get__
    value = self.fget(obj)  # type: ignore
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/wrappers/request.py", line 348, in stream
    return get_input_stream(
  File "/ragflow/.venv/lib/python3.10/site-packages/werkzeug/wsgi.py", line 173, in get_input_stream
    raise RequestEntityTooLarge()
werkzeug.exceptions.RequestEntityTooLarge: 413 Request Entity Too Large: The data value transmitted exceeds the capacity limit.
```
export MAX_CONTENT_LENGTH=100000000000
In the .env file?
Yes.
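To spell that out, a minimal sketch (the value is in bytes; 1073741824 is 1 GB, so pick whatever fits your files):

```bash
# docker/.env
MAX_CONTENT_LENGTH=1073741824
```

The nginx client_max_body_size limit mentioned above may also need to be raised to match; otherwise nginx can still return a 413 before the request ever reaches Flask.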
After making this change, do I need to re-run docker compose? Mainly for ragflow-server, or also for ragflow-mysql and ragflow-minio? And will recreating the Docker containers lose the database that was already built?
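For what it's worth, a hedged note on that last question: MAX_CONTENT_LENGTH is read by the server container, so that is the one that needs recreating. As far as I can tell from docker-compose-base.yml, the MySQL, MinIO, and Elasticsearch data live in named volumes, so recreating containers does not delete them:

```bash
# Recreate containers so the new .env is picked up; named volumes survive this
docker compose -f docker-compose.yml up -d
# Avoid `docker compose down -v`: the -v flag deletes the data volumes
```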