
[Question]: How to improve document parsing speed with a GPU

Open said-what-sakula opened this issue 9 months ago • 11 comments

Describe your problem

I know there is a memory leak issue with onnxruntime. Is there any other way to improve document parsing speed with a GPU? DeepDoc consumes too many CPU resources: parsing a single file already pushes all 128 cores to nearly 100%.

said-what-sakula avatar Mar 06 '25 08:03 said-what-sakula

You can refer to this similar question, which has been closed: #4460

charmmy-workstation avatar Mar 06 '25 08:03 charmmy-workstation

You can refer to this similar question, which has been closed: #4460

I have set WS to 32, but the key issue is that CPU usage in the OCR step is too high and it takes too long.

Image

Image

Image

said-what-sakula avatar Mar 06 '25 08:03 said-what-sakula
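(For readers landing here: WS is the number of task-executor worker processes RAGFlow spawns. A minimal sketch of raising it via a docker-compose override, assuming the stock setup where the server container reads WS from the environment — the service name "ragflow" is an example; check your own compose file and docker/entrypoint.sh for the exact mechanism:)

```
services:
  ragflow:
    environment:
      - WS=4   # number of task executor processes
```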

What about starting multiple task executors?

KevinHuSh avatar Mar 07 '25 04:03 KevinHuSh

What about starting multiple task executors?

Image

I adjusted WS to 32; the problem is that when started with docker-compose-gpu.yml, deepdoc OOMs as soon as it runs.

said-what-sakula avatar Mar 07 '25 05:03 said-what-sakula
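(A note for readers hitting the same OOM: each task executor loads its own copy of the DeepDoc models, so raising WS multiplies memory use. A sketch of the usual mitigations, assuming the MAX_CONCURRENT_CHUNK_BUILDERS variable mentioned later in this thread and an example memory limit — values are illustrative, not taken from this thread:)

```
services:
  ragflow:
    environment:
      - WS=2                              # fewer executors -> fewer model copies
      - MAX_CONCURRENT_CHUNK_BUILDERS=2   # cap concurrent chunk building
    deploy:
      resources:
        limits:
          memory: 16g                     # fail fast instead of taking the host down
```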

Do not utilize GPUs for RAGFlow server. You could deploy an embedding inference server on GPUs which will accelerate chunking procedure much more.

KevinHuSh avatar Mar 07 '25 08:03 KevinHuSh
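(One way to follow this advice: RAGFlow's docs list Xinference among the supported model providers, and it can serve an embedding model on the GPU. A sketch — the image tag, port, and model name are examples, not taken from this thread; check the Xinference docs for the CLI of the version you run:)

```
# run Xinference with GPU access (example image/port)
docker run -d --gpus all -p 9997:9997 xprobe/xinference:latest \
    xinference-local -H 0.0.0.0

# then launch an embedding model, e.g.:
#   xinference launch --model-name bge-large-zh-v1.5 --model-type embedding
# and register http://<host>:9997 as a model provider in RAGFlow.
```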

Do not utilize GPUs for RAGFlow server. You could deploy an embedding inference server on GPUs which will accelerate chunking procedure much more.

Image

In fact, the OCR process is the main time-consuming step.

said-what-sakula avatar Mar 07 '25 08:03 said-what-sakula

Do not utilize GPUs for RAGFlow server. You could deploy an embedding inference server on GPUs which will accelerate chunking procedure much more.

In fact, the OCR process is the main time-consuming step.

Every time document parsing reaches the OCR step it consumes a lot of CPU and takes a long time, so I want to use the GPU to accelerate the OCR process.

Image

Image

As shown in my screenshots, OCR recognition of a single file occupies 64 CPU cores.

said-what-sakula avatar Mar 07 '25 08:03 said-what-sakula
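(A side note on the all-core saturation: onnxruntime's CPU provider defaults to a threadpool sized to every available core, and with multiple task executors those threadpools multiply. A hedged, pure-Python sketch of budgeting threads per executor — whether deepdoc's onnxruntime build honors OMP_NUM_THREADS depends on how it was compiled; SessionOptions.intra_op_num_threads is the direct knob:)

```python
import os

def thread_budget(total_cores: int, workers: int) -> int:
    """Split CPU cores evenly across task executors, at least one per worker."""
    return max(1, total_cores // max(1, workers))

workers = int(os.environ.get("WS", "1"))
budget = thread_budget(os.cpu_count() or 1, workers)

# Cap OpenMP/onnxruntime CPU threads before the executor loads its models.
os.environ["OMP_NUM_THREADS"] = str(budget)
```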

I have the same issue. It seems OCR does not use GPUs. How to improve it?

wertyac avatar Mar 19 '25 00:03 wertyac

@KevinHuSh Hello, after I set MAX_CONCURRENT_CHUNK_BUILDERS = int(os.environ.get('MAX_CONCURRENT_CHUNK_BUILDERS', "5")), I noticed GPU memory spikes while parsing PDF files: it starts at around 400 MB, surges to roughly 17600 MB while a certain PDF is being parsed, and drops back to about 700 MB once parsing finishes. What causes the spike, and is there a way to mitigate it?

Danee-wawawa avatar Jun 11 '25 12:06 Danee-wawawa
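(On the spike above: onnxruntime's CUDAExecutionProvider grows a GPU memory arena on demand and holds it for the session's lifetime, which matches the "surges during parsing, drops when the session is freed" pattern. Its provider options include gpu_mem_limit and arena_extend_strategy; a sketch of constructing such a provider list — values are examples, and the InferenceSession line is commented out because it needs a real model and onnxruntime-gpu installed:)

```python
# Provider options for onnxruntime's CUDAExecutionProvider (values are examples).
cuda_opts = {
    "gpu_mem_limit": str(2 * 1024 ** 3),          # cap the arena at ~2 GiB
    "arena_extend_strategy": "kSameAsRequested",  # grow only by what is requested
}
providers = [("CUDAExecutionProvider", cuda_opts), "CPUExecutionProvider"]

# import onnxruntime as ort
# sess = ort.InferenceSession("ocr_model.onnx", providers=providers)
```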

I have this problem too. Has it been solved?

annian101 avatar Jun 19 '25 01:06 annian101

I recommend using the slim version of the docker image and not deploying RAGFlow with a GPU; GPUs are more useful for embedding/LLM inference.

KevinHuSh avatar Jun 20 '25 04:06 KevinHuSh
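(For reference, the image variant is typically selected in docker/.env; a sketch — the tag is a placeholder, pick the release you actually run:)

```
# docker/.env
RAGFLOW_IMAGE=infiniflow/ragflow:<version>-slim
```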