ragflow [Bug]: Layout analysis in document parsing is way too slow

Self Checks

[x] I have searched for existing issues, including closed ones.
[x] I confirm that I am using English to submit this report (Language Policy).
[x] Non-English title submissions will be closed directly (Language Policy).
[x] Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

NIL

RAGFlow image version

v0.21.1 full

Other environment information

Chip: `Apple M3 Ultra ( 512 GB Memory, 8TB SSD )`

OS Type: `MacOS Tahoe 26.1 (25B78)`

Chunking Pipeline: `Laws`

Actual behavior

The layout analysis phase of document parsing takes unusually long (2600+ seconds) for a seemingly trivial PDF document (see the attached file).

GRSP_Project Specification Oct 2025_LMS.pdf

Here are the logs:

14:45:52 Task has been received.
14:45:53 Page(1~10): OCR started
14:47:19 Page(1~10): Processing...
14:47:59 Page(1~10): OCR finished (126.27s)
15:32:03 Page(1~10): Layout analysis (2643.50s)
15:32:03 Page(1~10): Text extraction (2643.54s)
15:32:08 Page(1~10): Generate 3 chunks
15:32:14 Page(1~10): Embedding chunks (6.14s)
15:32:14 Page(1~10): Indexing done (0.32s).
15:32:14 Page(1~10): Task done (2782.09s)
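
To make the bottleneck explicit, the wall-clock gap between consecutive log lines can be computed from the timestamps. This is a minimal sketch, assuming all timestamps fall on the same day and attributing each gap to the line logged at its end; the function name `stage_gaps` is illustrative and not part of RAGFlow.

```python
from datetime import datetime

# Progress log copied verbatim from the report above.
LOG = """\
14:45:52 Task has been received.
14:45:53 Page(1~10): OCR started
14:47:19 Page(1~10): Processing...
14:47:59 Page(1~10): OCR finished (126.27s)
15:32:03 Page(1~10): Layout analysis (2643.50s)
15:32:03 Page(1~10): Text extraction (2643.54s)
15:32:08 Page(1~10): Generate 3 chunks
15:32:14 Page(1~10): Embedding chunks (6.14s)
15:32:14 Page(1~10): Indexing done (0.32s).
15:32:14 Page(1~10): Task done (2782.09s)
"""


def stage_gaps(log_text):
    """Return (seconds_since_previous_line, message) for every line after the first."""
    entries = []
    for line in log_text.strip().splitlines():
        stamp, _, message = line.partition(" ")
        entries.append((datetime.strptime(stamp, "%H:%M:%S"), message))
    # Pair each entry with its predecessor and measure the elapsed time.
    return [
        ((cur_time - prev_time).total_seconds(), message)
        for (prev_time, _), (cur_time, message) in zip(entries, entries[1:])
    ]


gaps = stage_gaps(LOG)
slowest = max(gaps, key=lambda gap: gap[0])
total = sum(seconds for seconds, _ in gaps)
print(f"slowest step: {slowest[1]!r} took {slowest[0]:.0f}s of {total:.0f}s total")
```

On this log, the gap that ends at the "Layout analysis" line accounts for roughly 95% of the total task time, while OCR, chunking, embedding and indexing together take a little over two minutes.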

Expected behavior

Layout analysis should not take this long to complete.

Steps to reproduce

1) Create a dataset with any embedding model of choice, ingestion pipeline set to `Built-in` and chunking pipeline set to `Laws`

2) Proceed to upload the attached PDF document to the given dataset and initiate parsing

3) Observe the logs. Parsing takes an extremely long time to complete.

Additional information

No response

Nov 26 '25 05:11 budgetdevv

Hi @budgetdevv! I'm Dosu and I’m helping the ragflow team.

The slow layout analysis you’re seeing is a known issue on ARM-based Macs like the M3 Ultra. RAGFlow’s document parsing (especially layout analysis) is heavily optimized for x86 CPUs and NVIDIA GPUs but lacks ARM-specific optimizations, so even with powerful hardware, performance can be much worse on ARM Macs.

The layout analysis phase is handled by the DeepDoc parser, which uses a LayoutRecognizer (ONNX or Ascend) and runs a series of computationally intensive steps: layout detection, table recognition, text merging, clustering, and more. This phase is mostly single-threaded and not parallelized, so it doesn’t benefit much from extra CPU cores.

Here’s what you can do to speed things up:

If your PDFs are plain text and don’t need OCR or layout analysis, switch the document parser to “Naive” instead of “DeepDoc.” This skips the slowest steps and can dramatically reduce parsing time.
Disable non-essential features like “Use RAPTOR,” “Auto-keyword,” “Auto-question,” and knowledge graph extraction (GraphRAG) in your dataset config. These add extra LLM calls and slow down parsing.
The “Laws” chunking pipeline depends on detailed layout info, so it’s tightly coupled to the speed of layout analysis. If you don’t need legal-specific chunking, try a simpler chunking method.
Tuning concurrency/threading settings (like MAX_CONCURRENT_TASKS, PARALLEL_DEVICES) can help overall throughput, but won’t fix the core bottleneck if layout analysis is single-threaded.
There are no ARM-optimized Docker images or code paths, so even with tweaks, performance on Mac ARM will lag behind x86 systems.
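
One way to judge whether a PDF is "plain text" before switching parsers is to check how many pages already carry an extractable text layer. The sketch below is a rough heuristic and not part of RAGFlow: the function names and both thresholds are hypothetical, and the file-reading helper assumes the third-party `pypdf` package is installed.

```python
def has_text_layer(page_texts, min_chars_per_page=50, min_fraction=0.8):
    """Heuristic: True if most pages already contain extractable text.

    Both thresholds are arbitrary illustrative defaults, not tuned values.
    """
    if not page_texts:
        return False
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    return pages_with_text / len(page_texts) >= min_fraction


def page_texts_from_pdf(path):
    """Extract the text of each page; assumes `pip install pypdf` has been run."""
    from pypdf import PdfReader  # imported lazily so the heuristic works without it

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        texts = page_texts_from_pdf(sys.argv[1])
        print("text layer present:", has_text_layer(texts))
```

If the check reports a text layer, OCR and layout analysis are less likely to be needed for that file; a scanned document would report no text layer and would still need them.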

If you need to switch parsers, make sure to delete and re-upload the PDF for the new parser to take effect.

If you’re stuck with ARM hardware and need DeepDoc features, you may want to try parsing on an x86 machine or cloud VM for better performance. If you have more questions or want to dig into profiling/debugging, let me know!

To reply, just mention @dosu.

Nov 26 '25 05:11 dosubot[bot]

How do I configure it to use the GPU for inference?

Dec 01 '25 03:12 dgzxx-2000