PaddleOCR: when running OCR with paddleocr-vl, many images are not recognized and are only extracted as images?

🔎 Search before asking

[x] I have searched the PaddleOCR Docs and found no similar bug report.
[x] I have searched the PaddleOCR Issues and found no similar bug report.
[x] I have searched the PaddleOCR Discussions and found no similar bug report.

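🐛 Bug (Problem description)

When running pipeline.predict with paddleocr-vl, some images are not OCR'd automatically. What is the cause? After recognition I found a large number of small images whose text was not recognized. Do I need to call the model again on each of them to recognize the text, and then substitute that text back into the right position? I noticed that seal images are not recognized at all, and seal recognition quality is not great either, but I will open a separate issue for the seal problem later. For now, please help me figure out why the text in these images is not recognized.

🏃‍♂️ Environment (Runtime environment)

The problem can also be reproduced directly with the official page at https://aistudio.baidu.com/paddleocr/task. My own deployment runs on a rented cloud platform with a 24 GB GPU, started with: paddlex_genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118 --backend_config 'vllm_config.json'

🌰 Minimal Reproducible Example

Just upload an image at https://aistudio.baidu.com/paddleocr/task.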

Nov 25 '25 11:11 johnny20240812

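The second image is the original; the first image is a sub-image extracted from it. Is there a configuration option that makes the pipeline recognize the text in all images other than seal images?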

Nov 25 '25 11:11 johnny20240812

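Minimal reproduction

Client code:

    from paddleocr import PaddleOCRVL

    # task_all_data and output_dir are defined elsewhere in my script.
    ocr_model = PaddleOCRVL(paddlex_config="PaddleOCR-VL.yaml")
    output = ocr_model.predict(task_all_data[0], use_queues=False)
    for res in output:
        res.save_to_markdown(save_path=output_dir)

The output is an xxx.md file with the following content:

Judging from the result, the pipeline seems to have only detected that there are several images, without recognizing and converting their content. Many other OCR tools do recognize the text in them.

My PaddleOCR-VL.yaml configuration is as follows: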

    pipeline_name: PaddleOCR-VL

    batch_size: 64

    use_queues: True

    use_doc_preprocessor: False
    use_layout_detection: True
    use_chart_recognition: False
    format_block_content: False

    SubModules:
      LayoutDetection:
        module_name: layout_detection
        model_name: PP-DocLayoutV2
        model_dir: null
        batch_size: 16
        threshold:
          0: 0.5 # abstract
          1: 0.5 # algorithm
          2: 0.5 # aside_text
          3: 0.5 # chart
          4: 0.5 # content
          5: 0.4 # formula
          6: 0.4 # doc_title
          7: 0.5 # figure_title
          8: 0.5 # footer
          9: 0.5 # footer
          10: 0.5 # footnote
          11: 0.5 # formula_number
          12: 0.5 # header
          13: 0.5 # header
          14: 0.5 # image
          15: 0.4 # formula
          16: 0.5 # number
          17: 0.4 # paragraph_title
          18: 0.5 # reference
          19: 0.5 # reference_content
          20: 0.45 # seal
          21: 0.5 # table
          22: 0.4 # text
          23: 0.4 # text
          24: 0.5 # vision_footnote
        layout_nms: True
        layout_unclip_ratio: [1.0, 1.0]
        layout_merge_bboxes_mode:
          0: "union" # abstract
          1: "union" # algorithm
          2: "union" # aside_text
          3: "large" # chart
          4: "union" # content
          5: "large" # display_formula
          6: "large" # doc_title
          7: "union" # figure_title
          8: "union" # footer
          9: "union" # footer
          10: "union" # footnote
          11: "union" # formula_number
          12: "union" # header
          13: "union" # header
          14: "union" # image
          15: "large" # inline_formula
          16: "union" # number
          17: "large" # paragraph_title
          18: "union" # reference
          19: "union" # reference_content
          20: "union" # seal
          21: "union" # table
          22: "union" # text
          23: "union" # text
          24: "union" # vision_footnote
      VLRecognition:
        module_name: vl_recognition
        model_name: PaddleOCR-VL-0.9B
        model_dir: null
        batch_size: 2048
        genai_config:
          backend: vllm-server
          server_url: http://127.0.0.1:8118/v1
          timeout: 300
          max_retries: 3

    SubPipelines:
      DocPreprocessor:
        pipeline_name: doc_preprocessor
        batch_size: 8
        use_doc_orientation_classify: True
        use_doc_unwarping: True
        SubModules:
          DocOrientationClassify:
            module_name: doc_text_orientation
            model_name: PP-LCNet_x1_0_doc_ori
            model_dir: null
            batch_size: 8
          DocUnwarping:
            module_name: image_unwarping
            model_name: UVDoc
            model_dir: null

    Serving:
      extra:
        max_num_input_imgs: null

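Because I need to run recognition directly on PDFs, I tuned some parameters to make the most of the GPU, and I am not sure whether some of them are set incorrectly. In the demo on the official site, the text inside images is recognized directly. Which parameter controls that?

My vLLM config file is as follows:

    max_num_seqs 512
    max_num_batched_tokens 32768
    swap_space 4
    gpu-memory-utilization 0.7

My files are usually long, some with 800+ pages, so I split each large PDF into smaller PDFs of 400 pages and run recognition on each one separately. To avoid errors at runtime, I set use_queues=False.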
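The 400-page split described above can be sketched as follows. This is only an illustration, not the code used in this issue: `chunk_page_ranges` is a hypothetical helper that computes the page ranges, and actually writing each range out as its own PDF would still need a PDF library of your choice.

```python
def chunk_page_ranges(total_pages: int, chunk_size: int = 400) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) page ranges covering the document."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


# An 850-page PDF is covered by three ranges: 400 + 400 + 50 pages.
for start, end in chunk_page_ranges(850):
    print(f"pages {start}..{end - 1} ({end - start} pages)")
```

Half-open ranges make it easy to check that no page is dropped or processed twice, since each range ends where the next one starts.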

Cloud platform configuration: cuda128_torch280_py312, RTX4090 with 24 GB of VRAM.

Nov 25 '25 12:11 johnny20240812

paddlex_genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118 --backend_config 'vllm_config.json'

This is how I start the vLLM service.

Nov 25 '25 12:11 johnny20240812

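Oh no. After reading the official documentation carefully, I realized that I start the vLLM service with paddlex_genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118 --backend_config 'vllm_config.json', whereas the official command is: paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --backend_config vllm_config.yaml. Could this difference be why only the layout structure is recognized and not the content?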

Nov 25 '25 12:11 johnny20240812

I switched to this command: --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118 --backend_config 'vllm_config.json'. The result is the same.

Nov 25 '25 13:11 johnny20240812

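I solved it in my own code, so let's close this for now. The official API presumably handles this through code-level checks as well. Thanks.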
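For anyone landing here with the same problem, a code-level workaround of this kind could look roughly like the sketch below. It is not the code actually used here and not the official API's logic. It assumes the saved Markdown references extracted sub-images through `<img src="...">` tags or `![alt](path)` links, and `recognize` is a hypothetical callable you supply (for example, a second OCR call on the cropped image file) that returns the recognized text or an empty string. Skipping paths that contain "seal" is likewise an assumption about file naming.

```python
import re
from typing import Callable, Iterable

# Matches either an HTML <img ... src="path" ...> tag or a Markdown ![alt](path) link.
IMG_PATTERN = re.compile(
    r'<img\b[^>]*?\bsrc="(?P<html>[^"]+)"[^>]*>|!\[[^\]]*\]\((?P<md>[^)\s]+)\)'
)


def replace_images_with_text(
    markdown: str,
    recognize: Callable[[str], str],
    skip_keywords: Iterable[str] = ("seal",),
) -> str:
    """Replace image references with recognized text where recognition succeeds."""
    keywords = tuple(skip_keywords)

    def substitute(match: re.Match) -> str:
        path = match.group("html") or match.group("md")
        if any(keyword in path for keyword in keywords):
            return match.group(0)  # leave skipped images (e.g. seals) untouched
        text = recognize(path).strip()
        # Keep the original image reference if nothing was recognized.
        return text if text else match.group(0)

    return IMG_PATTERN.sub(substitute, markdown)


if __name__ == "__main__":
    # Stand-in recognizer for demonstration; a real one would run OCR on the file.
    fake_results = {"imgs/a.jpg": "Invoice No. 123"}
    md = '<img src="imgs/a.jpg" alt="Image" />\n\n![](imgs/seal_1.jpg)\n\n![](imgs/b.jpg)'
    print(replace_images_with_text(md, lambda path: fake_results.get(path, "")))
```

Images with no recognized text keep their original reference, so pure pictures are not lost from the document.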

Dec 01 '25 01:12 johnny20240812
