PaddleOCR
Invalid init param "vl_rec_max_concurrency" for PaddleOCRVL
🔎 Search before asking
- [x] I have searched the PaddleOCR Docs and found no similar bug report.
- [x] I have searched the PaddleOCR Issues and found no similar bug report.
- [x] I have searched the PaddleOCR Discussions and found no similar bug report.
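🐛 Bug (problem description)
When constructing a PaddleOCRVL instance, the vl_rec_max_concurrency parameter has no effect. With a long PDF as input, concurrency climbs to 200, which knocked my forwarding service offline.
I traced the problem to the STRUCTURE variable in the _get_paddlex_config_overrides method of /home/xxx/miniconda3/envs/paddlevllm/lib/python3.13/site-packages/paddleocr/_pipelines/paddleocr_vl.py. The variable is missing the configuration entry for this parameter.
Adding the following code fixes the problem: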
"SubModules.VLRecognition.genai_config.max_concurrency": self._params[
    "vl_rec_max_concurrency"
],
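To illustrate why a missing mapping entry silently drops the parameter, here is a minimal, self-contained sketch. It is not PaddleOCR's actual code: `apply_overrides`, `set_by_dotted_key`, and the `genai_config.backend` key are hypothetical names chosen for illustration, and the toy maps dotted config keys to parameter names rather than to values. Only the `max_concurrency` key and the parameter names come from this report.

```python
def set_by_dotted_key(config, dotted_key, value):
    """Set config["a"]["b"]["c"] = value for the dotted key "a.b.c"."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


def apply_overrides(params, structure):
    """Build a nested config from a {dotted config key: parameter name} map.

    A parameter that has no entry in `structure` never reaches the config,
    even if the caller passed it. No error is raised.
    """
    config = {}
    for dotted_key, param_name in structure.items():
        value = params.get(param_name)
        if value is not None:
            set_by_dotted_key(config, dotted_key, value)
    return config


# Parameters as the user passed them to the constructor.
params = {"vl_rec_backend": "vllm-server", "vl_rec_max_concurrency": 1}

# Hypothetical mapping that lacks the max_concurrency entry (the reported bug).
structure_without_entry = {
    "SubModules.VLRecognition.genai_config.backend": "vl_rec_backend",
}

# Same mapping with the entry proposed in this report.
structure_with_entry = {
    **structure_without_entry,
    "SubModules.VLRecognition.genai_config.max_concurrency": "vl_rec_max_concurrency",
}

before = apply_overrides(params, structure_without_entry)
after = apply_overrides(params, structure_with_entry)

print(before["SubModules"]["VLRecognition"]["genai_config"])
print(after["SubModules"]["VLRecognition"]["genai_config"])
```

In the first case the user's value is accepted but discarded, so the downstream default applies; in the second it is forwarded as intended.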
🏃♂️ Environment (runtime environment)
OS Ubuntu 24.04 LTS
python 3.13
paddleocr 3.3.1
paddlepaddle-gpu 3.2.0
paddlex 3.3.8
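For anyone comparing against their own setup, the package versions listed above can be collected with a short standard-library script. This is a sketch that assumes the distribution names match the ones in this report; `installed_versions` is a helper name introduced here for illustration.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Return {distribution name: version string or "not installed"}."""
    result = {}
    for name in distributions:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


# Distribution names taken from the environment list in this report.
versions = installed_versions(["paddleocr", "paddlepaddle-gpu", "paddlex"])

print("python", platform.python_version())
for name, ver in versions.items():
    print(name, ver)
```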
🌰 Minimal Reproducible Example (minimal demo that reproduces the problem)
from pathlib import Path

from paddleocr import PaddleOCRVL

input_file = "./mamba.pdf"
output_path = Path("./output")

pipeline = PaddleOCRVL(
    use_layout_detection=True,
    vl_rec_max_concurrency=1,  # key parameter: has no effect
    vl_rec_backend="vllm-server",
    vl_rec_server_url="http://192.168.1.7:8082/v1",
)

output = pipeline.predict(input=input_file)
print("page count", len(output))

# Collect the Markdown info and embedded images for each page.
markdown_list = []
markdown_images = []
for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

# Merge all pages into a single Markdown document and write it out.
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)
mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)
with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

# Save the images referenced by the Markdown under the output directory.
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)