PPStructureV3 cannot load local models in an intranet (offline) environment
🔎 Search before asking
- [x] I have searched the PaddleOCR Docs and found no similar bug report.
- [x] I have searched the PaddleOCR Issues and found no similar bug report.
- [x] I have searched the PaddleOCR Discussions and found no similar bug report.
🐛 Bug (Problem description)
In an offline environment I downloaded the models to /root/.paddlex/official_models, but initialization always fails with "No available model hosting platforms detected. Please check your network"
🏃♂️ Environment
Linux node6 5.4.0.26-generic #30-Ubuntu x86_64
python 3.10.12
🌰 Minimal Reproducible Example
from paddleocr import PPStructureV3
pipeline = PPStructureV3(
    layout_detection_model_dir="/root/.paddlex/official_models/PP-DocBlockLayout/",
    table_classification_model_dir="/root/.paddlex/official_models/PP-LCNet_x1_0_table_cls/",
    text_detection_model_dir="/root/.paddlex/official_models/PP-OCRv5_server_det/",
    text_recognition_model_dir="/root/.paddlex/official_models/PP-OCRv5_server_rec",
)
I had a similar problem and solved it the following way.
Run the following to export the config to a .yaml file:
from paddleocr import PPStructureV3
pipeline = PPStructureV3()
pipeline.export_paddlex_config_to_yaml("PP-StructureV3.yaml")
Open PP-StructureV3.yaml and replace each model_dir: null entry with your local path. One example:
RegionDetection:
  layout_merge_bboxes_mode: small
  layout_nms: true
  model_dir: /pathtofolder/.paddlex/official_models/PP-DocBlockLayout
  model_name: PP-DocBlockLayout
  module_name: layout_detection
Then, instead of calling
pipeline = PPStructureV3(
    layout_detection_model_dir="/root/.paddlex/official_models/PP-DocBlockLayout/",
    table_classification_model_dir="/root/.paddlex/official_models/PP-LCNet_x1_0_table_cls/",
    text_detection_model_dir="/root/.paddlex/official_models/PP-OCRv5_server_det/",
    text_recognition_model_dir="/root/.paddlex/official_models/PP-OCRv5_server_rec",
)
where each model is provided individually, use
pipeline = PPStructureV3(
    paddlex_config="PP-StructureV3.yaml",
)
I hope this helps. Otherwise, have a look at this part of the documentation: https://www.paddleocr.ai/main/en/version3.x/pipeline_usage/PP-StructureV3.html#42-model-deployment
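Editing every model_dir entry by hand gets error-prone when the exported file lists many modules. A minimal sketch of automating it, assuming each module block carries a model_name next to its model_dir (as in the RegionDetection example above) and that every model sits in a folder named after it under one root. fill_model_dirs is a hypothetical helper, not part of PaddleOCR, and the outer "SubModules" key in the sample is only a placeholder:

```python
from pathlib import Path


def fill_model_dirs(node, models_root):
    """Recursively set every empty model_dir to <models_root>/<model_name>.

    Works on the parsed YAML (nested dicts and lists). Returns the list of
    model names whose model_dir was filled in.
    """
    filled = []
    if isinstance(node, dict):
        name = node.get("model_name")
        # Only touch entries that are still unset (null in the YAML).
        if name and "model_dir" in node and node["model_dir"] is None:
            node["model_dir"] = str(Path(models_root) / name)
            filled.append(name)
        for value in node.values():
            filled.extend(fill_model_dirs(value, models_root))
    elif isinstance(node, list):
        for item in node:
            filled.extend(fill_model_dirs(item, models_root))
    return filled


# Tiny stand-in for the parsed config; the real file has many more entries.
config = {
    "SubModules": {
        "RegionDetection": {
            "layout_nms": True,
            "model_dir": None,
            "model_name": "PP-DocBlockLayout",
            "module_name": "layout_detection",
        }
    }
}
filled = fill_model_dirs(config, "/root/.paddlex/official_models")
print("filled model_dir for:", filled)
```

To apply this to the real file, load PP-StructureV3.yaml with a YAML parser (for example PyYAML's yaml.safe_load), run the helper on the result, and write it back with yaml.safe_dump before passing the file as paddlex_config.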
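The parameters you pass when initializing the pipeline are wrong. Your error comes from the code trying to load the PP-DocBlockLayout directory, and the parameter that corresponds to that model is region_detection_model_dir. layout_detection_model_dir should instead point to the PP-DocLayout_plus-L directory. For the rest, refer to the documentation linked above; once all 14 parameters are set correctly, the models will load on the intranet.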
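The corrected mapping can be sketched as follows. Only the five parameter/model pairs named in this thread are listed; the remaining *_model_dir parameters have to be taken from the PP-StructureV3 documentation. build_model_dir_kwargs is a hypothetical helper, and the "directory exists and is not empty" check is an assumption about what counts as a usable model folder:

```python
import os

# Parameter name -> model folder name, as stated in this thread.
# Extend this with the remaining parameters from the documentation.
MODEL_DIR_PARAMS = {
    "region_detection_model_dir": "PP-DocBlockLayout",
    "layout_detection_model_dir": "PP-DocLayout_plus-L",
    "table_classification_model_dir": "PP-LCNet_x1_0_table_cls",
    "text_detection_model_dir": "PP-OCRv5_server_det",
    "text_recognition_model_dir": "PP-OCRv5_server_rec",
}


def is_populated_dir(path):
    """True if path is a directory containing at least one entry."""
    try:
        return os.path.isdir(path) and bool(os.listdir(path))
    except OSError:
        # Unreadable directories are treated as unusable.
        return False


def build_model_dir_kwargs(models_root, params=MODEL_DIR_PARAMS):
    """Return (kwargs, missing): keyword arguments for the pipeline and
    the model directories that are absent or empty on this machine."""
    kwargs, missing = {}, []
    for param, folder in params.items():
        model_dir = os.path.join(models_root, folder)
        if not is_populated_dir(model_dir):
            missing.append(model_dir)
        kwargs[param] = model_dir
    return kwargs, missing


kwargs, missing = build_model_dir_kwargs("/root/.paddlex/official_models")
for path in missing:
    print("missing or empty:", path)

# Once nothing is missing and all parameters are covered:
# from paddleocr import PPStructureV3
# pipeline = PPStructureV3(**kwargs)
```

Running the check before constructing the pipeline turns a generic network error into a list of the exact folders that still need to be copied over.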
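@Jpzhaoo With the latest PaddleX (version 3.3.5 or later), if the model files required for inference already exist under /root/.paddlex/official_models/, inference should work without network access. Please try again.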
I am on 3.3.5, but it still does not pick up the models under /root/.paddlex/official_models/. It only works when I list every model directory explicitly.
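Sorry, I double-checked, and a small code change is needed to support this. I have submitted a PR with the fix; it will be merged as soon as possible and released in version 3.3.6 shortly. https://github.com/PaddlePaddle/PaddleX/pull/4676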
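A quick way to tell which situation applies is to compare the installed PaddleX version against 3.3.6. This is a rough sketch: the threshold comes from the maintainer's announcement above, so whether a given 3.3.6 build really contains the fix depends on the PR having been merged as planned. The helper names are hypothetical:

```python
import re
from importlib import metadata

# Version the maintainer expects to contain the fix (PaddleX PR #4676).
FIXED_IN = (3, 3, 6)


def parse_version(text):
    """Turn '3.3.5' (or '3.3.6.post1') into a comparable tuple like (3, 3, 5)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_paddlex_version():
    """Return the installed paddlex version string, or None if not installed."""
    try:
        return metadata.version("paddlex")
    except metadata.PackageNotFoundError:
        return None


version = installed_paddlex_version()
if version is None:
    print("paddlex is not installed in this environment")
elif parse_version(version) >= FIXED_IN:
    print(f"paddlex {version}: expected to include the offline cache fix")
else:
    print(f"paddlex {version}: pass the model directories explicitly for now")
```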
I cannot download models in my intranet environment, and /root/.paddlex/official_models is empty. What should I do? Where can I get these files?
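I ran into this problem too. If we need to request external network access, which domains should we request? On our side, access can only be requested per domain.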
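@yuexingliang There are two options:
- Download the models manually in advance. The models can be obtained from the following model hosting platforms: huggingface, aistudio, modelscope;
- If you need to request domain access, request the following domains (or any one of them):
huggingface.co, aistudio.baidu.com, modelscope.cn, paddle-model-ecology.bj.bcebos.com.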
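After the domains have been approved, it can help to confirm from the target machine that at least one of them is actually reachable. A minimal sketch, assuming the platforms are served over HTTPS on port 443 and that a plain TCP connection is a good enough probe (a network that only allows traffic through an HTTP proxy would need a different check). is_reachable and check_hosts are hypothetical helpers:

```python
import socket

# Domains listed in the reply above.
MODEL_HOSTS = [
    "huggingface.co",
    "aistudio.baidu.com",
    "modelscope.cn",
    "paddle-model-ecology.bj.bcebos.com",
]


def is_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_hosts(hosts=MODEL_HOSTS, port=443, timeout=3.0):
    """Map each host to whether it could be reached."""
    return {host: is_reachable(host, port, timeout) for host in hosts}
```

Calling check_hosts() and printing the result shows which of the four domains got through; since any one of them is enough, a single True should be sufficient for downloading.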