PaddleNLP
python ocr_process.py Fatal Python error: Segmentation fault
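Thank you for reporting a PaddleNLP usage issue, and for contributing to PaddleNLP! When filing your issue, please also provide the following information:
- Version and environment information: 1) PaddleNLP and PaddlePaddle versions: please give your PaddleNLP and PaddlePaddle version numbers, e.g. PaddleNLP 2.0.4, PaddlePaddle 2.1.1. 2) System environment: please describe your system type (e.g. Linux/Windows/MacOS) and Python version.
- Reproduction information: for an error, please give the environment and the steps needed to reproduce it.
If you still have questions, go to **Community** (社区交流) on the PaddleNLP GitHub home page and scan the code to join the WeChat group, where the staff on duty will answer them.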
Here is my Paddle environment:
- paddlenlp 2.3.4
- paddleocr 2.5
- paddlepaddle-gpu 2.3.1
- OS: Linux
- Python 3.8.13

I am running the OCR module (section 4) of the car manual project at https://aistudio.baidu.com/aistudio/projectdetail/4049663?channelType=0&channel=0 with:
python ocr_process.py
[2022-08-02 11:25:01,779] [ INFO] - Already cached /xxx/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2022-08-02 11:25:02,423] [ INFO] - tokenizer config file saved in /xxx/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2022-08-02 11:25:02,424] [ INFO] - Special tokens file saved in /xxx/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2022/08/02 11:25:02] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/xxx/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_fce_box_type='poly', det_limit_side_len=960, det_limit_type='max', det_model_dir='/xxx/.paddleocr/whl/det/ch/ch_PP-OCRv2_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='quad', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout=True, layout_label_map=None, layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, mode='structure', ocr=True, ocr_version='PP-OCRv2', output='./output', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/xxx/anaconda3/envs/paddle_car_manual/lib/python3.8/site-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', rec_image_shape='3, 32, 320', rec_model_dir='/xxx/.paddleocr/whl/rec/ch/ch_PP-OCRv2_rec_infer', save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], show_log=True, structure_version='STRUCTURE', table=True, table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
Fatal Python error: Segmentation fault
Current thread 0x00007f31fff2a740 (most recent call first):
File "/xxx/anaconda3/envs/paddle_car_manual/lib/python3.8/site-packages/paddleocr/tools/infer/predict_det.py", line 216 in __call__
File "/xxx/anaconda3/envs/paddle_car_manual/lib/python3.8/site-packages/paddleocr/tools/infer/predict_system.py", line 69 in __call__
File "/xxx/anaconda3/envs/paddle_car_manual/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 383 in ocr
File "ocr_process.py", line 263 in ocr_preprocess
File "ocr_process.py", line 275 in <module>
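We ran python ocr_process.py directly on AI Studio and could not reproduce this error.
Are you running it directly on AI Studio? If so, please share the AI Studio link and we will take a look. Alternatively, leave your contact details so we can discuss it.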
Our installation environment is:
!pip install yacs
!pip install paddlenlp==2.3.4
!pip install paddleocr==2.5
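I could not get it to run on AI Studio. When executing the ranking module (Rerank), it fails with the following error:
ValueError: Operator "gen_nccl_id" has not been registered.
The complete workflow is here: https://aistudio.baidu.com/aistudio/projectdetail/4373782?contributionType=1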
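With the same installation I ran everything successfully on Google Colab using a GPU. The AI Studio environment is currently CPU-only, which may be the cause.
So I then planned to deploy and run this code on my company's GPU server, but python ocr_process.py failed there. It may be a problem with the server environment; I am still investigating. The ranking model and the cross-modal model can be trained and tested.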
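On CPU you will currently run into the environment compatibility problem described above. We recommend running in a GPU environment on AI Studio.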
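A quick way to tell which kind of environment is in use is to ask PaddlePaddle itself. The snippet below is a minimal sketch, not part of the project code: the helper name `describe_paddle_build` is made up for illustration, and it only relies on `paddle.__version__`, `paddle.is_compiled_with_cuda()` and `paddle.device.get_device()`.

```python
def describe_paddle_build():
    """Report whether PaddlePaddle is installed and which device it selects.

    Hypothetical helper for illustration; returns a plain dict so the
    result can be pasted into an issue report.
    """
    try:
        import paddle
    except ImportError:
        # PaddlePaddle is not importable in this interpreter at all.
        return {"installed": False}
    return {
        "installed": True,
        "version": paddle.__version__,
        # False means a CPU-only wheel (paddlepaddle, not paddlepaddle-gpu).
        "compiled_with_cuda": paddle.is_compiled_with_cuda(),
        # 'cpu' or something like 'gpu:0'.
        "device": paddle.device.get_device(),
    }


print(describe_paddle_build())
```

If `compiled_with_cuda` is False or `device` is `cpu`, the session is running a CPU setup, which would be consistent with the situation described above. `paddle.utils.run_check()` can additionally be used to verify that the installation itself works.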
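If you run on your company's GPU server:
- Confirm that paddleocr runs correctly in the current environment.
- Try deploying in a freshly initialized Python environment, then run python ocr_process.py.
- Download and use the latest ocr_process.py script from GitHub: https://github.com/1649759610/PaddleNLP/blob/hf_datasets/applications/doc_vqa/OCR_process/ocr_process.py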
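The first check in the list can be sketched as a small smoke test that calls PaddleOCR on a single image, run in a child process so that a segmentation fault does not take down the calling session. This is only an illustration under assumptions: `SMOKE_TEST`, `run_isolated` and `describe` are hypothetical names, and the PaddleOCR options simply mirror `lang='ch'` and `use_angle_cls=True` from the log in this issue.

```python
import subprocess
import sys

# Script executed in the child process: one minimal PaddleOCR call.
# The image path is taken from the first command-line argument.
SMOKE_TEST = """
import sys
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang="ch")
result = ocr.ocr(sys.argv[1], cls=True)
print("OCR finished, top-level results:", len(result))
"""


def run_isolated(code, *args, timeout=600):
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stdout, stderr). On POSIX systems a negative
    return code -N means the child was killed by signal N.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"
```

For example, `rc, out, err = run_isolated(SMOKE_TEST, "sample_page.png")` followed by `print(describe(rc))` and `print(err)` would show whether PaddleOCR alone crashes in this environment; `sample_page.png` is a placeholder for any local test image. If the smoke test already dies with a signal, the problem lies in the paddleocr/paddlepaddle-gpu installation rather than in ocr_process.py.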
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.