python ocr_process.py Fatal Python error: Segmentation fault

Open wshzd opened this issue 2 years ago • 3 comments

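Thank you for reporting an issue with PaddleNLP; we greatly appreciate your contribution! When describing your problem, please also provide the following information:

  • Version and environment information: 1) PaddleNLP and PaddlePaddle versions: please give your PaddleNLP and PaddlePaddle version numbers, e.g. PaddleNLP 2.0.4, PaddlePaddle 2.1.1. 2) System environment: please describe your system type (e.g. Linux/Windows/MacOS) and Python version.
  • Reproduction information: for an error, please give the environment and the steps to reproduce it.

If you still have questions, scan the QR code under **Community** on the PaddleNLP GitHub home page to join the WeChat group, where the staff on duty will answer them.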

Here is my Paddle environment:

  • paddlenlp 2.3.4
  • paddleocr 2.5
  • paddlepaddle-gpu 2.3.1
  • Operating system: Linux
  • Python 3.8.13

I am running the OCR module (section 4) of the car-manual project at https://aistudio.baidu.com/aistudio/projectdetail/4049663?channelType=0&channel=0 with: python ocr_process.py
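For anyone collecting the same details for a report, here is a minimal sketch that gathers the version information listed above. The function name `collect_versions` and the package list are illustrative assumptions, not part of PaddleNLP; the script only reads installed package metadata and does not import Paddle.

```python
import platform
from importlib import metadata

# Distribution names to look up; adjust to the packages you actually installed.
PACKAGES = ("paddlenlp", "paddleocr", "paddlepaddle", "paddlepaddle-gpu")


def collect_versions(packages=PACKAGES):
    """Return a dict of Python/OS info plus installed package versions.

    Packages that are not installed map to None.
    """
    info = {"python": platform.python_version(), "os": platform.system()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value or 'not installed'}")
```

Pasting this output into the issue gives maintainers the versions in one place.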

[2022-08-02 11:25:01,779] [ INFO] - Already cached /xxx/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2022-08-02 11:25:02,423] [ INFO] - tokenizer config file saved in /xxx/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2022-08-02 11:25:02,424] [ INFO] - Special tokens file saved in /xxx/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2022/08/02 11:25:02] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/xxx/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_fce_box_type='poly', det_limit_side_len=960, det_limit_type='max', det_model_dir='/xxx/.paddleocr/whl/det/ch/ch_PP-OCRv2_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='quad', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout=True, layout_label_map=None, layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, mode='structure', ocr=True, ocr_version='PP-OCRv2', output='./output', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/xxx/anaconda3/envs/paddle_car_manual/lib/python3.8/site-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', rec_image_shape='3, 32, 320', rec_model_dir='/xxx/.paddleocr/whl/rec/ch/ch_PP-OCRv2_rec_infer', save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], show_log=True, structure_version='STRUCTURE', table=True, table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
Fatal Python error: Segmentation fault

Current thread 0x00007f31fff2a740 (most recent call first):
  File "/xxx/anaconda3/envs/paddle_car_manual/lib/python3.8/site-packages/paddleocr/tools/infer/predict_det.py", line 216 in __call__
  File "/xxx/anaconda3/envs/paddle_car_manual/lib/python3.8/site-packages/paddleocr/tools/infer/predict_system.py", line 69 in __call__
  File "/xxx/anaconda3/envs/paddle_car_manual/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 383 in ocr
  File "ocr_process.py", line 263 in ocr_preprocess
  File "ocr_process.py", line 275 in <module>
Segmentation fault
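The "Fatal Python error ... Current thread ... (most recent call first)" dump above has the form produced by Python's standard `faulthandler` module. As a sketch of how to capture such a dump without losing the parent shell session, the script can be run in a child interpreter with `-X faulthandler`. The helper names `run_with_faulthandler` and `died_from_segfault` are hypothetical, and the return-code check assumes a POSIX system such as the Linux host described in this issue.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a Python script in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). If the child crashes, stderr holds the
    Python-level traceback that faulthandler dumped before the process died.
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def died_from_segfault(returncode):
    """On POSIX, subprocess reports death by signal N as return code -N."""
    return returncode == -signal.SIGSEGV
```

For example, `code, err = run_with_faulthandler(["ocr_process.py"])` followed by `died_from_segfault(code)` distinguishes a native crash from an ordinary Python exception, and `err` can be attached to the report.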

wshzd, Aug 02 '22 03:08

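We ran python ocr_process.py directly on AI Studio and could not reproduce this error.

Are you running it directly on AI Studio? If so, please share the AI Studio link so we can take a look, or leave your contact details so we can discuss it.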

1649759610, Aug 02 '22 06:08

The environment was installed with:

!pip install yacs
!pip install paddlenlp==2.3.4
!pip install paddleocr==2.5

I cannot get it to run on AI Studio. When running the ranking module (Rerank), I get the following error: ValueError: Operator "gen_nccl_id" has not been registered. The full workflow is here: https://aistudio.baidu.com/aistudio/projectdetail/4373782?contributionType=1

With the same installation I got it running on Google Colab with a GPU. My AI Studio session is currently a CPU environment, which may be the cause.
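One way to check the CPU-versus-GPU hypothesis is to ask the installed Paddle build what it was compiled for and which device it is using. The following is a minimal sketch; the function name `describe_paddle_device` is an assumption for illustration, and the function returns a placeholder when Paddle is not installed.

```python
def describe_paddle_device():
    """Report whether the installed Paddle build has CUDA support.

    Returns {"installed": False} when paddle cannot be imported.
    """
    try:
        import paddle
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": paddle.__version__,
        # False means a CPU-only wheel (paddlepaddle rather than paddlepaddle-gpu).
        "compiled_with_cuda": paddle.is_compiled_with_cuda(),
        # Typically "cpu" or "gpu:0".
        "device": paddle.device.get_device(),
    }


if __name__ == "__main__":
    print(describe_paddle_device())
```

If `compiled_with_cuda` is False or `device` is "cpu", the session is running without a GPU, which would be consistent with the explanation suggested in this comment.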

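So I then tried to deploy and run this code on my company's GPU server, where running python ocr_process.py fails with the error above. It may be a problem with the server environment; I am still investigating. The ranking model and the cross-modal model can both be trained and tested there.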

wshzd, Aug 02 '22 09:08

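On CPU you will currently run into the environment-compatibility problem described above, so we recommend running in a GPU environment on AI Studio.

If you are running on your company's GPU server:

  1. Confirm that paddleocr itself runs correctly in the current environment.
  2. Try deploying in a freshly initialized Python environment and run python ocr_process.py there.
  3. Download and use the latest ocr_process.py script from GitHub: https://github.com/1649759610/PaddleNLP/blob/hf_datasets/applications/doc_vqa/OCR_process/ocr_process.py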
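Step 1 above can be sketched as a standalone smoke test that calls PaddleOCR directly, outside ocr_process.py. The function name `paddleocr_smoke_test` and the image path are hypothetical; the constructor arguments mirror values shown in the Namespace dump in this issue (`use_angle_cls`, `lang`, `use_gpu`). This is only one possible check, not the maintainers' procedure.

```python
import importlib.util


def paddleocr_smoke_test(image_path, use_gpu=False):
    """Run PaddleOCR once on a single image.

    Returns the raw OCR result, or None when paddleocr is not installed.
    """
    if importlib.util.find_spec("paddleocr") is None:
        return None
    from paddleocr import PaddleOCR

    # Same options as in the reported Namespace, except that use_gpu is a parameter.
    engine = PaddleOCR(use_angle_cls=True, lang="ch", use_gpu=use_gpu)
    return engine.ocr(image_path, cls=True)
```

Calling it first with `use_gpu=False` and then with `use_gpu=True` on the same image helps narrow things down: if only the GPU run crashes, the GPU side of the server setup is the more likely place to look, though that alone does not pinpoint the cause.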

1649759610, Aug 02 '22 13:08

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot], Dec 08 '22 02:12

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot], Dec 22 '22 16:12