
Problem running the car manual cross-modal intelligent Q&A example in PaddleNLP/applications/document_intelligence/doc_vqa/

Open xdnjust opened this issue 3 years ago • 16 comments

Please ask your question

While running the OCR step (code location: PaddleNLP/applications/document_intelligence/doc_vqa/OCR_process/ocr_process.py), I got the following error. What is the cause?

[2022-10-26 16:30:36,060] [ INFO] - Already cached /root/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2022-10-26 16:30:36,584] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2022-10-26 16:30:36,584] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2022/10/26 16:30:36] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/root/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_box_type='quad', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='/root/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer', det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir=None, image_orientation=False, ir_optim=True, kie_algorithm='LayoutXLM', label_list=['0', '180'], lang='ch', layout=True, layout_dict_path=None, layout_model_dir=None, layout_nms_threshold=0.5, layout_score_threshold=0.5, max_batch_size=10, max_text_length=25, merge_no_span_structure=True, min_subgraph_size=15, mode='structure', ocr=True, ocr_order_method=None, ocr_version='PP-OCRv3', output='./output', page_num=0, precision='fp32', process_id=0, re_model_dir=None, rec=True, rec_algorithm='SVTR_LCNet', rec_batch_num=6, rec_char_dict_path='/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_model_dir='/root/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer', recovery=False, save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ser_model_dir=None, show_log=True, sr_batch_num=1, sr_image_shape='3, 32, 128', sr_model_dir=None, structure_version='PP-Structurev2', table=True, table_algorithm='TableAttn', table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_npu=False, use_onnx=False, use_pdf2docx_api=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, use_visual_backbone=True, use_xpu=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)

Traceback (most recent call last):
File "ocr_process.py", line 287, in <module>
  ocr_results = ocr_preprocess(img_dir)
File "ocr_process.py", line 275, in ocr_preprocess
  parsing_res = ocr.ocr(img_path, cls=True)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddleocr/paddleocr.py", line 534, in ocr
  dt_boxes, rec_res, _ = self.__call__(img, cls)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddleocr/tools/infer/predict_system.py", line 71, in __call__
  dt_boxes, elapse = self.text_detector(img)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddleocr/tools/infer/predict_det.py", line 242, in __call__
  self.input_tensor.copy_from_cpu(img)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddle/fluid/inference/wrapper.py", line 36, in tensor_copy_from_cpu
  self.copy_from_cpu_bind(data)

OSError: (External) CUDNN error(1), CUDNN_STATUS_NOT_INITIALIZED.
  [Hint: 'CUDNN_STATUS_NOT_INITIALIZED'. The cuDNN library was not initialized properly. This error is usually returned when a call to cudnnCreate() fails or when cudnnCreate() has not been called prior to calling another cuDNN routine. In the former case, it is usually due to an error in the CUDA Runtime API called by cudnnCreate() or by an error in the hardware setup. ] (at /paddle/paddle/phi/backends/gpu/gpu_context.cc:516)

xdnjust avatar Oct 26 '22 08:10 xdnjust

It looks like there is a problem with the paddlepaddle-gpu installation.

wawltor avatar Oct 26 '22 09:10 wawltor

import paddle 
paddle.utils.run_check() 

Please check whether this runs successfully, and post a screenshot to the issue.

wawltor avatar Oct 26 '22 09:10 wawltor

import paddle 
paddle.utils.run_check() 

Please check whether this runs successfully, and post a screenshot to the issue.

image

xdnjust avatar Oct 26 '22 09:10 xdnjust

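Hello, the screenshot above is the result with paddlepaddle-gpu==2.3.0. I have now updated to version 2.3.2, and running the command above produces the following screenshot: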

image

xdnjust avatar Oct 26 '22 09:10 xdnjust

Judging from the output, single-GPU execution should now work normally.

JunnYu avatar Oct 26 '22 11:10 JunnYu

It still doesn't work. I get the following error:

Traceback (most recent call last):
File "ocr_process.py", line 287, in <module>
  ocr_results = ocr_preprocess(img_dir)
File "ocr_process.py", line 275, in ocr_preprocess
  parsing_res = ocr.ocr(img_path, cls=True)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddleocr/paddleocr.py", line 534, in ocr
  dt_boxes, rec_res, _ = self.__call__(img, cls)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddleocr/tools/infer/predict_system.py", line 71, in __call__
  dt_boxes, elapse = self.text_detector(img)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddleocr/tools/infer/predict_det.py", line 243, in __call__
  self.predictor.run()

OSError: In user code:

File "tools/export_model.py", line 172, in <module>
  main()
File "tools/export_model.py", line 165, in main
  sub_model_save_path, logger)
File "tools/export_model.py", line 99, in export_single_model
  paddle.jit.save(model, save_path)
File "<decorator-gen-101>", line 2, in save

File "/usr/local/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
  return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
  return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/jit.py", line 744, in save
  inner_input_spec)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 517, in concrete_program_specify_input_spec
  *desired_input_spec)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 427, in get_concrete_program
  concrete_program, partial_program_layer = self._program_cache[cache_key]
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 723, in __getitem__
  self._caches[item] = self._build_once(item)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 714, in _build_once
  **cache_key.kwargs)
File "<decorator-gen-99>", line 2, in from_func_spec

File "/usr/local/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
  return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
  return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 662, in from_func_spec
  outputs = static_func(*inputs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 79, in forward
  x = self.backbone(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
  return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
  outputs = self.forward(*inputs, **kwargs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/backbones/det_mobilenet_v3.py", line 146, in forward
  x = self.conv(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
  return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
  outputs = self.forward(*inputs, **kwargs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/backbones/det_mobilenet_v3.py", line 179, in forward
  x = self.conv(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
  return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
  outputs = self.forward(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/nn/layer/conv.py", line 677, in forward
  use_cudnn=self._use_cudnn)
File "/usr/local/lib/python3.7/site-packages/paddle/nn/functional/conv.py", line 148, in _conv_nd
  type=op_type, inputs=inputs, outputs=outputs, attrs=attrs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
  return self.main_program.current_block().append_op(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3184, in append_op
  attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2224, in __init__
  for frame in traceback.extract_stack():

ExternalError: CUDNN error(1), CUDNN_STATUS_NOT_INITIALIZED.
  [Hint: 'CUDNN_STATUS_NOT_INITIALIZED'.  The cuDNN library was not initialized properly. This error is usually returned when a call to cudnnCreate() fails or when cudnnCreate() has not been called prior to calling another cuDNN routine. In the former case, it is usually due to an error in the CUDA Runtime API called by cudnnCreate() or by an error in the hardware setup.  ] (at /paddle/paddle/phi/backends/gpu/gpu_resources.cc:211)
  [operator < conv2d_fusion > error]

xdnjust avatar Oct 26 '22 11:10 xdnjust

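I suggest checking the system's CUDA version, cuDNN version, and CUDA driver version. The error is mainly caused by the Paddle environment not being installed correctly. See the installation documentation: https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html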

JunnYu avatar Oct 26 '22 12:10 JunnYu
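
The version check suggested above can be gathered in one place. The following is a minimal sketch, not a definitive diagnostic: it assumes `nvidia-smi` and `nvcc` may or may not be on `PATH`, and that Paddle may or may not be importable. The helper names `run_tool` and `collect_gpu_environment` are hypothetical and introduced only for illustration.

```python
import shutil
import subprocess


def run_tool(cmd):
    """Run a command and return its stdout, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_gpu_environment():
    """Gather driver, CUDA toolkit, and Paddle build versions for comparison."""
    info = {
        # Driver version as reported by the NVIDIA driver.
        "driver": run_tool(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
        ),
        # CUDA toolkit found on PATH; it may differ from what Paddle was built against.
        "nvcc": run_tool(["nvcc", "--version"]),
        "paddle_cuda": None,
        "paddle_cudnn": None,
    }
    try:
        import paddle  # optional: only queried when installed

        # CUDA and cuDNN versions the installed Paddle wheel was compiled with.
        info["paddle_cuda"] = paddle.version.cuda()
        info["paddle_cudnn"] = paddle.version.cudnn()
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_gpu_environment().items():
        print(f"{key}: {value}")
```

Comparing `paddle_cuda` and `paddle_cudnn` against what is actually installed on the machine is usually the quickest way to spot a mismatch. A value of `None` only means the tool or package was not found by this script.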

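I checked and the CUDA version is 10.2.89, which should correspond to paddlepaddle-gpu==2.3.2, right? I don't know why the problem above still occurs after installing it.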

xdnjust avatar Oct 26 '22 12:10 xdnjust
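
The CUDA toolkit version alone does not determine the cuDNN version, and the error in this thread comes from cuDNN. As a hedged sketch, the cuDNN version can be read from its header, which defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`. The header paths below are assumptions about a typical install, and `parse_cudnn_version` is a hypothetical helper name.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN header, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Candidate locations are assumptions; adjust them for the actual install.
    # cuDNN 8 and later keep the defines in cudnn_version.h, older releases in cudnn.h.
    candidates = (
        "/usr/local/cuda/include/cudnn_version.h",
        "/usr/local/cuda/include/cudnn.h",
    )
    for path in candidates:
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                print(path, parse_cudnn_version(handle.read()))
        except OSError:
            print(path, "not readable")
```

The parsed version can then be compared with the cuDNN version listed for the chosen paddlepaddle-gpu build in the installation documentation.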

You may still need to double-check the environment installation. Feel free to follow up if you have more questions.

JunnYu avatar Oct 27 '22 11:10 JunnYu

Does this mean it has been installed successfully?

image

xdnjust avatar Oct 28 '22 09:10 xdnjust

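I ran the three modules under PaddleNLP/applications/document_intelligence/doc_vqa/ and found that the OCR processing module runs and produces output. However, the training part of the Rerank module fails to run. Could you please confirm whether the code runs correctly? I have not been able to get it to work.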

File "./src/train_ce.py", line 392, in <module>
  main(args)
File "./src/train_ce.py", line 146, in main
  ernie_config=ernie_config)
File "/users_2/d00477216/docvqa/Rerank/src/cross_encoder.py", line 112, in create_model
  graph_vars = _model(is_noise=True)
File "/users_2/d00477216/docvqa/Rerank/src/cross_encoder.py", line 65, in _model
  is_noise=is_noise)
File "/users_2/d00477216/docvqa/Rerank/src/model/ernie.py", line 105, in __init__
  task_ids, input_mask)
File "/users_2/d00477216/docvqa/Rerank/src/model/ernie.py", line 181, in _build_model
  name=model_name + 'encoder')
File "/users_2/d00477216/docvqa/Rerank/src/model/transformer_encoder.py", line 328, in encoder
  name=name + '_layer_' + str(i))
File "/users_2/d00477216/docvqa/Rerank/src/model/transformer_encoder.py", line 267, in encoder_layer
  name=name + '_multi_head_att')
File "/users_2/d00477216/docvqa/Rerank/src/model/transformer_encoder.py", line 141, in multi_head_attention
  dropout_rate)
File "/users_2/d00477216/docvqa/Rerank/src/model/transformer_encoder.py", line 116, in scaled_dot_product_attention
  weights = layers.softmax(product)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 1416, in softmax
  attrs=attrs)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 44, in append_op
  return self.main_program.current_block().append_op(*args, **kwargs)
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3621, in append_op
  attrs=kwargs.get("attrs", None))
File "/home/anaconda3/envs/lc_detectron/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2635, in __init__
  for frame in traceback.extract_stack():

ExternalError: CUDNN error(1), CUDNN_STATUS_NOT_INITIALIZED.
  [Hint: 'CUDNN_STATUS_NOT_INITIALIZED'.  The cuDNN library was not initialized properly. This error is usually returned when a call to cudnnCreate() fails or when cudnnCreate() has not been called prior to calling another cuDNN routine. In the former case, it is usually due to an error in the CUDA Runtime API called by cudnnCreate() or by an error in the hardware setup.  ] (at /paddle/paddle/phi/backends/gpu/gpu_resources.cc:211)
  [operator < softmax > error]

terminate called without an active exception

xdnjust avatar Oct 28 '22 09:10 xdnjust

@wawltor @JunnYu Please take a look. The installation seems to have succeeded and OCR runs, but Rerank training still has the problem above.

xdnjust avatar Oct 29 '22 01:10 xdnjust

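The error above indicates that the environment is still not installed correctly; the cuDNN and CUDA configuration may not match. I suggest running this to check: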

import paddle
paddle.utils.run_check()

JunnYu avatar Oct 29 '22 02:10 JunnYu

https://www.paddlepaddle.org.cn/documentation/docs/zh/install/conda/linux-conda.html#anchor-0 I suggest using Anaconda to set up the environment.

image

JunnYu avatar Oct 29 '22 02:10 JunnYu

I have already run this command. The screenshot is below. Does it mean the installation is complete?

image

xdnjust avatar Oct 29 '22 03:10 xdnjust

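From this, the installation did succeed, but there are still errors in use, so we need to pin down what is going wrong. I suggest first checking whether the code below runs.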

import paddle
x_var = paddle.uniform((2, 4, 8, 8), dtype='float32', min=-1., max=1.)
conv = paddle.nn.Conv2D(4, 6, (3, 3))
y_var = conv(x_var)
print(y_var.shape)
# [2, 6, 6, 6]

JunnYu avatar Oct 30 '22 10:10 JunnYu
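
To tell a cuDNN problem apart from a general Paddle problem, the same convolution can be run once on the CPU and once on the GPU. This is a rough sketch under the assumption that Paddle is installed; `check_conv_on_device` is a hypothetical helper, and it catches exceptions broadly because in this thread the cuDNN failure surfaced as an `OSError`.

```python
def check_conv_on_device(device="gpu"):
    """Run one small Conv2D forward pass on `device`; return (ok, message)."""
    try:
        import paddle
    except ImportError:
        return False, "paddle is not installed"
    try:
        # Raises if the requested device is not available in this build.
        paddle.set_device(device)
        x = paddle.uniform((2, 4, 8, 8), dtype="float32", min=-1.0, max=1.0)
        conv = paddle.nn.Conv2D(4, 6, (3, 3))
        y = conv(x)
        return True, f"output shape: {list(y.shape)}"
    except Exception as exc:  # report any failure instead of crashing
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for device in ("cpu", "gpu"):
        ok, message = check_conv_on_device(device)
        print(f"{device}: {'OK' if ok else 'FAILED'} ({message})")
```

If the CPU run passes and the GPU run fails with `CUDNN_STATUS_NOT_INITIALIZED`, the Paddle package itself is probably fine and the CUDA or cuDNN libraries loaded at runtime are the more likely cause.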

Thanks, it's solved. There are many versions of cuDNN on the server, and I got them mixed up.

xdnjust avatar Oct 31 '22 09:10 xdnjust
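
Since the root cause here was several cuDNN versions installed side by side, listing every `libcudnn` shared object on the library search path can make such a conflict visible. The sketch below is illustrative only: the default directories are assumptions about a typical Linux layout, and `find_cudnn_libraries` is a hypothetical helper name.

```python
import glob
import os


def find_cudnn_libraries(extra_dirs=()):
    """Return a sorted list of libcudnn shared objects found in the search dirs."""
    search_dirs = [
        d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d
    ]
    # Common install locations; these are assumptions and vary by system.
    search_dirs += ["/usr/local/cuda/lib64", "/usr/lib/x86_64-linux-gnu"]
    search_dirs += list(extra_dirs)

    found = set()
    for directory in search_dirs:
        for path in glob.glob(os.path.join(directory, "libcudnn*.so*")):
            # Resolve symlinks so libcudnn.so and its versioned target count once.
            found.add(os.path.realpath(path))
    return sorted(found)


if __name__ == "__main__":
    libraries = find_cudnn_libraries()
    if not libraries:
        print("No libcudnn files found in the searched directories.")
    for path in libraries:
        print(path)
```

If the output shows more than one major version, checking the order of directories in `LD_LIBRARY_PATH` is a reasonable next step, because the dynamic loader searches those directories in order and may pick up a copy other than the one the Paddle build expects.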