
(InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 256, 30, 16] and the shape of Y = [1, 256, 30, 15]. Received [16] in X is not equal to [15] in Y at i:3. [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at ..\paddle/phi/kernels/funcs/common_shape.h:86) [operator < elementwise_add > error]

Open: piarosebelledelapaz opened this issue 9 months ago • 1 comment

Please provide the following information to quickly locate the problem

  • System Environment:
  • Version: Paddle: PaddleOCR: Related components:
  • Command Code: <det_r50_vd_sast_icdar15.yml>
Global:
  use_gpu: false
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 1
  save_model_dir: ./output/det_r50_vd_sast_icdar15/
  save_epoch_step: 1000
  # evaluation is run every 5000 iterations after the 4000th iteration
  eval_batch_step: [4000, 5000]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/det_r50_vd_sast_icdar15_v2.0_train
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img:
  save_res_path: ./output/sast_r50_vd_ic15/predicts_sast.txt


Architecture:
  model_type: det
  algorithm: SAST
  Transform:
  Backbone:
    name: ResNet_SAST
    layers: 50
  Neck:
    name: SASTFPN
    with_cab: True
  Head:
    name: SASTHead

Loss:
  name: SASTLoss
  
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
  #  name: Cosine
    learning_rate: 0.001
  #  warmup_epoch: 0
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: SASTPostProcess
  score_thresh: 0.5
  sample_pts_num: 2
  nms_thresh: 0.2
  expand_scale: 1.0
  shrink_ratio_of_width: 0.3

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/detection/v2_img_train_det/
    label_file_list:
      - ./dataset/detection/v2_det_gt_train.txt
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - SASTProcessTrain:
          image_shape: [3, 512, 512]
          min_crop_side_ratio: 0.3
          min_crop_size: 24
          min_text_size: 4
          max_text_size: 512
      - KeepKeys:
          keep_keys: ['image', 'score_map', 'border_map', 'training_mask', 'tvo_map', 'tco_map'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 4
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/detection/v2_img_eval_det/
    label_file_list:
      - ./dataset/detection/v2_det_gt_eval.txt
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
          resize_long: 1536
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 2
  • Complete Error Message:
Traceback (most recent call last):
  File "tools/infer/predict_det.py", line 386, in <module>
    dt_boxes, _ = text_detector(img)
  File "tools/infer/predict_det.py", line 352, in __call__
    dt_boxes, elapse = self.predict(img)
  File "tools/infer/predict_det.py", line 245, in predict
    self.predictor.run()
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 256, 30, 16] and the shape of Y = [1, 256, 30, 15]. Received [16] in X is not equal to [15] in Y at i:3.
  [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at ..\paddle/phi/kernels/funcs/common_shape.h:86)
  [operator < elementwise_add > error]
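
The hint in the message encodes the usual broadcasting rule: at each axis the two sizes must be equal, or one of them must be 1. A minimal sketch of that check in plain Python reproduces the failing axis reported above. The helper name `first_broadcast_conflict` is hypothetical and is not part of Paddle; it also assumes both shapes have the same rank, as they do here.

```python
def first_broadcast_conflict(x_shape, y_shape):
    """Return the first axis where two equal-rank shapes cannot broadcast, else None.

    Mirrors the rule quoted in the hint: sizes must be equal, or one must be <= 1.
    Hypothetical helper for illustration only; not a Paddle API.
    """
    for i, (x, y) in enumerate(zip(x_shape, y_shape)):
        if not (x == y or x <= 1 or y <= 1):
            return i
    return None


# Shapes copied from the error message.
conflict = first_broadcast_conflict([1, 256, 30, 16], [1, 256, 30, 15])
print(conflict)  # 3, matching "at i:3" in the message
```

Only the last axis (width) disagrees, so the two feature maps differ by one column.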

Hello, I am trying to visualize the detection results. Exporting the trained weights was successful, but when I run the command

python tools/infer/predict_det.py --image_dir="./dataset/detection/img_eval_det/data_page_1.jpeg" --det_model_dir="./output/det_r50_vd_icdar_v2.0_inference" --use_gpu=False  

the error occurs. I checked the GitHub issues, and they say that changing the image size to a multiple of 16 or 32 would solve the issue, so I tried [640, 640], but I am still getting the error.

Can you help me fix this?
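
One possible reading of the shapes, sketched under stated assumptions: if the network halves the feature map six times (down to stride 64) and then adds a 2x-upsampled map to the level above it, any level with an odd size breaks the addition. The six-stage depth, the ceiling rounding, the helper names, and the 960x480 input are all assumptions for illustration; 960x480 was chosen only because it yields the 30x15 map seen in the error.

```python
def level_sizes(size, num_levels):
    """Sizes of one spatial axis after each stride-2 stage (ceil rounding assumed)."""
    sizes = []
    for _ in range(num_levels):
        size = (size + 1) // 2
        sizes.append(size)
    return sizes


def upsample_mismatches(size, num_levels):
    """(lateral, upsampled) pairs where 2x upsampling the deeper level does not fit."""
    sizes = level_sizes(size, num_levels)
    return [(shallow, 2 * deep)
            for shallow, deep in zip(sizes, sizes[1:])
            if shallow != 2 * deep]


def round_up(size, multiple):
    """Round size up to the nearest multiple."""
    return (size + multiple - 1) // multiple * multiple


# Hypothetical 960x480 input: both sides are multiples of 32.
print(upsample_mismatches(960, 6))  # []          height is fine
print(upsample_mismatches(480, 6))  # [(15, 16)]  width fails, as in the error
print(upsample_mismatches(640, 6))  # []          a true 640 side would be fine
print(round_up(480, 64))            # 512
print(upsample_mismatches(512, 6))  # []
```

Under these assumptions a multiple of 32 is not sufficient: the side has to be a multiple of the deepest stride. A side of exactly 640 would pass, which suggests the image reaching the network at inference time may not actually be 640x640.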

piarosebelledelapaz, May 22 '24 20:05