
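When evaluating table structure and cell coordinates, the HTML structure prediction acc is 0.999, so why are the precision, recall and other metrics for the detected boxes all 0? After some investigation it looks like the gt bbox is not being read, which makes all box metrics 0 during eval. How should this be fixed?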

Open plotnine1219 opened this issue 9 months ago • 2 comments

Please provide the following information to quickly locate the problem

  • System Environment:
  • Version: Paddle: PaddleOCR: Related components:
  • Command Code:
  • Complete Error Message:
  • My config file:

    Global:
      use_gpu: False
      epoch_num: 10
      log_smooth_window: 20
      print_batch_step: 20
      save_model_dir: /Users/pengkang01/Desktop/txt转matrix/PaddleOCR/output/SLANet_ch
      save_epoch_step: 400
      # evaluation is run every 331 iterations after the 0th iteration
      eval_batch_step: [0, 331]
      cal_metric_during_train: True
      pretrained_model:
      checkpoints:
      save_inference_dir: ./output/SLANet_ch/infer
      use_visualdl: False
      infer_img: /Users/pengkang01/Desktop/txt转matrix/PaddleOCR/500_table
      # for data or label process
      character_dict_path: /Users/pengkang01/Desktop/txt转matrix/PaddleOCR/ppocr/utils/dict/table_structure_dict_ch.txt
      character_type: en
      max_text_length: &max_text_length 500
      box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
      infer_mode: False
      # use_sync_bn: True
      use_sync_bn: False
      save_res_path: output/infer

    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      clip_norm: 5.0
      lr:
        learning_rate: 0.001
      regularizer:
        name: 'L2'
        factor: 0.00000

    Architecture:
      model_type: table
      algorithm: SLANet
      Backbone:
        name: PPLCNet
        scale: 1.0
        pretrained: True
        use_ssld: True
      Neck:
        name: CSPPAN
        out_channels: 96
      Head:
        name: SLAHead
        hidden_size: 256
        max_text_length: *max_text_length
        loc_reg_num: &loc_reg_num 8

    Loss:
      name: SLALoss
      structure_weight: 1.0
      loc_weight: 2.0
      loc_loss: smooth_l1

    PostProcess:
      name: TableLabelDecode
      merge_no_span_structure: &merge_no_span_structure True

    Metric:
      name: TableMetric
      main_indicator: acc
      compute_bbox_metric: True
      loc_reg_num: *loc_reg_num
      box_format: *box_format
      del_thead_tbody: True

    Train:
      dataset:
        name: PubTabDataSet
        data_dir: 500_table/
        label_file_list: [500_table/train.txt]
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: False
          - TableLabelEncode:
              learn_empty_box: True
              merge_no_span_structure: *merge_no_span_structure
              replace_empty_cell_token: False
              loc_reg_num: *loc_reg_num
              max_text_length: *max_text_length
          - TableBoxEncode:
              in_box_format: *box_format
              out_box_format: *box_format
          - ResizeTableImage:
              max_len: 488
          - NormalizeImage:
              scale: 1./255.
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: 'hwc'
          - PaddingTableImage:
              size: [488, 488]
          - ToCHWImage:
          - KeepKeys:
              keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
      loader:
        shuffle: True
        batch_size_per_card: 1
        # batch_size_per_card: 48
        drop_last: True
        # num_workers: 1
        num_workers: 0

    Eval:
      dataset:
        name: PubTabDataSet
        data_dir: 500_table/
        label_file_list: [500_table/val.txt]
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: False
          - TableLabelEncode:
              learn_empty_box: True
              merge_no_span_structure: *merge_no_span_structure
              replace_empty_cell_token: False
              loc_reg_num: *loc_reg_num
              max_text_length: *max_text_length
          - TableBoxEncode:
              in_box_format: *box_format
              out_box_format: *box_format
          - ResizeTableImage:
              max_len: 488
          - NormalizeImage:
              scale: 1./255.
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: 'hwc'
          - PaddingTableImage:
              size: [488, 488]
          - ToCHWImage:
          - KeepKeys:
              keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
      loader:
        shuffle: False
        drop_last: False
        # batch_size_per_card: 48
        # num_workers: 1
        batch_size_per_card: 1
        num_workers: 0

Eval results:

    [2024/04/29 15:55:18] ppocr INFO: metric eval ***************
    [2024/04/29 15:55:18] ppocr INFO: acc:0.9999990000010001
    [2024/04/29 15:55:18] ppocr INFO: bbox_metric_precision:0.0
    [2024/04/29 15:55:18] ppocr INFO: bbox_metric_recall:0
    [2024/04/29 15:55:18] ppocr INFO: bbox_metric_hmean:0
    [2024/04/29 15:55:18] ppocr INFO: fps:1.2853043121990564

plotnine1219 avatar Apr 29 '24 07:04 plotnine1219

Please check whether the annotation format is correct.

UserWangZz avatar Apr 29 '24 08:04 UserWangZz

> Please check whether the annotation format is correct.

The annotation format is fine.

plotnine1219 avatar Apr 30 '24 01:04 plotnine1219

Hello, could you debug the data loading process and check whether the bboxes are read correctly?

UserWangZz avatar May 06 '24 09:05 UserWangZz

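> Hello, could you debug the data loading process and check whether the bboxes are read correctly?

Hello, they are read correctly, but during eval the predicted bbox coordinates are matched against bbox_mask for the computation.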

plotnine1219 avatar May 06 '24 09:05 plotnine1219

Has the problem been solved?

UserWangZz avatar May 06 '24 11:05 UserWangZz

> Has the problem been solved?

No...

plotnine1219 avatar May 06 '24 11:05 plotnine1219

Hello, could you provide the command you ran? I'll look into it.

UserWangZz avatar May 07 '24 01:05 UserWangZz

Which versions of paddle and paddleocr are you using?

UserWangZz avatar May 07 '24 01:05 UserWangZz

ti

> Which versions of paddle and paddleocr are you using?

Hello, paddleocr-2.7.4, paddle-2.5.1. Config file:

    Global:
      use_gpu: False
      epoch_num: 300
      log_smooth_window: 20
      print_batch_step: 20
      save_model_dir: ./output/SLANet_ch/613_no_xuanzhuan_padding_LCPAN
      save_epoch_step: 400
      # evaluation is run every 331 iterations after the 0th iteration
      eval_batch_step: [0, 331]
      cal_metric_during_train: True
      pretrained_model:
      checkpoints:
      save_inference_dir: ./output/SLANet_ch/613_no_xuanzhuan/infer/
      use_visualdl: False
      infer_img: ./500_table/
      # for data or label process
      character_dict_path: ppocr/utils/dict/table_structure_dict_ch.txt
      character_type: en
      max_text_length: &max_text_length 500
      box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
      infer_mode: False
      # use_sync_bn: True
      use_sync_bn: False
      save_res_path: output/infer

    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      clip_norm: 5.0
      lr:
        learning_rate: 0.001
      regularizer:
        name: 'L2'
        factor: 0.00000

    Architecture:
      model_type: table
      algorithm: SLANet
      Backbone:
        name: PPLCNet
        scale: 1.0
        pretrained: True
        use_ssld: True
      Neck:
        name: LCPAN
        out_channels: 96
      Head:
        name: SLAHead
        hidden_size: 256
        max_text_length: *max_text_length
        loc_reg_num: &loc_reg_num 8

    Loss:
      name: SLALoss
      structure_weight: 1.0
      loc_weight: 2.0
      loc_loss: smooth_l1

    PostProcess:
      name: TableLabelDecode
      merge_no_span_structure: &merge_no_span_structure True

    Metric:
      name: TableMetric
      main_indicator: acc
      compute_bbox_metric: True
      loc_reg_num: *loc_reg_num
      box_format: *box_format
      del_thead_tbody: True

    Train:
      dataset:
        name: PubTabDataSet
        data_dir: 500_table_no_xuanzhuan
        label_file_list: [500_table_no_xuanzhuan_padding/train.txt]
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: False
          - TableLabelEncode:
              learn_empty_box: True
              merge_no_span_structure: *merge_no_span_structure
              replace_empty_cell_token: False
              loc_reg_num: *loc_reg_num
              max_text_length: *max_text_length
          - TableBoxEncode:
              in_box_format: *box_format
              out_box_format: *box_format
          - ResizeTableImage:
              max_len: 488
          - NormalizeImage:
              scale: 1./255.
              mean: [0.93135516, 0.93246497, 0.93411841] #[0.485, 0.456, 0.406]
              std: [0.1713343, 0.17117019, 0.17039258] #[0.229, 0.224, 0.225]
              order: 'hwc'
          - PaddingTableImage:
              size: [488, 488]
          - ToCHWImage:
          - KeepKeys:
              keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
      loader:
        shuffle: True
        batch_size_per_card: 4
        # batch_size_per_card: 48
        drop_last: True
        # num_workers: 1
        num_workers: 0

    Eval:
      dataset:
        name: PubTabDataSet
        data_dir: 500_table_no_xuanzhuan/
        label_file_list: [500_table_no_xuanzhuan_padding/val.txt]
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: False
          - TableLabelEncode:
              learn_empty_box: True
              merge_no_span_structure: *merge_no_span_structure
              replace_empty_cell_token: False
              loc_reg_num: *loc_reg_num
              max_text_length: *max_text_length
          - TableBoxEncode:
              in_box_format: *box_format
              out_box_format: *box_format
          - ResizeTableImage:
              max_len: 488
          - NormalizeImage:
              scale: 1./255.
              mean: [0.93135516, 0.93246497, 0.93411841] #[0.485, 0.456, 0.406]
              std: [0.1713343, 0.17117019, 0.17039258] #[0.229, 0.224, 0.225]
              order: 'hwc'
          - PaddingTableImage:
              size: [488, 488]
          - ToCHWImage:
          - KeepKeys:
              keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
      loader:
        shuffle: False
        drop_last: False
        # batch_size_per_card: 48
        # num_workers: 1
        batch_size_per_card: 4
        num_workers: 0

plotnine1219 avatar May 07 '24 02:05 plotnine1219

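Hello, could you run inference with tools/infer_table.py and visualize the result to see whether the model output looks normal? Then we can check where the box evaluation goes wrong.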

UserWangZz avatar May 07 '24 09:05 UserWangZz

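> Hello, could you run inference with tools/infer_table.py and visualize the result to see whether the model output looks normal? Then we can check where the box evaluation goes wrong.

infer_table works fine apart from being inaccurate. The only problem is the three box metrics in the table eval, which are all 0.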

plotnine1219 avatar May 07 '24 09:05 plotnine1219

Try switching to the 2.7 branch. If that still doesn't work, I'll try to reproduce it on my side.

UserWangZz avatar May 07 '24 09:05 UserWangZz

> Try switching to the 2.7 branch. If that still doesn't work, I'll try to reproduce it on my side.

I just tried 2.7 and it doesn't work either.

plotnine1219 avatar May 07 '24 10:05 plotnine1219

OK, I'll try to reproduce it on my side.

UserWangZz avatar May 09 '24 09:05 UserWangZz

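> OK, I'll try to reproduce it on my side.

On the 2.7 branch, at line 718 of ppocr/data/imaug/label_ops.py:

    # encode box
    bboxes = np.zeros(
        (self._max_text_len, self.loc_reg_num), dtype=np.float32)

an all-zero 2-D array is created, and this array is then used together with the predicted boxes in the subsequent IoU and loss computation. My understanding is that the problem is here.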

plotnine1219 avatar May 10 '24 09:05 plotnine1219
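
To illustrate why an all-zero ground-truth array would push the box metrics to 0, here is a minimal sketch. It uses a plain axis-aligned IoU on `xyxy` boxes rather than PaddleOCR's own metric code, and the function name `iou_xyxy` and the sample boxes are assumptions made for illustration only.

```python
def iou_xyxy(a, b):
    """Axis-aligned IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = (30.0, 30.0, 246.0, 93.0)      # hypothetical predicted cell box
zero_gt = (0.0, 0.0, 0.0, 0.0)        # ground truth left at its zero initial value
real_gt = (30.0, 30.0, 246.0, 93.0)   # ground truth actually filled in

iou_with_zero_gt = iou_xyxy(pred, zero_gt)
iou_with_real_gt = iou_xyxy(pred, real_gt)
print(iou_with_zero_gt, iou_with_real_gt)
```

If every ground-truth row stays at zero, no prediction can reach any IoU threshold, which would be consistent with precision, recall and hmean all being reported as 0.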

image Hello, the reproduction on my side gives normal results.

UserWangZz avatar May 11 '24 09:05 UserWangZz

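> OK, I'll try to reproduce it on my side.

(https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.7/ppocr/data/imaug) /label_ops.py, line 816: my understanding is that this should read the bbox inside the cell.

> image Hello, the reproduction on my side gives normal results.

image Hello, but in our case the boxes are not being trained at all. Here is my annotation: {"html": {"structure": {"tokens": ["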

", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]}, "cells": [{"tokens": [], "bbox": [[[30, 30], [246, 30], [246, 93], [30, 93]]]}, {"tokens": [], "bbox": [[[246, 30], [1096, 30], [1096, 93], [246, 93]]]}, {"tokens": [], "bbox": [[[30, 93], [246, 93], [246, 173], [30, 173]]]}, {"tokens": [], "bbox": [[[246, 93], [1096, 93], [1096, 173], [246, 173]]]}, {"tokens": [], "bbox": [[[30, 173], [246, 173], [246, 295], [30, 295]]]}, {"tokens": [], "bbox": [[[246, 173], [1096, 173], [1096, 295], [246, 295]]]}, {"tokens": [], "bbox": [[[30, 295], [246, 295], [246, 376], [30, 376]]]}, {"tokens": [], "bbox": [[[246, 295], [1096, 295], [1096, 376], [246, 376]]]}, {"tokens": [], "bbox": [[[30, 376], [246, 376], [246, 459], [30, 459]]]}, {"tokens": [], "bbox": [[[246, 376], [1096, 376], [1096, 459], [246, 459]]]}]}, "filename": "(已压缩)AHLY〔2022〕086号 龙源电力安徽来安三湾风电项目风电机组设备采购合同-工程建设部-李纳-2022.8.23(2).pdf180.png_table_1.png"}

plotnine1219 avatar May 11 '24 11:05 plotnine1219

> OK, I'll try to reproduce it on my side.

(https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.7/ppocr/data/imaug) /label_ops.py, line 816: my understanding is that this should read the bbox inside the cell.

> image Hello, the reproduction on my side gives normal results.

image Hello, but in our case the boxes are not being trained at all. Here is my annotation: {"html": {"structure": {"tokens": ["", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]}, "cells": [{"tokens": [], "bbox": [[[30, 30], [246, 30], [246, 93], [30, 93]]]}, {"tokens": [], "bbox": [[[246, 30], [1096, 30], [1096, 93], [246, 93]]]}, {"tokens": [], "bbox": [[[30, 93], [246, 93], [246, 173], [30, 173]]]}, {"tokens": [], "bbox": [[[246, 93], [1096, 93], [1096, 173], [246, 173]]]}, {"tokens": [], "bbox": [[[30, 173], [246, 173], [246, 295], [30, 295]]]}, {"tokens": [], "bbox": [[[246, 173], [1096, 173], [1096, 295], [246, 295]]]}, {"tokens": [], "bbox": [[[30, 295], [246, 295], [246, 376], [30, 376]]]}, {"tokens": [], "bbox": [[[246, 295], [1096, 295], [1096, 376], [246, 376]]]}, {"tokens": [], "bbox": [[[30, 376], [246, 376], [246, 459], [30, 459]]]}, {"tokens": [], "bbox": [[[246, 376], [1096, 376], [1096, 459], [246, 459]]]}]}, "filename": "(已压缩)AHLY〔2022〕086号 龙源电力安徽来安三湾风电项目风电机组设备采购合同-工程建设部-李纳-2022.8.23(2).pdf180.png_table_1.png"}

The tokens in structure may not be displayed correctly here. Uploading image.png…

plotnine1219 avatar May 11 '24 11:05 plotnine1219
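
A quick way to inspect an annotation line like the one above is to count the cells whose `tokens` list is empty and to confirm that each `bbox` flattens to 8 numbers (matching `loc_reg_num: 8`). The sketch below uses only the standard library; the helper names `flatten` and `summarize_cells`, the shortened two-cell sample and the `example.png` filename are assumptions for illustration, not part of PaddleOCR.

```python
import json

# Shortened, hypothetical sample in the same layout as the annotation above.
line = (
    '{"html": {"structure": {"tokens": []}, "cells": ['
    '{"tokens": [], "bbox": [[[30, 30], [246, 30], [246, 93], [30, 93]]]}, '
    '{"tokens": [], "bbox": [[[246, 30], [1096, 30], [1096, 93], [246, 93]]]}'
    ']}, "filename": "example.png"}'
)


def flatten(value):
    """Flatten arbitrarily nested lists of numbers into one flat list."""
    if isinstance(value, (int, float)):
        return [value]
    out = []
    for item in value:
        out.extend(flatten(item))
    return out


def summarize_cells(annotation):
    """Report how many cells have no text tokens and how long each flat bbox is."""
    cells = annotation["html"]["cells"]
    empty = sum(1 for c in cells if len(c.get("tokens", [])) == 0)
    lengths = [len(flatten(c["bbox"])) for c in cells if "bbox" in c]
    return {"cells": len(cells), "empty_token_cells": empty, "bbox_lengths": lengths}


summary = summarize_cells(json.loads(line))
print(summary)
```

In a dataset of empty tables, `empty_token_cells` would equal `cells` for every line, which is worth knowing before looking at how the label encoder treats such cells.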

box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy' I suspect the problem is here. Try box_format: 'xyxyxyxy' instead.

UserWangZz avatar May 11 '24 11:05 UserWangZz

> box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'

That doesn't fix it either...

plotnine1219 avatar May 11 '24 11:05 plotnine1219

box_format: &box_format 'xyxyxyxy' # 'xywh', 'xyxy', 'xyxyxyxy' Could it be related to the missing quotes?

UserWangZz avatar May 11 '24 11:05 UserWangZz

> box_format: &box_format 'xyxyxyxy' # 'xywh', 'xyxy', 'xyxyxyxy' Could it be related to the missing quotes?

Uploading image.png… The format is read correctly here in ops.

plotnine1219 avatar May 13 '24 02:05 plotnine1219

That's fine❤️❤️❤️

UserWangZz avatar May 13 '24 02:05 UserWangZz

> That's fine❤️❤️❤️

No, no, it's not solved yet. What I meant just now is that the format is read correctly even without the quotes.

plotnine1219 avatar May 13 '24 02:05 plotnine1219
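
The observation that the quotes make no difference matches how YAML treats plain and single-quoted scalars behind an anchor. The following sketch assumes the PyYAML package (`yaml`) is installed and uses a cut-down, hypothetical two-key config rather than the full file from this thread.

```python
import yaml  # PyYAML; assumed to be installed

unquoted = """
Global:
  box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
Metric:
  box_format: *box_format
"""

# Same document, but with the anchored value wrapped in single quotes.
quoted = unquoted.replace("&box_format xyxyxyxy", "&box_format 'xyxyxyxy'")

cfg_unquoted = yaml.safe_load(unquoted)
cfg_quoted = yaml.safe_load(quoted)

print(cfg_unquoted["Metric"]["box_format"])
print(cfg_unquoted == cfg_quoted)
```

Both variants load to the same string through the alias, so the quoting of `box_format` is unlikely to explain the zero box metrics.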

> That's fine❤️❤️❤️

I think I've found the cause. Every table in my dataset is an empty table with no text. At line 730 of data/imaug/label_ops.py:

    if 'bbox' in cells[bbox_idx] and len(cells[bbox_idx]['tokens']) == 0:
        bbox = cells[bbox_idx]['bbox'].copy()
        bbox = np.array(bbox, dtype=np.float32).reshape(-1)
        bboxes[i] = bbox
        bbox_masks[i] = 1.0

execution never enters this if branch, so the dataloader never reads the bbox positions.

plotnine1219 avatar May 13 '24 03:05 plotnine1219
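
The effect described above, a token-dependent guard that leaves the ground-truth boxes at zero for empty cells, can be sketched in isolation. This is a simplified pure-Python stand-in and not PaddleOCR's `TableLabelEncode`; the function `encode_boxes`, its `require_tokens` flag and the flat 4-point box layout are hypothetical, and the thread does not show the exact upstream condition.

```python
def encode_boxes(structure_is_td, cells, loc_reg_num=8, require_tokens=True):
    """Fill one box row per structure token, mimicking a guarded copy.

    structure_is_td: list of bools, True where the token is a cell token.
    cells: list of dicts with "tokens" and "bbox" (four [x, y] points here).
    require_tokens: if True, copy a box only when the cell has text tokens.
    """
    bboxes = [[0.0] * loc_reg_num for _ in structure_is_td]
    bbox_masks = [0.0] * len(structure_is_td)
    bbox_idx = 0
    for i, is_td in enumerate(structure_is_td):
        if not is_td:
            continue
        cell = cells[bbox_idx]
        has_tokens = len(cell["tokens"]) > 0
        if "bbox" in cell and (has_tokens or not require_tokens):
            bboxes[i] = [float(v) for point in cell["bbox"] for v in point]
            bbox_masks[i] = 1.0
        bbox_idx += 1
    return bboxes, bbox_masks


# One empty cell (no text tokens), as in a dataset of blank tables.
cells = [{"tokens": [], "bbox": [[30, 30], [246, 30], [246, 93], [30, 93]]}]
structure_is_td = [False, True, False]

strict_boxes, strict_masks = encode_boxes(structure_is_td, cells, require_tokens=True)
lenient_boxes, lenient_masks = encode_boxes(structure_is_td, cells, require_tokens=False)
print(strict_boxes[1], strict_masks[1])
print(lenient_boxes[1], lenient_masks[1])
```

With the guard requiring tokens, the box row for the empty cell stays all zeros; without it, the annotated coordinates are copied. Whether relaxing the guard is the right fix for a given PaddleOCR version should be checked against that version's `label_ops.py`.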

OK.

UserWangZz avatar May 13 '24 06:05 UserWangZz