KIE VAT invoice information extraction: when training the RE task model, precision and recall stay at 0 until training ends

Open sybest1259 opened this issue 3 years ago • 1 comments

  • System Environment: ubuntu18.4, python3.7

  • Version: Paddle: PaddleOCR: Related components: paddlepaddle=2.4.0rc0, paddlenlp=2.4.1

  • Command Code: python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml

  • Training data: [{"transcription": "广东增值税专用发票", "label": "question", "points": [[1648, 479], [2747, 469], [2749, 638], [1649, 648]], "id": 0, "linking": [[0, 24]]}, {"transcription": "2016年06月12日", "label": "answer", "points": [[3485, 789], [3954, 789], [3954, 851], [3485, 851]], "id": 1, "linking": [[1, 35]]}, {"transcription": "深圳市购机汇网络有限公司", "label": "answer", "points": [[1026, 962], [1766, 962], [1766, 1024], [1026, 1024]], "id": 2, "linking": [[2, 25]]}, {"transcription": "纳税人识别号:", "label": "question", "points": [[526, 1037], [947, 1037], [947, 1112], [526, 1112]], "id": 3, "linking": [[3, 4]]},...... (attached file: train的副本.txt)

  • Config file: re_vi_layoutxlm_xfund_zh的副本.txt (attached)

  • Complete Error Message: [2022/10/28 02:22:03] ppocr INFO: best metric, hmean: 0, precision: 0, recall: 0, fps: 11.429711280367338, best_epoch: 2000

sybest1259, Oct 28 '22 02:10
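The training-data excerpt in the issue is truncated, so it is not possible to tell from the thread whether ids such as 24, 25 and 35 exist in the full file. A minimal sketch of a sanity check for the `linking` fields follows. It assumes XFUND-style labels in which every pair should join one "question" entity and one "answer" entity. The function name `check_linking` and the sample records are hypothetical, and the thread does not establish that a labeling problem caused the zero metrics.

```python
def check_linking(entities):
    """Return a list of human-readable problems found in the `linking` fields.

    Assumption (not confirmed by the thread): each pair should reference two
    existing entity ids, one labeled "question" and one labeled "answer".
    """
    labels = {e["id"]: e["label"] for e in entities}
    problems = []
    for e in entities:
        for pair in e.get("linking", []):
            if len(pair) != 2:
                problems.append(f"entity {e['id']}: malformed pair {pair}")
                continue
            missing = [i for i in pair if i not in labels]
            if missing:
                problems.append(
                    f"entity {e['id']}: pair {pair} references unknown id(s) {missing}"
                )
                continue
            if sorted(labels[i] for i in pair) != ["answer", "question"]:
                problems.append(
                    f"entity {e['id']}: pair {pair} does not join a question and an answer"
                )
    return problems


# Hypothetical records for illustration only (not taken from the attached file).
sample = [
    {"transcription": "key text", "label": "question", "id": 3, "linking": [[3, 4]]},
    {"transcription": "value text", "label": "answer", "id": 4, "linking": [[3, 4]]},
    {"transcription": "orphan value", "label": "answer", "id": 1, "linking": [[1, 35]]},
]

for problem in check_linking(sample):
    print(problem)
```

Running it over each annotated image's entity list before training would flag pairs that point at ids missing from the file.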

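I just updated to the latest PaddleOCR release-2.6 code, and training accuracy now rises normally. However, prediction with both the dynamic-graph model and the inference model fails with: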

Traceback (most recent call last):
  File "./tools/infer_kie_token_ser_re.py", line 217, in <module>
    result = ser_re_engine(data)
  File "./tools/infer_kie_token_ser_re.py", line 151, in __call__
    preds = self.model(re_input)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/sxct-ocr/api/PaddleOCR-release-2.6-new/ppocr/modeling/architectures/base_model.py", line 86, in forward
    x = self.backbone(x)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/sxct-ocr/api/PaddleOCR-release-2.6-new/ppocr/modeling/backbones/vqa_layoutlm.py", line 237, in forward
    relations=relations)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1559, in forward
    relations)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1425, in forward
    relations, entities = self.build_relation(relations, entities)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1346, in build_relation
    entities[b] = entitie_new
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 797, in __setitem__
    return self.__setitem_eager_tensor__(item, value)
ValueError: (InvalidArgument) The shape of tensor assigned value must match the shape of target shape: [512, 3], but now shape is [513, 3]. (at /paddle/paddle/phi/kernels/impl/set_value_kernel_impl.h:68)
  [operator < set_value > error]

sybest1259, Oct 29 '22 03:10

Which version of paddlenlp did you update to? 2.4.1 or later is required.

WenmuZhou, Nov 04 '22 05:11
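The version requirement mentioned in the thread (paddlenlp 2.4.1 or later) can be checked before training or inference. The following is a rough sketch using only the standard library. The helper names `release_tuple`, `meets_minimum` and `installed_version` are hypothetical, and `importlib.metadata` requires Python 3.8+, so on the Python 3.7 setup reported in the issue the `importlib_metadata` backport would be needed instead. Pre-release suffixes such as `rc0` are ignored by this simplified comparison.

```python
import re
from importlib import metadata  # Python 3.8+; use the importlib_metadata backport on 3.7


def release_tuple(version):
    """Extract the leading numeric release part, e.g. '2.4.0rc0' -> (2, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare numeric release parts only (a simplification: 'rc' suffixes are ignored)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # GPU builds of Paddle are distributed as "paddlepaddle-gpu".
    for name in ("paddlepaddle", "paddlepaddle-gpu", "paddlenlp"):
        print(f"{name}: {installed_version(name)}")
    nlp = installed_version("paddlenlp")
    if nlp is not None and not meets_minimum(nlp, "2.4.1"):
        print("paddlenlp is older than 2.4.1, the minimum named in this thread")
```

Printing the versions this way also gives a compact environment report to paste into an issue.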

After switching to the latest versions of the libraries and code, accuracy went up.

sybest1259, Nov 04 '22 10:11

Quoting sybest1259: "After switching to the latest versions of the libraries and code, accuracy went up."

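Hello, I ran into the same problem, although training worked fine for me before. My environment has since gotten mixed up and I can't tell which package is at fault. Could you describe your environment configuration in detail?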

shallowime, Nov 07 '22 09:11

I'm hitting the same ValueError: (InvalidArgument) The shape of tensor assigned value must match the shape of target shape: [512, 3], but now shape is [513, 3]. (at /paddle/paddle/phi/kernels/impl/set_value_kernel_impl.h:68) How can it be resolved?

shelleyyyyu, Nov 12 '22 10:11

Hello, have you run into the problem of KIE extracting incomplete information?

leoterry-ulrica, Nov 19 '22 13:11

May I ask how the "linking": [[1, 35]] entries in the training data were annotated? Please share if you can.

xiaocode337317439, Nov 28 '22 08:11

Quoting shelleyyyyu: "I'm hitting the same ValueError: (InvalidArgument) The shape of tensor assigned value must match the shape of target shape: [512, 3], but now shape is [513, 3]. (at /paddle/paddle/phi/kernels/impl/set_value_kernel_impl.h:68) How can it be resolved?"

What is the cause? Has it been solved? Is the data conversion in paddlenlp at fault?

KingBoomBoom, Dec 21 '22 06:12

What causes this set_value shape-mismatch error ([512, 3] vs. [513, 3])?

KingBoomBoom, Dec 21 '22 06:12

Quoting leoterry-ulrica: "Hello, have you run into the problem of KIE extracting incomplete information?"

I ran into this. The cause is that RE and SER only predict on the first 512-token sequence chunk; everything after it is discarded.

liuyan20062010, Feb 10 '23 01:02
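The truncation described in the last comment can be pictured with a small sketch: instead of keeping only the first 512 tokens, the sequence is split into consecutive windows and each window is processed. This is an illustration only, not PaddleOCR's implementation. `split_into_windows`, `predict_all` and the `predict_window` callback are hypothetical names, and the sketch does not recover relations whose two entities fall into different windows.

```python
def split_into_windows(token_ids, window=512):
    """Split a token sequence into consecutive chunks of at most `window` items."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]


def predict_all(token_ids, predict_window, window=512):
    """Run `predict_window` (a hypothetical per-chunk model call) on every chunk.

    Results are concatenated in order, so no chunk after the first is dropped.
    """
    results = []
    for chunk in split_into_windows(token_ids, window):
        results.extend(predict_window(chunk))
    return results


# Illustration with a stand-in "model" that just echoes its input.
tokens = list(range(1100))
windows = split_into_windows(tokens)
print([len(w) for w in windows])
print(len(predict_all(tokens, lambda chunk: chunk)))
```

For RE in particular, a question and its answer may land in different windows, so window-wise prediction alone would still miss such pairs.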