PaddleOCR

Joint training of SER and RE

Open githublsk opened this issue 2 years ago • 2 comments

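Hello developers, we have recently been using LayoutXLM for information extraction, and the results are quite good. The problem is that the SER and RE models have to be trained separately, and both are fairly heavy, so we would like to merge them into one. From what we have found, the mainstream approach is to run NER (named entity recognition) first and then RE (relation extraction), with entities and relations sharing the same network encoder: for example, attaching an NER network and then an RE network on top of the pretrained_model. Have you tried this kind of pipeline? How is the accuracy, and what are its advantages and disadvantages? We would appreciate a reply. Thank you!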

githublsk, Oct 10 '22 07:10
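
As a rough illustration of the pipeline described in the question (one shared encoder, an entity head, then a relation head), the framework-free sketch below shows only the data flow between the two stages. Every name in it (`Entity`, `build_candidate_pairs`, `joint_loss`, the `QUESTION`/`ANSWER`/`OTHER` labels, and the `re_weight` parameter) is hypothetical and not part of PaddleOCR or PaddleNLP. Whether joint training matches the accuracy of two separate models is not established in this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    start: int   # index of the first token of the entity
    end: int     # index one past the last token
    label: str   # hypothetical label set: "QUESTION", "ANSWER", "OTHER"


def build_candidate_pairs(entities: List[Entity]) -> List[Tuple[int, int]]:
    """Pair every QUESTION entity with every ANSWER entity.

    In a joint model, the relation head would score each pair using
    the same encoder features that produced the entities.
    """
    questions = [i for i, e in enumerate(entities) if e.label == "QUESTION"]
    answers = [i for i, e in enumerate(entities) if e.label == "ANSWER"]
    return [(q, a) for q in questions for a in answers]


def joint_loss(ser_loss: float, re_loss: float, re_weight: float = 1.0) -> float:
    """Combine both task losses so one backward pass updates the shared encoder."""
    return ser_loss + re_weight * re_loss


# Toy entities standing in for the output of the entity (SER/NER) head.
entities = [
    Entity(0, 2, "QUESTION"),
    Entity(2, 5, "ANSWER"),
    Entity(5, 6, "OTHER"),
    Entity(6, 8, "ANSWER"),
]
pairs = build_candidate_pairs(entities)
total = joint_loss(ser_loss=0.5, re_loss=0.25, re_weight=2.0)
```

The weighted sum is only one possible way to combine the two objectives; the weight would need tuning, and errors from the entity stage propagate into the candidate pairs.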

Hello, this has probably not been tried yet. However, you can do key information extraction with a single SER model alone; see this document: https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/ppstructure/kie/how_to_do_kie.md

littletomatodonkey, Oct 10 '22 08:10
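
One way to read this suggestion: if the SER label set is defined as the fields to be extracted, the SER predictions alone already yield the key information, and no RE model is needed. The sketch below assumes the predictions are available as `(text, label)` pairs; that input format, the label names, and the `collect_fields` helper are illustrative assumptions, not actual PaddleOCR output or API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def collect_fields(
    predictions: List[Tuple[str, str]], ignore_label: str = "OTHER"
) -> Dict[str, List[str]]:
    """Group recognized text by its SER label, skipping the background label."""
    fields = defaultdict(list)
    for text, label in predictions:
        if label != ignore_label:
            fields[label].append(text)
    return dict(fields)


# Hypothetical SER output: each text line with its predicted category.
ser_output = [
    ("Name", "OTHER"),
    ("Zhang San", "NAME"),
    ("ID No.", "OTHER"),
    ("1234", "ID_NUMBER"),
]
fields = collect_fields(ser_output)
```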

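Hello, could you tell me which versions of paddlepaddle and paddlenlp you are using? I get an error when training the RE task, and it looks like a version problem. Error: 'paddle.fluid.core_avx.ops' has no attribute 'c_broadcast'. My versions: paddle-gpu 2.3.2, paddlenlp 2.3.0. Thank you very much.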

sybest1259, Oct 26 '22 09:10

Try updating to paddlenlp 2.4.

andyjiang1116, Nov 30 '22 07:11
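
Before upgrading, it can help to confirm which versions are actually installed in the active environment. The following is a minimal sketch using only the standard library; it assumes Python 3.8 or newer (for `importlib.metadata`), and the helper names `parse_version` and `installed_version` are made up for this example.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.4.0' into (2, 4, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages mentioned in this thread.
for name in ("paddlepaddle-gpu", "paddlepaddle", "paddlenlp"):
    print(name, installed_version(name))

nlp = installed_version("paddlenlp")
if nlp is not None and parse_version(nlp) < (2, 4):
    print("paddlenlp is older than 2.4; consider upgrading as suggested above")
```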

Environment:
paddle-bfloat 0.1.7
paddle2onnx 1.0.9
paddlefsl 1.1.0
paddlenlp 2.4.0
paddlepaddle-gpu 2.3.2

Error:
Traceback (most recent call last):
  File "tools/train.py", line 226, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 201, in main
    amp_dtype)
  File "/limk/PaddleOCR/tools/program.py", line 303, in train
    preds = model(batch)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/limk/PaddleOCR/ppocr/modeling/architectures/distillation_model.py", line 61, in forward
    result_dict[model_name] = self.model_list[idx](x, data)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/limk/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 86, in forward
    x = self.backbone(x)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/limk/PaddleOCR/ppocr/modeling/backbones/vqa_layoutlm.py", line 237, in forward
    relations=relations)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1493, in forward
    relations)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1368, in forward
    relations, entities = self.build_relation(relations, entities)
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1318, in build_relation
    if len(entities[b]["start"]) <= 2:
  File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 740, in __getitem__
    return self._getitem_index_not_tensor(item)
ValueError: (InvalidArgument) Currently, Tensor.__indices__() only allows indexing by Integers, Slices, Ellipsis, None, tuples of these types and list of Bool and Integers, but received str in 1th slice item (at /paddle/paddle/fluid/pybind/slice_utils.h:279)

MIKL2077, Sep 26 '23 03:09
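
The last frames of this traceback show `entities[b]["start"]` being evaluated on a Paddle Tensor, which suggests that `entities` reached `build_relation` as a tensor rather than as per-sample mappings with a `"start"` key. The thread does not establish the cause (a mismatch between the PaddleOCR and paddlenlp versions is one possibility). As a debugging aid only, the hypothetical helper below checks a batch for that shape problem before it is passed to the model; `find_bad_entities` is not a PaddleOCR or PaddleNLP function, and only the `"start"` key is taken from the traceback.

```python
from typing import Any, List


def find_bad_entities(entities: Any) -> List[str]:
    """Describe every batch item that cannot be indexed as item["start"]."""
    if not isinstance(entities, (list, tuple)):
        return [f"entities is {type(entities).__name__}, expected a list of dicts"]
    problems = []
    for b, item in enumerate(entities):
        if not isinstance(item, dict):
            problems.append(f"entities[{b}] is {type(item).__name__}, expected dict")
        elif "start" not in item:
            problems.append(f"entities[{b}] has no 'start' key")
    return problems


# A well-formed batch and a malformed one (plain lists stand in for tensors).
good_batch = [{"start": [0, 3]}]
bad_batch = [[0, 3]]
good_report = find_bad_entities(good_batch)
bad_report = find_bad_entities(bad_batch)
```

Calling such a check on the batch just before `model(batch)` in the training loop would show whether the data pipeline or the model code is producing the unexpected structure.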

CUDA Version: 11.7

MIKL2077, Sep 26 '23 03:09