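Hello developers,
We have recently been using LayoutXLM for information extraction, and the results are quite good. The problem is that SER and RE have to be trained as two separate models, and both are fairly heavy, so we would like to merge them into one. From what we have read, the mainstream approach is to run NER (named entity recognition) first and then RE (relation extraction), with entities and relations sharing the same network encoder: for example, an NER network on top of the pretrained model, followed by an RE network. Have you tried this kind of pipeline? How accurate is it, and what are its advantages and drawbacks? We would appreciate a reply. Thank you!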
Hello, as far as I know this has not been tried yet. However, you can do key information extraction with just one SER model; see this document: https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/ppstructure/kie/how_to_do_kie.md
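For readers following the original question, the shared-encoder arrangement it describes (one encoder, an entity head, then a relation head) can be sketched as below. This is a framework-free toy, not PaddleOCR's or PaddleNLP's implementation: `JointSerRe`, `toy_encoder`, `toy_ser_head`, `toy_re_head`, the label names and the sample tokens are all made up for illustration, and the "heads" are hand-written rules standing in for trained layers.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


class JointSerRe:
    """Toy joint model: one shared encoder feeding an entity head and a relation head."""

    def __init__(
        self,
        encoder: Callable[[Sequence[str]], List[Vector]],
        ser_head: Callable[[List[Vector]], List[str]],
        re_head: Callable[[Vector, Vector], float],
    ) -> None:
        self.encoder = encoder
        self.ser_head = ser_head
        self.re_head = re_head

    def forward(self, tokens: Sequence[str]) -> Tuple[List[str], List[Tuple[int, int]]]:
        hidden = self.encoder(tokens)    # encoded once, reused by both heads
        labels = self.ser_head(hidden)   # step 1: entity labels
        questions = [i for i, lab in enumerate(labels) if lab == "QUESTION"]
        answers = [i for i, lab in enumerate(labels) if lab == "ANSWER"]
        # Step 2: score every (question, answer) pair on the same hidden states.
        relations = [
            (q, a)
            for q in questions
            for a in answers
            if self.re_head(hidden[q], hidden[a]) > 0.5
        ]
        return labels, relations


# Hypothetical stand-ins for trained components.
def toy_encoder(tokens: Sequence[str]) -> List[Vector]:
    # Feature 0: token ends with a colon; feature 1: token position.
    return [[1.0 if tok.endswith(":") else 0.0, float(i)] for i, tok in enumerate(tokens)]


def toy_ser_head(hidden: List[Vector]) -> List[str]:
    return ["QUESTION" if vec[0] == 1.0 else "ANSWER" for vec in hidden]


def toy_re_head(question_vec: Vector, answer_vec: Vector) -> float:
    # Link a question to the token immediately after it.
    return 1.0 if answer_vec[1] - question_vec[1] == 1.0 else 0.0


model = JointSerRe(toy_encoder, toy_ser_head, toy_re_head)
labels, relations = model.forward(["Name:", "Alice", "Date:", "2023-01-01"])
print(labels)
print(relations)
```

In this arrangement the relation head only sees pairs built from the entity head's predictions, so a mislabeled entity removes or adds candidate pairs downstream. That error propagation is the usual cost of the pipeline design, weighed against encoding the input only once.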
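Hello, may I ask which versions of paddlepaddle and paddlenlp you are using? I get an error when training the RE task, and I suspect it is a version issue.
Error: 'paddle.fluid.core_avx.ops' has no attribute 'c_broadcast'
My versions: paddle-gpu 2.3.2, paddlenlp 2.3.0
Thank you very much.
Environment: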
paddle-bfloat 0.1.7
paddle2onnx 1.0.9
paddlefsl 1.1.0
paddlenlp 2.4.0
paddlepaddle-gpu 2.3.2
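Since the discussion turns on which package versions are installed, a small script can report them from inside the same interpreter that runs training. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or later (on the Python 3.7 shown in the traceback, `pip list` gives the same information).

```python
from importlib import metadata  # standard library from Python 3.8 onward
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the environment listing above.
    wanted = ["paddlepaddle-gpu", "paddlenlp", "paddle2onnx"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output next to a traceback makes it easier to tell whether two reports come from the same combination of packages.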
Error:
Traceback (most recent call last):
File "tools/train.py", line 226, in <module>
main(config, device, logger, vdl_writer)
File "tools/train.py", line 201, in main
amp_dtype)
File "/limk/PaddleOCR/tools/program.py", line 303, in train
preds = model(batch)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/limk/PaddleOCR/ppocr/modeling/architectures/distillation_model.py", line 61, in forward
result_dict[model_name] = self.model_list[idx](x, data)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/limk/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 86, in forward
x = self.backbone(x)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/limk/PaddleOCR/ppocr/modeling/backbones/vqa_layoutlm.py", line 237, in forward
relations=relations)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1493, in forward
relations)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1368, in forward
relations, entities = self.build_relation(relations, entities)
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1318, in build_relation
if len(entities[b]["start"]) <= 2:
File "/usr/local/python3.7.0/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 740, in __getitem__
return self._getitem_index_not_tensor(item)
ValueError: (InvalidArgument) Currently, Tensor.indices() only allows indexing by Integers, Slices, Ellipsis, None, tuples of these types and list of Bool and Integers, but received str in 1th slice item (at /paddle/paddle/fluid/pybind/slice_utils.h:279)
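The last frames show `entities[b]["start"]` being evaluated on a Paddle Tensor, which cannot be indexed with a string key. The mismatch can be illustrated without Paddle: below, a plain nested list stands in for the tensor-style layout and a dict for the layout that `build_relation` appears to expect at that line. The sample values and the helper name `count_entities` are invented for illustration, and the real batch format in either library may differ.

```python
from typing import Any


def count_entities(sample_entities: Any) -> int:
    """Return the number of entities if the sample uses the dict-style layout."""
    if isinstance(sample_entities, dict) and "start" in sample_entities:
        return len(sample_entities["start"])
    raise TypeError(
        "entities are not dict-style; string keys such as 'start' cannot be used "
        f"on {type(sample_entities).__name__}"
    )


# Hypothetical data: two entities given as start/end/label.
dict_style = {"start": [0, 5], "end": [4, 9], "label": [1, 2]}
array_style = [[0, 4, 1], [5, 9, 2]]  # plain list standing in for a tensor

print(count_entities(dict_style))

try:
    count_entities(array_style)
except TypeError as err:
    print("layout mismatch:", err)
```

If the data pipeline and the model code come from releases that disagree on this layout, the failure would surface exactly at the first string-keyed access, which is consistent with the suspicion of a version mismatch raised earlier in the thread.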