PaddleOCR
Question about training the SER model in KIE: how can I train without loading a pretrained model, and how was the pretrained model itself trained?
In `./ppocr/modeling/backbones/vqa_layoutlm.py` there is this code:

```python
class NLPBaseModel(nn.Layer):
    def __init__(self,
                 base_model_class,
                 model_class,
                 mode="base",
                 type="ser",
                 pretrained=True,
                 checkpoints=None,
                 **kwargs):
        super(NLPBaseModel, self).__init__()
        if checkpoints is not None:  # load the trained model
            self.model = model_class.from_pretrained(checkpoints)
        else:  # load the pretrained-model
            pretrained_model_name = pretrained_model_dict[base_model_class][mode]
            if pretrained is True:
                base_model = base_model_class.from_pretrained(pretrained_model_name)
            else:
                base_model = base_model_class.from_pretrained(pretrained)
```

This code implies that either a pretrained model or a previously trained checkpoint must always be loaded. How can I train directly without loading a pretrained model? And how was the pretrained model itself trained?
Hello,
- For a large-scale multimodal model like this, we recommend always initializing from the pretrained weights; otherwise training will not converge.
- The pretrained model was trained with self-supervised objectives; see the LayoutXLM paper for details.
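If you still want to experiment with training from scratch despite the convergence warning, one option is to add a random-initialization branch to the loading logic in `NLPBaseModel.__init__`. The sketch below mocks the model class with plain Python so only the branching is shown; `DummyModel`, `from_config`, and `build_base_model` are hypothetical names, not part of PaddleOCR or PaddleNLP (the real random-init call depends on your PaddleNLP version, typically constructing the model from a config object instead of `from_pretrained`).

```python
# Sketch of NLPBaseModel's weight-loading branches, plus a "from scratch" path.
# DummyModel stands in for a PaddleNLP model class (assumption, for illustration).

class DummyModel:
    def __init__(self, source):
        self.source = source  # records where the weights came from

    @classmethod
    def from_pretrained(cls, name_or_path):
        return cls(source=f"pretrained:{name_or_path}")

    @classmethod
    def from_config(cls):
        return cls(source="random-init")  # no weights loaded at all


def build_base_model(base_model_class, pretrained=True, checkpoints=None,
                     pretrained_model_name="layoutxlm-base-uncased"):
    """Mirror NLPBaseModel's branching, extended with a scratch-training path."""
    if checkpoints is not None:     # resume from a trained checkpoint
        return base_model_class.from_pretrained(checkpoints)
    if pretrained is True:          # official pretrained weights
        return base_model_class.from_pretrained(pretrained_model_name)
    if pretrained is False:         # NEW branch: random init, train from scratch
        return base_model_class.from_config()
    # otherwise `pretrained` is a local path to weights
    return base_model_class.from_pretrained(pretrained)


print(build_base_model(DummyModel, pretrained=False).source)  # random-init
```

With this change, passing `pretrained=False` no longer tries to call `from_pretrained(False)` (which would fail in the original code) and instead builds a randomly initialized model.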
In `./2.6/ppocr/postprocess/vqa_token_ser_layoutlm_postprocess.py`:

```python
def _infer(self, preds, segment_offset_ids, ocr_infos):
    results = []

    for pred, segment_offset_id, ocr_info in zip(preds, segment_offset_ids,
                                                 ocr_infos):
        pred = np.argmax(pred, axis=1)
        pred = [self.id2label_map[idx] for idx in pred]

        for idx in range(len(segment_offset_id)):
            if idx == 0:
                start_id = 0
            else:
                start_id = segment_offset_id[idx - 1]

            end_id = segment_offset_id[idx]

            curr_pred = pred[start_id:end_id]
            curr_pred = [self.label2id_map_for_draw[p] for p in curr_pred]

            if len(curr_pred) <= 0:
                pred_id = 0
            else:
                counts = np.bincount(curr_pred)
                pred_id = np.argmax(counts)
```
In the 2.6 version of LayoutXLM, all inference and prediction truncates the token predictions at 512 tokens, so `pred` has length 512, while `segment_offset_id` covers the full segmented text. For any segment whose offsets fall beyond 512, the slice `pred[start_id:end_id]` is empty, so `pred_id` is always predicted as class 0.
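The reported effect can be reproduced with a small NumPy sketch (the predictions and offsets below are made-up illustration values): once a segment's offsets pass the 512-token truncation point, the slice into `pred` is empty and the fallback assigns class 0.

```python
import numpy as np

MAX_SEQ_LEN = 512  # token predictions are truncated to this length

# Fake per-token class predictions: pretend every token was labeled class 2.
pred = np.full(MAX_SEQ_LEN, 2)

# Segment boundaries from the OCR text splitter; the last segment
# ends beyond the truncation point.
segment_offset_id = [100, 300, 512, 600]

pred_ids = []
for idx in range(len(segment_offset_id)):
    start_id = 0 if idx == 0 else segment_offset_id[idx - 1]
    end_id = segment_offset_id[idx]
    curr_pred = pred[start_id:end_id]  # empty once start_id >= 512
    if len(curr_pred) <= 0:
        pred_id = 0                    # fallback: class forced to 0
    else:
        pred_id = int(np.argmax(np.bincount(curr_pred)))
    pred_ids.append(pred_id)

print(pred_ids)  # [2, 2, 2, 0] -- the truncated segment always gets class 0
```

The first three segments fall inside the 512-token window and keep their majority-vote label, while the segment spanning tokens 512 to 600 is silently mapped to class 0 regardless of the model's actual output.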
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.