
In version 2.6, the SER model's max_seq_len is 512; for tokens beyond that length, all predictions come out as 0

Open · aaferrero opened this issue 2 years ago · 4 comments

In ./2.6/ppocr/postprocess/vqa_token_ser_layoutlm_postprocess.py:

    def _infer(self, preds, segment_offset_ids, ocr_infos):
        results = []

        for pred, segment_offset_id, ocr_info in zip(preds, segment_offset_ids,
                                                     ocr_infos):
            # pred covers at most max_seq_len (512) tokens after truncation
            pred = np.argmax(pred, axis=1)
            pred = [self.id2label_map[idx] for idx in pred]

            for idx in range(len(segment_offset_id)):
                if idx == 0:
                    start_id = 0
                else:
                    start_id = segment_offset_id[idx - 1]

                end_id = segment_offset_id[idx]

                # this slice is empty when start_id >= len(pred),
                # i.e. for segments past the 512-token cutoff
                curr_pred = pred[start_id:end_id]
                curr_pred = [self.label2id_map_for_draw[p] for p in curr_pred]

                if len(curr_pred) <= 0:
                    pred_id = 0
                else:
                    counts = np.bincount(curr_pred)
                    pred_id = np.argmax(counts)

In version 2.6 with LayoutXLM, during all inference, any predicted tokens beyond 512 are truncated, so pred has length 512, while segment_offset_id covers the full length of the segmented text. As a result, for all segments past position 512, the slice pred[start_id:end_id] is empty and pred_id always falls back to class 0. This looks like a bug.
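The failure mode can be reproduced with a minimal sketch (toy values, not the actual model output): pred holds 512 per-token labels after truncation, while the segment offsets extend past 512, so later segments slice into nothing and default to label 0.

```python
import numpy as np

# Hypothetical setup: the model truncates to max_seq_len=512, so `pred`
# has 512 entries, while segment_offset_id covers the untruncated text.
max_seq_len = 512
pred = [1] * max_seq_len                 # every real token predicted as label 1
segment_offset_id = [100, 400, 600, 750]  # offsets past 512 exist

pred_ids = []
for idx in range(len(segment_offset_id)):
    start_id = 0 if idx == 0 else segment_offset_id[idx - 1]
    end_id = segment_offset_id[idx]
    curr_pred = pred[start_id:end_id]    # empty once start_id >= 512
    if len(curr_pred) <= 0:
        pred_id = 0                      # silent fallback to label 0
    else:
        pred_id = int(np.argmax(np.bincount(curr_pred)))
    pred_ids.append(pred_id)

print(pred_ids)  # segments past token 512 all come out as 0
```

Note that Python list slicing clamps out-of-range bounds instead of raising, which is why the truncation goes unnoticed rather than producing an error.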

Originally posted by @aaferrero in https://github.com/PaddlePaddle/PaddleOCR/issues/7974#issuecomment-1283504791

aaferrero · Oct 20 '22

The token length is currently limited to 512; you could try changing it to 1024.

WenmuZhou · Oct 22 '22

Setting it to 1024 raises an error, because training currently requires loading the pretrained model, so the token length must stay at 512. Version 2.4 supported prediction with variable token lengths, so in 2.6 this should be considered a bug. I modified 2.6 following the 2.4 approach, and it now supports prediction with tokens of varying lengths.
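A plausible reason simply raising the limit to 1024 fails is that the pretrained checkpoint's position-embedding table only has 512 rows, so position ids at or beyond 512 have no weights to look up. A minimal sketch of that constraint (toy sizes, not the actual LayoutXLM code):

```python
import numpy as np

# Toy stand-in for a pretrained position-embedding table:
# 512 rows are fixed by the checkpoint, hidden_size is illustrative.
max_position_embeddings = 512
hidden_size = 4
pos_table = np.random.rand(max_position_embeddings, hidden_size)

def embed_positions(seq_len):
    position_ids = np.arange(seq_len)
    return pos_table[position_ids]  # IndexError when seq_len > 512

ok_shape = embed_positions(512).shape   # works: (512, 4)
try:
    embed_positions(1024)
    failed = False
except IndexError:
    failed = True                       # 1024 exceeds the pretrained table
```

This is why the workaround discussed below splits long inputs into 512-token chunks instead of enlarging the sequence length.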

aaferrero · Oct 28 '22

Could you open a PR with your changes so we can take a look?

WenmuZhou · Nov 04 '22

https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6/ppocr/data/imaug/vqa The rough idea is to change the data packing in here: for example, if the token length is 750, pad it to 512+512 = 1024, reshape it to (2, 512), and feed the two chunks into the network as a batch of two for prediction.
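The pad-and-chunk step described above can be sketched as follows (my own illustration of the idea, not the actual PaddleOCR data pipeline):

```python
import numpy as np

# Hypothetical 750-token example: pad up to the next multiple of 512,
# then reshape into 512-token chunks that go through the network as a batch.
max_seq_len = 512
input_ids = np.arange(750)  # stand-in token ids

n_chunks = int(np.ceil(len(input_ids) / max_seq_len))  # 2 chunks for 750
padded = np.zeros(n_chunks * max_seq_len, dtype=input_ids.dtype)
padded[: len(input_ids)] = input_ids   # pad 750 -> 1024 with zeros
batch = padded.reshape(n_chunks, max_seq_len)  # shape (2, 512)

print(batch.shape)  # (2, 512)
```

The per-chunk predictions then need to be concatenated (and the padding positions dropped) before the segment-offset postprocessing runs, so that pred is as long as the full text.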

aaferrero · Nov 07 '22

@aaferrero Could you post your modified code? I've run into the same problem 😂

jackieZhouQQ · Oct 11 '23