PaddleOCR
In version 2.6, the SER model's max_seq_len is 512; for any token past that length, every prediction becomes 0
In ./2.6/ppocr/postprocess/vqa_token_ser_layoutlm_postprocess.py:
```python
def _infer(self, preds, segment_offset_ids, ocr_infos):
    results = []
    for pred, segment_offset_id, ocr_info in zip(preds, segment_offset_ids,
                                                 ocr_infos):
        pred = np.argmax(pred, axis=1)
        pred = [self.id2label_map[idx] for idx in pred]

        for idx in range(len(segment_offset_id)):
            if idx == 0:
                start_id = 0
            else:
                start_id = segment_offset_id[idx - 1]

            end_id = segment_offset_id[idx]
            curr_pred = pred[start_id:end_id]
            curr_pred = [self.label2id_map_for_draw[p] for p in curr_pred]

            if len(curr_pred) <= 0:
                pred_id = 0
            else:
                counts = np.bincount(curr_pred)
                pred_id = np.argmax(counts)
```
In the 2.6 LayoutXLM pipeline, every inference run truncates the input after 512 tokens, so `pred` has length 512, while `segment_offset_id` covers the full segmented text. As a result, for any segment whose offsets fall past 512, `curr_pred` is empty and `pred_id` is always forced to class 0. This looks like a bug.
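A minimal sketch of the mismatch described above, with hypothetical offset values: once `start_id` reaches the truncation boundary, the slice of `pred` is empty and the fallback `pred_id = 0` fires.

```python
import numpy as np

max_seq_len = 512
# pred is truncated to max_seq_len=512 label ids (hypothetical values)
pred = np.random.randint(1, 5, size=max_seq_len)
# segment_offset_id covers the full, untruncated text
segment_offset_id = [100, 400, 600, 750]

pred_ids = []
for idx in range(len(segment_offset_id)):
    start_id = 0 if idx == 0 else segment_offset_id[idx - 1]
    end_id = segment_offset_id[idx]
    curr_pred = pred[start_id:end_id]  # empty once start_id >= 512
    if len(curr_pred) <= 0:
        pred_id = 0                    # silent fallback: class 0
    else:
        pred_id = int(np.argmax(np.bincount(curr_pred)))
    pred_ids.append(pred_id)

print(pred_ids)  # the last segment (600:750) always maps to 0
```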
Originally posted by @aaferrero in https://github.com/PaddlePaddle/PaddleOCR/issues/7974#issuecomment-1283504791
The token length is currently limited to 512; you could try changing it to 1024.
Setting it to 1024 raises an error, because training now requires loading the pretrained model, so the token length must stay at 512. Version 2.4 supported prediction with arbitrary token lengths, so this part of 2.6 should be a bug. I modified 2.6 following the 2.4 approach, and it now supports prediction with tokens of any length.
Could you share the modified code and perhaps open a PR?
https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6/ppocr/data/imaug/vqa The rough idea is to change the data packing there: for example, if the token length is 750, pad it to 512 + 512 = 1024, reshape it to (2, 512), and feed it into the network as two batches for prediction.
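A hedged sketch of that padding idea (the function name `chunk_tokens` and the pad id are assumptions, not the actual patch): pad the token ids up to the next multiple of 512 and reshape into one 512-token row per batch.

```python
import numpy as np

def chunk_tokens(input_ids, max_seq_len=512, pad_id=0):
    """Pad a 1-D token-id array to a multiple of max_seq_len and
    reshape it to (num_chunks, max_seq_len), one row per batch."""
    n = len(input_ids)
    num_chunks = (n + max_seq_len - 1) // max_seq_len  # ceil division
    padded = np.full(num_chunks * max_seq_len, pad_id, dtype=np.int64)
    padded[:n] = input_ids
    return padded.reshape(num_chunks, max_seq_len)

# e.g. 750 tokens -> padded to 1024 -> two 512-token batches
chunks = chunk_tokens(np.arange(750))
print(chunks.shape)  # (2, 512)
```

After running the network on each row, the per-row predictions would be concatenated back so `pred` again matches the full `segment_offset_id` range.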
@aaferrero Could you paste the modified code? I ran into the same problem 😂